Coding

—

ROCO Bench

LLM generates path waypoints
using RRT
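
To make the RRT part concrete, here is a minimal 2D sketch in plain Python. It is not RoCoBench's planner: the function name `rrt`, the circular obstacle, the step size, and the goal bias are all assumptions chosen for illustration, and only new nodes (not the edges between them) are collision-checked.

```python
import math
import random

def rrt(start, goal, is_free, bounds, step=0.5, goal_tol=0.5,
        max_iters=5000, goal_bias=0.1, seed=0):
    """Grow a tree from `start`; return a list of waypoints ending at `goal`, or None."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    nodes = [start]
    parent = {0: None}  # node index -> parent index
    for _ in range(max_iters):
        # Sample a random point, occasionally the goal itself.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Nearest existing tree node to the sample.
        near_idx = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[near_idx]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Steer at most `step` from the nearest node toward the sample.
        scale = min(step, d) / d
        new = (near[0] + (sample[0] - near[0]) * scale,
               near[1] + (sample[1] - near[1]) * scale)
        if not is_free(new):  # simplification: only the new point is checked
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = near_idx
        if math.dist(new, goal) <= goal_tol:
            # Walk parent links back to the root to recover the path.
            path = [goal]
            idx = len(nodes) - 1
            while idx is not None:
                path.append(nodes[idx])
                idx = parent[idx]
            return path[::-1]
    return None

# Hypothetical scene: a 10x10 workspace with one circular obstacle in the middle.
def is_free(p):
    return math.dist(p, (5.0, 5.0)) > 1.5

path = rrt((1.0, 1.0), (9.0, 9.0), is_free, bounds=((0.0, 10.0), (0.0, 10.0)))
```

The returned `path` is a list of `(x, y)` waypoints; a real planner would also check each edge for collisions and usually smooth the result.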

—

Humanoid Bench

env
- two-agent MuJoCo model
- reward, observation space, action space
train
- TD-MPC2
- PPO

Paper Reading

CoT

—

basic CoT

manually write worked reasoning examples into the prompt
make the LLM think step by step

—

zero-shot CoT

add prompt: "Let's think step by step"
pros: simple, zero-shot
cons: weaker performance than CoT with hand-written examples
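
The difference between basic CoT and zero-shot CoT is easiest to see in how the prompt is built. The sketch below is illustrative only: the `Q:`/`A:` layout, the function names, and the "The answer is" suffix are assumptions, and no model is called.

```python
def few_shot_cot_prompt(exemplars, question):
    """Basic CoT: prepend hand-written (question, rationale, answer) examples."""
    parts = [
        f"Q: {q}\nA: {rationale} The answer is {answer}."
        for q, rationale, answer in exemplars
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

def zero_shot_cot_prompt(question, trigger="Let's think step by step"):
    """Zero-shot CoT: no examples, only the trigger phrase after the question."""
    return f"Q: {question}\nA: {trigger}"

# Hypothetical exemplar, written by hand for illustration.
exemplars = [("What is 2 + 3?", "2 plus 3 equals 5.", "5")]
few_shot = few_shot_cot_prompt(exemplars, "What is 4 + 4?")
zero_shot = zero_shot_cot_prompt("What is 4 + 4?")
```

The zero-shot variant needs no hand-written examples, which is the "simple" advantage noted above.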

—

AutoCoT

encode questions with BERT, then cluster the embeddings
k-means clustering
pick example questions from each cluster to guide the prompt
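
A rough sketch of the clustering step, assuming the question embeddings are already computed. Toy 2D tuples stand in for BERT vectors here, and `kmeans`, `nearest`, and `select_demo_indices` are hypothetical helper names. Picking the question closest to each centroid is a simplification of how representative examples might be chosen.

```python
import math
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]

def select_demo_indices(points, k, seed=0):
    """Per cluster, pick the index of the point closest to the centroid."""
    centroids, assign = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: math.dist(points[i], centroids[c])))
    return chosen

# Toy embeddings: two well-separated groups of three "questions" each.
embeddings = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_indices = select_demo_indices(embeddings, k=2)
```

Each selected question would then get a reasoning chain (for example via the zero-shot trigger) and be placed in the prompt as a demonstration.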

—

ToT

Thought Decomposition
Thought Generator
- Sample
- Propose
State Evaluator
- Value
- Vote
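
The generator and evaluator roles can be sketched as a breadth-first search with a fixed beam. In a real setup both would be LLM calls; here `propose` and `value` are hypothetical stand-ins on a toy task (pick three digits from 1 to 3 that sum to 7), so the example shows only the search skeleton.

```python
def tree_of_thoughts_bfs(root, propose, value, depth, beam_width):
    """Expand every state in the frontier, score the children, keep the best few."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # State evaluator ("Value" style): score each candidate independently.
        candidates.sort(key=value, reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0]

TARGET = 7

def propose(state):
    """Thought generator ("Propose" style): extend the partial solution by one step."""
    return [state + (digit,) for digit in (1, 2, 3)]

def value(state):
    """Toy heuristic: closer to the target sum is better."""
    return -abs(TARGET - sum(state))

best = tree_of_thoughts_bfs((), propose, value, depth=3, beam_width=2)
```

A "Vote" evaluator would instead compare candidates against each other and keep the most-voted ones; only the scoring line changes.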

—

GoT

::: block

find unreasonable choices
analyze why unreasonable with context
find the best choice

:::

—

Chain of Draft

::: block

Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. Return the answer at the end of the response after a separator .

:::

Method

::: block

Zettelkasten Method
- $m_{i} = \{c_{i}, t_{i}, K_{i}, G_{i}, X_{i}, e_{i}, L_{i}\}$
Link Generation
- $s_{i,j} = \frac{e_{i} \cdot e_{j}}{\lvert e_{i} \rvert \lvert e_{j} \rvert}$
- $M_{near}^{n} = \{m_{j} \mid \operatorname{rank}(s_{i,j}) \leq k,\ m_{j} \in M\}$
Memory Evolution
- $m_{j}^{*} \leftarrow \operatorname{LLM}(m_{n} \parallel M_{near}^{n} \setminus m_{j} \parallel m_{j} \parallel P_{s_{3}})$
Retrieve Relevant Memory
- $e_{q} = f_{enc}(q)$
- $M_{retrieved} = \{m_{i} \mid \operatorname{rank}(s_{q,i}) \leq k,\ m_{i} \in M\}$

:::
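
The link-generation and retrieval formulas both reduce to ranking notes by cosine similarity and keeping the top $k$. A minimal sketch, assuming each note is a dict holding its embedding under the key `"e"` (the dict layout and the function names are illustrative, not the method's actual data structures):

```python
import math

def cosine(a, b):
    """Cosine similarity e_a . e_b / (|e_a| |e_b|); 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_embedding, memory, k):
    """Return the k notes whose embeddings are most similar to the query."""
    ranked = sorted(memory, key=lambda m: cosine(query_embedding, m["e"]), reverse=True)
    return ranked[:k]

# Toy memory with hand-made 2D embeddings.
memory = [
    {"id": "a", "e": [1.0, 0.0]},
    {"id": "b", "e": [0.0, 1.0]},
    {"id": "c", "e": [1.0, 1.0]},
]
retrieved = top_k([1.0, 0.1], memory, k=2)
```

The same `top_k` call serves both cases: with a new note's embedding it yields the nearest neighbors for linking, and with an encoded query it yields the retrieved memories.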

Preserving and combining knowledge in robotic lifelong reinforcement learning

—

DPMM

$\theta_{k} \mid \lambda \sim H(\lambda)$
$\pi \mid \alpha \sim \operatorname{GEM}(\alpha)$
$v_{i} \mid \pi \sim \operatorname{Cat}(\pi)$
$x_{i} \mid v_{i} \sim F(\theta_{v_{i}})$
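
The generative process can be simulated with a truncated stick-breaking construction of $\operatorname{GEM}(\alpha)$. This is a sketch under stated assumptions: the truncation level, the Gaussian base measure standing in for $H$, and the unit-variance Gaussian standing in for $F$ are all illustrative choices, not taken from the paper.

```python
import random

def stick_breaking(alpha, truncation, rng):
    """Truncated GEM(alpha): pi_k = beta_k * prod_{j<k} (1 - beta_j), beta_k ~ Beta(1, alpha)."""
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        beta = rng.betavariate(1.0, alpha)
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    weights.append(remaining)  # leftover stick mass goes to the last component
    return weights

def sample_dpmm(n, alpha=1.0, truncation=20, seed=0):
    rng = random.Random(seed)
    pi = stick_breaking(alpha, truncation, rng)
    # Assumed base measure H: Normal(0, 5). Assumed likelihood F: Normal(theta_k, 1).
    theta = [rng.gauss(0.0, 5.0) for _ in range(truncation)]
    v = rng.choices(range(truncation), weights=pi, k=n)  # v_i ~ Cat(pi)
    x = [rng.gauss(theta[k], 1.0) for k in v]            # x_i ~ F(theta_{v_i})
    return pi, theta, v, x

pi, theta, v, x = sample_dpmm(n=100)
```

Smaller `alpha` concentrates the weight on the first few components, so fewer clusters are used; larger `alpha` spreads it out.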

—

Variational Inference

$\operatorname{ELBO}(q) = \sum_{k=1}^{K} \Big[ E_{q}[\theta_{k}]^{\top} s_{k}(x) - \hat{N}_{k} E_{q}[a(\theta_{k})] + \hat{N}_{k} E_{q}[\log \pi_{k}(\beta)] - \sum_{n=1}^{N} \hat{r}_{nk} \log \hat{r}_{nk} + E_{q}\Big[\log \frac{B(\beta_{k} \mid 1, \alpha)}{q(\beta_{k} \mid \hat{\alpha}_{k_{1}}, \hat{\alpha}_{k_{0}})}\Big] + E_{q}\Big[\log \frac{H(\theta_{k} \mid \lambda)}{q(\theta_{k} \mid \hat{\lambda}_{k})}\Big] \Big]$

—

Metrics

average success rate
- $P(t) = \frac{1}{N} \sum_{i=1}^{N} P_{i}(t)$: the mean of the per-task success rates
forgetting
- $F_{i} = P_{i}(\Delta i) - P_{i}(T)$: success rate right after training on task $i$ minus its final success rate
- $F = \frac{1}{N - 1} \sum_{i = 1}^{N} F_{i}$
forward transfer
- $FT_{i} = \frac{1}{i-1} \sum_{k=1}^{i-1} P_{k}(\Delta k)$: average post-training success rate over the tasks trained before task $i$
- $FT = \frac{1}{N - 1} \sum_{i = 2}^{N} F T_{i}$
Improvement of few-shot knowledge recall
- $f = \frac{1}{T \times P_{max}} \left( \int_{t_{j}}^{T_{j}} P(t)\,dt - \int_{t_{i}}^{T_{i}} P(t)\,dt \right)$
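
A small worked example of the first three metrics, following the formulas exactly as written above (including the $\frac{1}{N-1}$ normalisation). The input lists are made-up numbers: `after_training[i]` plays the role of $P_{i}(\Delta i)$ and `final[i]` the role of $P_{i}(T)$.

```python
def average_success(final):
    """P(T): mean of the per-task final success rates."""
    return sum(final) / len(final)

def forgetting(after_training, final):
    """F = 1/(N-1) * sum_i (P_i(delta i) - P_i(T))."""
    n = len(final)
    per_task = [a - f for a, f in zip(after_training, final)]
    return sum(per_task) / (n - 1)

def forward_transfer(after_training):
    """FT = 1/(N-1) * sum_{i=2..N} FT_i, with FT_i the mean over tasks 1..i-1."""
    n = len(after_training)
    ft_per_task = [
        sum(after_training[: i - 1]) / (i - 1)
        for i in range(2, n + 1)
    ]
    return sum(ft_per_task) / (n - 1)

# Made-up success rates for three tasks.
after_training = [0.9, 0.8, 0.7]
final = [0.6, 0.7, 0.7]

avg = average_success(final)            # (0.6 + 0.7 + 0.7) / 3
forget = forgetting(after_training, final)  # (0.3 + 0.1 + 0.0) / 2
ft = forward_transfer(after_training)   # (0.9 + 0.85) / 2
```

With these numbers the last task contributes zero forgetting, because its post-training and final success rates coincide.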

2025-04-09