Hamster

Paper

Introduction

Robot data is expensive to collect.

Small models already perform well.

The idea is to combine the generalization strength of a large VLM with the efficiency and local robustness of a small model.

Hamster

The method has two stages:

Fine-tune a VLM on large-scale, off-domain datasets so that it generates 2D path guidance
Generate actions from the 2D path

VLM for Producing 2D Path Trained from Off-Domain Data

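The high-level VLM takes a monocular RGB image $img$ and a language instruction $z$ and predicts a coarse 2D path $\hat{p} \sim \mathrm{VLM}(img, z)$. The path describes the trajectory of the robot's end-effector (eef) in the RGB image and also carries the gripper's open/close state: $p = [(x_{t}, y_{t}, \mathrm{gripper\_open}_{t})]_{t}$. All coordinates are normalized, and $\mathrm{gripper\_open}$ is a binary value indicating whether the gripper is open or closed.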
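The path representation above can be sketched as a small data structure. This is only an illustration: the names `PathPoint` and `to_pixels` are hypothetical, and the normalization range $[0, 1]$ is an assumption (the note only says the coordinates are normalized).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PathPoint:
    x: float            # normalized horizontal coordinate (assumed range [0, 1])
    y: float            # normalized vertical coordinate (assumed range [0, 1])
    gripper_open: bool  # True = gripper open, False = gripper closed


def to_pixels(path: List[PathPoint], width: int, height: int) -> List[Tuple[int, int, bool]]:
    """Map normalized path points to integer pixel coordinates of a width x height image."""
    pixels = []
    for pt in path:
        if not (0.0 <= pt.x <= 1.0 and 0.0 <= pt.y <= 1.0):
            raise ValueError(f"coordinates must lie in [0, 1], got ({pt.x}, {pt.y})")
        # Clamp so that a coordinate of exactly 1.0 maps to the last valid column/row.
        u = min(int(pt.x * width), width - 1)
        v = min(int(pt.y * height), height - 1)
        pixels.append((u, v, pt.gripper_open))
    return pixels
```

Keeping the path in normalized coordinates makes it independent of image resolution; the conversion to pixels is only needed when the path is drawn onto or compared against a concrete image.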

VILA is used as the backbone.

Fine-tuning Objective and Datasets

Fine-tuning uses a diverse off-domain dataset that includes real-world data, visual question-answering data, and simulation data.

Pixel Point Prediction:

Uses the RoboPoint dataset. Input: an image and an instruction. Output: an array of points.
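Since the model emits the point array as text, the output has to be parsed back into numbers. A minimal sketch follows, assuming the array is serialized as a Python-style list of `(x, y)` tuples such as `[(0.25, 0.5), (0.75, 0.1)]`; both this serialization and the helper name `parse_points` are assumptions for illustration, not something the note specifies.

```python
import ast
from typing import List, Tuple


def parse_points(text: str) -> List[Tuple[float, float]]:
    """Parse a text answer like "[(0.25, 0.5), (0.75, 0.1)]" into a list of (x, y) floats."""
    # literal_eval only evaluates Python literals, so arbitrary code in the text is not executed.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, (list, tuple)):
        raise ValueError("expected a list of points")
    points = []
    for item in value:
        if not isinstance(item, (list, tuple)) or len(item) != 2:
            raise ValueError(f"expected an (x, y) pair, got {item!r}")
        points.append((float(item[0]), float(item[1])))
    return points
```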

Simulated Robot Data:

A dataset is generated with RLBench, a simulator for tabletop manipulation with a Franka arm.

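Input: the first camera frame as the image, together with the task instruction. Output: the path $p = [(x_{t}, y_{t}, \mathrm{gripper\_open}_{t})]_{t}$, obtained by projecting the ground-truth trajectory into the image using forward kinematics (FK) and the camera parameters.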
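The projection step can be sketched with a standard pinhole camera model. This is a sketch under stated assumptions rather than the authors' pipeline: it assumes the 3D end-effector positions have already been computed by FK in the world frame, that the extrinsics are given as a world-to-camera rotation `R` and translation `t`, and that normalization divides by image width and height. All function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def project_point(p_world: Sequence[float], R: Sequence[Sequence[float]], t: Sequence[float],
                  fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates (u, v) with a pinhole model."""
    # World frame -> camera frame: p_cam = R @ p_world + t
    xc, yc, zc = (sum(R[r][i] * p_world[i] for i in range(3)) + t[r] for r in range(3))
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsics.
    return fx * xc / zc + cx, fy * yc / zc + cy


def trajectory_to_path(eef_positions: Sequence[Sequence[float]], gripper_open: Sequence[bool],
                       R, t, fx, fy, cx, cy, width: int, height: int
                       ) -> List[Tuple[float, float, int]]:
    """Turn an FK-derived eef trajectory into a normalized (x, y, gripper_open) path."""
    path = []
    for p_world, is_open in zip(eef_positions, gripper_open):
        u, v = project_point(p_world, R, t, fx, fy, cx, cy)
        # Normalize by image size (assumed convention) and store the gripper state as 0/1.
        path.append((u / width, v / height, int(is_open)))
    return path
```

Because only the first frame is used as the input image, every trajectory point is projected with the same camera pose, so the whole path lives in that single image.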

Real Robot Data:
