paper
Warning
As of now, the training code has still not been released.
a unified architecture that processes multimodal inputs indiscriminately
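A unified architecture: one model handles every modality.
The aim seems to be to learn the embodied action modality without losing the base model's original capabilities, while also strengthening visual perception for VLA tasks.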
a massive, high-quality multimodal embodied reasoning dataset, EO-Data1.5M
Nice! Finally an embodied dataset with reasoning!
Location: IPEC-COMMUNITY/EO-Data1.5M (not yet open-sourced)
The dataset is explained in detail later.
trained through synergies between auto-regressive decoding and flow matching denoising on EO-Data1.5M
Training runs on EO-Data1.5M with two objectives combined: auto-regressive decoding (text) + flow matching denoising (actions).
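The two objectives can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions, not EO-1's actual implementation (the training code is unreleased): it assumes a linear (rectified-flow) path x_t = (1 - t) * noise + t * actions with target velocity actions - noise, t drawn uniformly, text targets already shifted by one position, and a hypothetical `action_weight` that balances the two losses. All function names here are hypothetical.

```python
import numpy as np


def next_token_loss(logits, targets):
    """Auto-regressive cross-entropy. Assumes targets are already shifted by one position."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return float(-picked.mean())


def flow_matching_loss(velocity_fn, actions, rng):
    """Conditional flow matching on an action chunk of shape (batch, horizon, dim).

    Assumed path: x_t = (1 - t) * noise + t * actions, so the target velocity is actions - noise.
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform(size=(actions.shape[0], 1, 1))  # one time per sample, broadcast over the chunk
    x_t = (1.0 - t) * noise + t * actions
    target = actions - noise
    pred = velocity_fn(x_t, t)  # in a real model: predicted from the backbone's hidden states
    return float(np.mean((pred - target) ** 2))


def combined_loss(logits, targets, velocity_fn, actions, rng, action_weight=1.0):
    """Hypothetical joint objective: text cross-entropy plus weighted flow-matching loss."""
    return next_token_loss(logits, targets) + action_weight * flow_matching_loss(velocity_fn, actions, rng)


def sample_actions(velocity_fn, shape, rng, num_steps=10):
    """Denoise with forward Euler from t = 0 (pure noise) to t = 1 (actions)."""
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((shape[0], 1, 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

As a sanity check, for a single target chunk `a` the exact velocity on this path is (a - x) / (1 - t): plugging it in drives the flow-matching loss to about zero, and the Euler sampler lands on `a`. A trained model replaces this oracle with a learned velocity head.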