Hugging Face Repo

Warning

As of this writing, the training code has not yet been released.

a unified architecture that processes multimodal inputs indiscriminately

A unified architecture: a single model handles all modalities.

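The goal is probably to let the model learn the embodied action modality while preserving the original model's capabilities, and to strengthen its visual perception for VLA tasks.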

a massive, high-quality multimodal embodied reasoning dataset, EO-Data1.5M

Nice! Finally an embodied dataset with reasoning!

Location: IPEC-COMMUNITY/EO-Data1.5M (not yet publicly released)

A detailed explanation of the dataset is given later.

trained through synergies between auto-regressive decoding and flow matching denoising on EO-Data1.5M

Training combines auto-regressive decoding with flow-matching denoising, performed on EO-Data1.5M.
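Since the training code is unreleased, here is only a minimal sketch of what such a joint objective could look like. It assumes (as in similar VLA models) that auto-regressive decoding contributes a next-token cross-entropy loss on discrete text tokens, while flow matching contributes a velocity-regression loss on continuous action chunks. The linear noise-to-action path, the Euler sampler, and the names `flow_matching_loss`, `sample_actions`, `joint_loss` and `lam` are all illustrative assumptions, not EO-1's actual implementation.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean next-token cross-entropy; logits (N, V), targets (N,) int ids."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def flow_matching_loss(velocity_fn, actions, rng):
    """Velocity regression on an assumed linear path from noise (t=0) to actions (t=1)."""
    noise = rng.standard_normal(actions.shape)                 # x_0 ~ N(0, I)
    t_shape = (actions.shape[0],) + (1,) * (actions.ndim - 1)  # one t per sample, broadcastable
    t = rng.uniform(size=t_shape)                              # t in [0, 1)
    x_t = (1.0 - t) * noise + t * actions                      # point on the path
    target_v = actions - noise                                 # d x_t / d t along the path
    pred_v = velocity_fn(x_t, t)
    return float(np.mean((pred_v - target_v) ** 2))


def sample_actions(velocity_fn, shape, rng, steps=10):
    """'Denoising' at inference: Euler-integrate the velocity field from noise to t=1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0],) + (1,) * (len(shape) - 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x


def joint_loss(text_logits, text_targets, velocity_fn, actions, rng, lam=1.0):
    """Hypothetical weighted sum of the auto-regressive and flow-matching terms."""
    ar = cross_entropy(text_logits, text_targets)
    fm = flow_matching_loss(velocity_fn, actions, rng)
    return ar + lam * fm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actions = rng.standard_normal((2, 4, 7))  # 2 chunks, 4 steps, 7-DoF (made-up sizes)
    # An "oracle" velocity that always points exactly at these actions.
    oracle = lambda x_t, t: (actions - x_t) / (1.0 - t)
    print(flow_matching_loss(oracle, actions, rng))  # should be close to 0
    print(np.allclose(sample_actions(oracle, actions.shape, rng), actions))
```

In a real model, `velocity_fn` would be the network's action head conditioned on the shared backbone's hidden states, and `text_logits` would come from the same backbone's language-model head. That shared backbone is what would let both losses shape one set of weights.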