Fugu-MT Paper Translation (Abstract): JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

Paper summary: JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy

arxiv url: http://arxiv.org/abs/2604.20100v1
Date: Wed, 22 Apr 2026 01:51:48 GMT
Status: Translation complete
Last updated in system: 2026-04-23 15:36:10.910948
Title: JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
Title (reference translation): JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
Authors: Tianle Zhang, Zhihao Yuan, Dafeng Chi, Peidong Liu, Dongwei Li, Kejun Hu, Likui Zhang, Junnan Nie, Ziming Wei, Zengjue Chen, Yili Tang, Jiayi Li, Zhiyuan Xiang, Mingyang Li, Tianci Luo, Hanwen Wan, Ao Li, Linbo Zhai, Zhihao Zhan, Yuzheng Zhuang, Liang Lin, Xiaodong Bai, Jiakun Cai, Peng Cao, Kangliang Chen, Siang Chen, Yixiang Dai, Shuai Di, Nan Duan, Yicheng Gong, Chenguang Gui, Yucheng Guo, Peng Hao, Qingrong He, Haoyang Huang, Kunrui Huang, Zhixuan Huang, Shibo Jin, Yixiang Jin, Anson Li, Dongjiang Li, Jiawei Li, Ruodai Li, Yihang Li, Yuzhen Li, Jiaming Liang, Fangsheng Liu, Jing Long, Mingxi Luo, Xing Pan, Hui Shen, Xiaomeng Tian, Daming Wang, Song Wang, Junwu Xiong, Hang Xu, Wanting Xu, Zhengcheng Yu, He Zhang, Jiyao Zhang, Lin Zhao, Chen Zhou
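Abstract summary: JoyAI-RA is a vision-language-action (VLA) foundation model suited to generalizable robotic manipulation. JoyAI-RA bridges embodiment gaps, particularly between human manipulation and robotic control. It outperforms state-of-the-art methods on both simulation and real-world benchmarks.
Reference score (independently calculated attention score): 90.77129709149574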
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Robotic autonomy in open-world environments is fundamentally limited by insufficient data diversity and poor cross-embodiment generalization. Existing robotic datasets are often limited in scale and task coverage, while relatively large differences across robot embodiments impede effective behavior knowledge transfer. To address these challenges, we propose JoyAI-RA, a vision-language-action (VLA) embodied foundation model tailored for generalizable robotic manipulation. JoyAI-RA presents a multi-source multi-level pretraining framework that integrates web data, large-scale egocentric human manipulation videos, simulation-generated trajectories, and real-robot data. Through training on heterogeneous multi-source data with explicit action-space unification, JoyAI-RA effectively bridges embodiment gaps, particularly between human manipulation and robotic control, thereby enhancing cross-embodiment behavior learning. JoyAI-RA outperforms state-of-the-art methods in both simulation and real-world benchmarks, especially on diverse tasks with generalization demands.
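Abstract (reference translation): Robotic autonomy in open-world environments is fundamentally limited by a lack of data diversity and poor cross-embodiment generalization. Existing robot datasets are often limited in scale and task coverage, while relatively large differences between robot embodiments hinder the effective transfer of behavioral knowledge. To address these challenges, we propose JoyAI-RA, a vision-language-action (VLA) embodied foundation model suited to generalizable robotic manipulation. JoyAI-RA provides a multi-source, multi-level pretraining framework that integrates web data, large-scale human manipulation videos, simulation-generated trajectories, and real-robot data. By training on heterogeneous multi-source data with explicit action-space unification, JoyAI-RA effectively bridges embodiment gaps, particularly between human manipulation and robotic control, and strengthens cross-embodiment behavior learning. JoyAI-RA outperforms state-of-the-art methods on simulation and real-world benchmarks, especially on diverse tasks that demand generalization.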

Related papers