Fugu-MT Paper Translation (Abstract): AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots

Paper overview: AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots

arxiv url: http://arxiv.org/abs/2603.07648v1
Date: Sun, 08 Mar 2026 14:18:56 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:15.000358
Title: AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots
Title (reference translation): AtomicVLA: Unlocking the Potential of Atomic Skill Learning in Robots
Authors: Likui Zhang, Tao Tang, Zhihao Zhan, Xiuwei Chen, Zisheng Chen, Jianhua Han, Jiangtong Zhu, Pei Xu, Hang Xu, Hefeng Wu, Liang Lin, Xiaodan Liang,
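Abstract summary: Real-world robotic tasks often involve long-horizon, multi-step problem-solving and require generalization for continual skill acquisition. We propose AtomicVLA, a unified planning-and-execution framework that jointly generates task-level plans, atomic skill abstractions, and fine-grained actions. In simulation, AtomicVLA outperforms $π_{0}$ by 2.4% on LIBERO and 10% on LIBERO-LONG, and outperforms $π_{0}$ and $π_{0.5}$ by 0.22 and 0.25 in average task length on CALVIN.
Reference score (attention score computed independently by the site): 92.18094199070693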
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Visual-Language-Action (VLA) models have shown promising potential for robotic manipulation tasks. However, real-world robotic tasks often involve long-horizon, multi-step problem-solving and require generalization for continual skill acquisition, extending beyond single actions or skills. These challenges present significant barriers for existing VLA models, which use monolithic action decoders trained on aggregated data, resulting in poor scalability. To address these challenges, we propose AtomicVLA, a unified planning-and-execution framework that jointly generates task-level plans, atomic skill abstractions, and fine-grained actions. AtomicVLA constructs a scalable atomic skill library through a Skill-Guided Mixture-of-Experts (SG-MoE), where each expert specializes in mastering generic yet precise atomic skills. Furthermore, we introduce a flexible routing encoder that automatically assigns dedicated atomic experts to new skills, enabling continual learning. We validate our approach through extensive experiments. In simulation, AtomicVLA outperforms $π_{0}$ by 2.4\% on LIBERO, 10\% on LIBERO-LONG, and outperforms $π_{0}$ and $π_{0.5}$ by 0.22 and 0.25 in average task length on CALVIN. Additionally, our AtomicVLA consistently surpasses baselines by 18.3\% and 21\% in real-world long-horizon tasks and continual learning. These results highlight the effectiveness of atomic skill abstraction and dynamic expert composition for long-horizon and lifelong robotic tasks. The project page is \href{https://zhanglk9.github.io/atomicvla-web/}{here}.
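Abstract (reference translation): Recent advances in VLA (Visual-Language-Action) models have shown promising potential for robotic manipulation tasks. However, real-world robotic tasks often involve long-horizon, multi-step problem-solving and require generalization for continual skill acquisition, extending beyond single actions or skills. These challenges pose significant barriers for existing VLA models, which use monolithic action decoders trained on aggregated data, resulting in poor scalability. To address these challenges, we propose AtomicVLA, a unified planning-and-execution framework that jointly generates task-level plans, atomic skill abstractions, and fine-grained actions. AtomicVLA constructs a scalable atomic skill library through SG-MoE (Skill-Guided Mixture-of-Experts). Furthermore, we introduce a flexible routing encoder that automatically assigns dedicated atomic experts to new skills, enabling continual learning. We validate our approach through extensive experiments. In simulation, AtomicVLA outperforms $π_{0}$ by 2.4\% on LIBERO and 10\% on LIBERO-LONG, and outperforms $π_{0}$ and $π_{0.5}$ by 0.22 and 0.25 in average task length on CALVIN. Additionally, AtomicVLA consistently surpasses baselines by 18.3\% and 21\% in real-world long-horizon tasks and continual learning. These results highlight the effectiveness of atomic skill abstraction and dynamic expert composition for long-horizon and lifelong robotic tasks. The project page is \href{https://zhanglk9.github.io/atomicvla-web/}{here}.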

Related papers