Fugu-MT paper translation (abstract): AttenA+: Rectifying Action Inequality in Robotic Foundation Models

Paper summary: AttenA+: Rectifying Action Inequality in Robotic Foundation Models

arxiv url: http://arxiv.org/abs/2605.13548v1
Date: Wed, 13 May 2026 13:55:37 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:28.088308
Title: AttenA+: Rectifying Action Inequality in Robotic Foundation Models
Title (reference translation): AttenA+: Rectifying Action Inequality in Robotic Foundation Models
Authors: Daojie Peng, Fulong Ma, Jiahang Cao, Qiang Zhang, Xupeng Xie, Jian Guo, Ping Luo, Andrew F. Luo, Boyu Zhou, Jun Ma
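Abstract summary: This paper introduces AttenA+, an architecture-agnostic framework that prioritizes kinematically critical segments via velocity-driven action attention. The work suggests that mining the intrinsic structural priors of action sequences offers a highly efficient, physics-aware complement to standard scaling laws.
Reference score (independently computed attention score): 38.61160855341111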
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing robotic foundation models, while powerful, are predicated on an implicit assumption of temporal homogeneity: treating all actions as equally informative during optimization. This "flat" training paradigm, inherited from language modeling, remains indifferent to the underlying physical hierarchy of manipulation. In reality, robot trajectories are fundamentally heterogeneous, where low-velocity segments often dictate task success through precision-demanding interactions, while high-velocity motions serve as error-tolerant transitions. Such a misalignment between uniform loss weighting and physical criticality fundamentally limits the performance of current Vision-Language-Action (VLA) models and World-Action Models (WAM) in complex, long-horizon tasks. To rectify this, we introduce AttenA+, an architecture-agnostic framework that prioritizes kinematically critical segments via velocity-driven action attention. By reweighting the training objective based on the inverse velocity field, AttenA+ naturally aligns the model's learning capacity with the physical demands of manipulation. As a plug-and-play enhancement, AttenA+ can be integrated into existing backbones without structural modifications or additional parameters. Extensive experiments demonstrate that AttenA+ significantly elevates the ceilings of current state-of-the-art models. Specifically, it improves OpenVLA-OFT to 98.6% (+1.5%) on the Libero benchmark and pushes FastWAM to 92.4% (+0.6%) on RoboTwin 2.0. Real-world validation on a Franka manipulator further showcases its robustness and cross-task generalization. Our work suggests that mining the intrinsic structural priors of action sequences offers a highly efficient, physics-aware complement to standard scaling laws, paving a new path for general-purpose robotic control.
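Abstract (reference translation): Existing robotic foundation models, while powerful, rest on an implicit assumption of temporal homogeneity: all actions are treated as equally informative during optimization. This "flat" training paradigm, inherited from language modeling, is indifferent to the physical hierarchy underlying manipulation. In reality, robot trajectories are fundamentally heterogeneous: low-velocity segments often determine task success through precision-demanding interactions, while high-velocity motions serve as error-tolerant transitions. This misalignment between uniform loss weighting and physical criticality fundamentally limits the performance of current Vision-Language-Action (VLA) models and World-Action Models (WAM) on complex, long-horizon tasks. To correct this, we introduce AttenA+, an architecture-agnostic framework that prioritizes kinematically critical segments via velocity-driven action attention. By reweighting the training objective based on the inverse velocity field, AttenA+ naturally aligns the model's learning capacity with the physical demands of manipulation. As a plug-and-play enhancement, AttenA+ can be integrated into existing backbones without structural changes or additional parameters. Extensive experiments show that AttenA+ significantly raises the ceiling of current state-of-the-art models. Specifically, it improves OpenVLA-OFT to 98.6% (+1.5%) on the Libero benchmark and pushes FastWAM to 92.4% (+0.6%) on RoboTwin 2.0. Real-world validation on a Franka manipulator further demonstrates its robustness and cross-task generalization. Our work suggests that mining the intrinsic structural priors of action sequences offers a highly efficient, physics-aware complement to standard scaling laws, opening a new path toward general-purpose robotic control.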
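The abstract says the training objective is reweighted by the inverse velocity field but does not give the formula. The following is a minimal sketch of one way such a weighting could look, not the authors' implementation. It assumes speed is estimated by finite differences between consecutive action vectors, and the function names, the `eps` smoothing term, and the normalization of weights to mean 1 are all illustrative assumptions.

```python
import numpy as np


def inverse_velocity_weights(actions, eps=1e-3):
    """Hypothetical per-step loss weights, larger where the motion is slower.

    actions: array of shape (T, D), one action vector per time step.
    eps: assumed smoothing constant that keeps weights finite at zero speed.
    Returns an array of shape (T,) whose mean is 1.
    """
    actions = np.asarray(actions, dtype=float)
    if actions.shape[0] < 2:
        # No velocity can be estimated from a single step; fall back to uniform.
        return np.ones(actions.shape[0])
    # Finite-difference speed between consecutive steps.
    speed = np.linalg.norm(np.diff(actions, axis=0), axis=1)
    # Pad so that the first step reuses the first estimated speed.
    speed = np.concatenate([speed[:1], speed])
    raw = 1.0 / (speed + eps)
    # Normalize so the overall loss scale matches the unweighted objective.
    return raw / raw.mean()


def weighted_mse(pred, target, weights):
    """Mean squared error per time step, averaged with the given weights."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    per_step = ((pred - target) ** 2).mean(axis=1)
    return float((weights * per_step).mean())


# Toy trajectory: one fast transition followed by slow, precise motion.
demo_actions = np.array([[0.0, 0.0], [1.0, 0.0], [1.01, 0.0], [1.02, 0.0]])
demo_weights = inverse_velocity_weights(demo_actions)
```

In this sketch the slow final steps receive larger weights than the fast initial transition, so prediction errors there contribute more to the loss. Because only the loss is reweighted, no model parameters or architectural changes are involved, which is consistent with the plug-and-play claim in the abstract.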


Related papers