Fugu-MT Paper Translation (Summary): LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies

Paper summary: LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies

arxiv url: http://arxiv.org/abs/2602.21531v1
Date: Wed, 25 Feb 2026 03:33:39 GMT
Status: Translation complete
Last updated in system: 2026-02-26 18:19:16.687343
Title: LiLo-VLA: Compositional Long-Horizon Manipulation via Linked Object-Centric Policies
Authors: Yue Yang, Shuo Cheng, Yu Fang, Homanga Bharadhwaj, Mingyu Ding, Gedas Bertasius, Daniel Szafir
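Abstract summary: LiLo-VLA is a modular framework capable of zero-shot generalization to novel long-horizon tasks without being trained on them. The paper introduces a 21-task simulation benchmark consisting of two challenging suites, LIBERO-Long++ and Ultra-Long. In these simulations, LiLo-VLA achieves a 69% average success rate, outperforming Pi0.5 by 41% and OpenVLA-OFT by 67%.
Reference score (attention score computed independently by the site): 54.150202739999806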
License: http://creativecommons.org/licenses/by/4.0/
Abstract: General-purpose robots must master long-horizon manipulation, defined as tasks involving multiple kinematic structure changes (e.g., attaching or detaching objects) in unstructured environments. While Vision-Language-Action (VLA) models offer the potential to master diverse atomic skills, they struggle with the combinatorial complexity of sequencing them and are prone to cascading failures due to environmental sensitivity. To address these challenges, we propose LiLo-VLA (Linked Local VLA), a modular framework capable of zero-shot generalization to novel long-horizon tasks without ever being trained on them. Our approach decouples transport from interaction: a Reaching Module handles global motion, while an Interaction Module employs an object-centric VLA to process isolated objects of interest, ensuring robustness against irrelevant visual features and invariance to spatial configurations. Crucially, this modularity facilitates robust failure recovery through dynamic replanning and skill reuse, effectively mitigating the cascading errors common in end-to-end approaches. We introduce a 21-task simulation benchmark consisting of two challenging suites: LIBERO-Long++ and Ultra-Long. In these simulations, LiLo-VLA achieves a 69% average success rate, outperforming Pi0.5 by 41% and OpenVLA-OFT by 67%. Furthermore, real-world evaluations across 8 long-horizon tasks demonstrate an average success rate of 85%. Project page: https://yy-gx.github.io/LiLo-VLA/.

Related papers