Fugu-MT Paper Translation (Abstract): FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination

Paper overview: FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination

arxiv url: http://arxiv.org/abs/2512.04381v1
Date: Thu, 04 Dec 2025 02:04:26 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:40.115531
Title: FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination
Authors: Chengyang He, Ge Sun, Yue Bai, Junkai Lu, Jiadong Zhao, Guillaume Sartoretti
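Abstract summary: This paper presents FoundAtion-model-guided decoupled LoCO-maNipulation visuomotor policies (FALCON). FALCON combines modular diffusion policies with a vision-language foundation model that acts as the coordinator. FALCON is evaluated on two challenging loco-manipulation tasks requiring navigation, precise end-effector placement, and tight base-arm coordination.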
Reference score (independently computed attention score): 14.277860121790075
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present FoundAtion-model-guided decoupled LoCO-maNipulation visuomotor policies (FALCON), a framework for loco-manipulation that combines modular diffusion policies with a vision-language foundation model as the coordinator. Our approach explicitly decouples locomotion and manipulation into two specialized visuomotor policies, allowing each subsystem to rely on its own observations. This mitigates the performance degradation that arises when a single policy is forced to fuse heterogeneous, potentially mismatched observations from locomotion and manipulation. Our key innovation lies in restoring coordination between these two independent policies through a vision-language foundation model, which encodes global observations and language instructions into a shared latent embedding conditioning both diffusion policies. On top of this backbone, we introduce a phase-progress head that uses textual descriptions of task stages to infer discrete phase and continuous progress estimates without manual phase labels. To further structure the latent space, we incorporate a coordination-aware contrastive loss that explicitly encodes cross-subsystem compatibility between arm and base actions. We evaluate FALCON on two challenging loco-manipulation tasks requiring navigation, precise end-effector placement, and tight base-arm coordination. Results show that it surpasses centralized and decentralized baselines while exhibiting improved robustness and generalization to out-of-distribution scenarios.
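
Illustrative sketch (not from the paper): the abstract does not give the form of the coordination-aware contrastive loss. As a rough illustration only, the Python below assumes a symmetric InfoNCE-style objective in which the arm and base action embeddings from the same timestep form a positive pair and all other pairings in the batch act as negatives. The function name `coordination_contrastive_loss`, the `temperature` parameter, and the pairing scheme are hypothetical choices made for this sketch.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_softmax_at(row, index):
    # Numerically stable log-softmax of `row`, evaluated at position `index`.
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return row[index] - log_sum


def coordination_contrastive_loss(arm_emb, base_emb, temperature=0.1):
    """Symmetric InfoNCE over paired arm/base embeddings (assumed form).

    arm_emb[i] and base_emb[i] are assumed to come from the same timestep
    (a compatible pair); every other pairing is treated as a negative.
    """
    arm = [_normalize(v) for v in arm_emb]
    base = [_normalize(v) for v in base_emb]
    n = len(arm)

    # Temperature-scaled cosine similarity between every arm/base pairing.
    sim = [
        [sum(a * b for a, b in zip(arm[i], base[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    sim_t = [list(col) for col in zip(*sim)]

    # Cross-entropy with the matching index as the target, in both directions.
    arm_to_base = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    base_to_arm = -sum(_log_softmax_at(sim_t[j], j) for j in range(n)) / n
    return 0.5 * (arm_to_base + base_to_arm)


# Toy usage: matched pairs should score a lower loss than swapped pairs.
arm_batch = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = coordination_contrastive_loss(arm_batch, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = coordination_contrastive_loss(arm_batch, [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumed form, minimizing the loss pulls compatible arm and base embeddings together in the latent space and pushes incompatible ones apart, which is one plausible reading of "encoding cross-subsystem compatibility".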

Related papers