Fugu-MT Paper Translation (Abstract): FAVLA: A Force-Adaptive Fast-Slow VLA model for Contact-Rich Robotic Manipulation

Paper summary: FAVLA: A Force-Adaptive Fast-Slow VLA model for Contact-Rich Robotic Manipulation

arxiv url: http://arxiv.org/abs/2602.23648v1
Date: Fri, 27 Feb 2026 03:33:10 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.755038
Title: FAVLA: A Force-Adaptive Fast-Slow VLA model for Contact-Rich Robotic Manipulation
Title (reference translation): FAVLA: A Force-Adaptive Fast-Slow VLA Model for Contact-Rich Robot Manipulation
Authors: Yao Li, Peiyuan Tang, Wuyang Zhang, Chengyang Zhu, Yifan Duan, Weikai Shi, Xiaodong Zhang, Zijiang Yang, Jianmin Ji, Yanyong Zhang
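Abstract summary: Force/torque feedback can substantially improve Vision-Language-Action (VLA) models on contact-rich manipulation. Most existing approaches fuse all modalities at a single operating frequency. The authors propose FAVLA, a force-adaptive fast-slow VLA that decouples slow perception planning from fast contact-aware control.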
Reference score (independently computed attention score): 20.067295745725257
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Force/torque feedback can substantially improve Vision-Language-Action (VLA) models on contact-rich manipulation, but most existing approaches fuse all modalities at a single operating frequency. This design ignores the mismatched sampling rates of real robot sensors, forcing downsampling of the high-frequency contact cues needed for reactive correction. Combined with common VLM-action-expert (AE) pipelines that execute action chunks largely open loop between expensive VLM updates, unified-frequency fusion often yields delayed responses to impacts, stick-slip, and force spikes. We propose FAVLA, a force-adaptive fast-slow VLA that decouples slow perception planning from fast contact-aware control. FAVLA runs a slow VLM at a fixed low frequency to encode modalities to produce latent representations and to predict near-future force variation. A fast AE then executes at a variable high frequency, conditioning on the latest force sequence data to generate reactive actions. We further introduce a force adapter that injects high-frequency force features into multiple AE layers, and adaptively schedules the AE's execution frequency based on the VLM's predicted force variation. Extensive experiments on contact-rich tasks demonstrate that FAVLA significantly outperforms baselines, achieving superior reactivity and success rates, especially with a smaller contact force during manipulation.
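Abstract (reference translation): Force/torque feedback substantially improves Vision-Language-Action (VLA) models on contact-rich manipulation, but most existing approaches fuse all modalities at a single operating frequency. This design ignores the mismatched sampling rates of real robot sensors and forces downsampling of the high-frequency contact cues needed for reactive correction. Combined with the common VLM-action-expert (AE) pipelines, which execute action chunks largely open loop between expensive VLM updates, unified-frequency fusion often produces delayed responses to impacts, stick-slip, and force spikes. The authors propose FAVLA, a force-adaptive fast-slow VLA that decouples slow perception planning from fast contact-aware control. FAVLA runs a slow VLM at a fixed low frequency to encode the modalities into latent representations and to predict near-future force variation. A fast AE then executes at a variable high frequency, conditioned on the latest force sequence data, to generate reactive actions. A force adapter injects high-frequency force features into multiple AE layers and adaptively schedules the AE's execution frequency based on the force variation predicted by the VLM. Extensive experiments on contact-rich tasks show that FAVLA significantly outperforms baselines, achieving better reactivity and higher success rates, notably with smaller contact forces during manipulation.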
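
The fast-slow scheduling described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: every function name (`slow_vlm`, `ae_rate_hz`, `fast_action_expert`, `run`), every rate (500 Hz sensor, 2 Hz VLM, 20 to 100 Hz AE), and the linear mapping from predicted force variation to AE frequency are hypothetical. The "VLM" here merely measures the spread of recent force samples as a stand-in for a learned prediction, and the force adapter's injection of force features into multiple AE layers is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SlowOutput:
    latent: float                     # stand-in for the VLM's latent representation
    predicted_force_variation: float  # stand-in for predicted near-future force change


def slow_vlm(image_feature: float, force_history: List[float], window: int = 50) -> SlowOutput:
    """Hypothetical slow module: uses the spread of recent forces as its 'prediction'."""
    recent = force_history[-window:]
    variation = max(recent) - min(recent)
    return SlowOutput(latent=image_feature, predicted_force_variation=variation)


def ae_rate_hz(variation: float, min_hz: float = 20.0, max_hz: float = 100.0,
               saturation: float = 5.0) -> float:
    """Assumed scheduling rule: linear in force variation, clipped to [min_hz, max_hz]."""
    ratio = min(max(variation / saturation, 0.0), 1.0)
    return min_hz + ratio * (max_hz - min_hz)


def fast_action_expert(latent: float, force_history: List[float],
                       target_force: float = 2.0, gain: float = 0.1) -> float:
    """Toy fast module: proportional correction using the latest force sample."""
    return latent + gain * (target_force - force_history[-1])


def run(force_signal: List[float], sensor_hz: int = 500,
        vlm_hz: int = 2) -> List[Tuple[int, float, float]]:
    """Step through sensor ticks; VLM runs at a fixed period, AE at a variable one."""
    vlm_period = round(sensor_hz / vlm_hz)
    history: List[float] = []
    calls: List[Tuple[int, float, float]] = []
    next_ae_tick = 0
    for tick, force in enumerate(force_signal):
        history.append(force)
        if tick % vlm_period == 0:
            # Slow path: refresh the latent and reschedule the AE rate.
            slow = slow_vlm(image_feature=0.0, force_history=history)
            rate = ae_rate_hz(slow.predicted_force_variation)
        if tick >= next_ae_tick:
            # Fast path: act on the newest force sample at the current rate.
            action = fast_action_expert(slow.latent, history)
            calls.append((tick, rate, action))
            next_ae_tick = tick + max(1, round(sensor_hz / rate))
    return calls


if __name__ == "__main__":
    quiet = [1.0] * 500
    contact = [1.0] * 200 + [1.0 if i % 2 == 0 else 6.0 for i in range(300)]
    print(len(run(quiet)), len(run(contact)))
```

With a constant force signal the toy AE stays at its minimum rate, while an oscillating force seen at the next slow update raises the AE rate for the following interval. The fixed slow period and variable fast period mirror the structure the abstract describes; the specific numbers carry no meaning beyond the illustration.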

Related papers