Fugu-MT Paper Translation (Abstract): Enabling Dynamic Tracking in Vision-Language-Action Models via Time-Discrete and Time-Continuous Velocity Feedforward

Paper overview: Enabling Dynamic Tracking in Vision-Language-Action Models via Time-Discrete and Time-Continuous Velocity Feedforward

arxiv url: http://arxiv.org/abs/2603.16218v1
Date: Tue, 17 Mar 2026 07:50:00 GMT
Status: Translation complete
Last updated in system: 2026-03-21 18:33:56.892298
Title: Enabling Dynamic Tracking in Vision-Language-Action Models via Time-Discrete and Time-Continuous Velocity Feedforward
Title (reference translation): Enabling Dynamic Tracking in Vision-Language-Action Models via Time-Discrete and Time-Continuous Velocity Feedforward
Authors: Johannes Hechtl, Philipp Schmitt, Georg von Wichert, Wolfram Burgard
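Abstract summary: Vision-language-action (VLA) models show great promise for robot manipulation. Deploying them on rigid industrial robots remains challenging because of the inherent trade-off between compliance and responsiveness. This paper demonstrates the importance of integrating velocity feedforward terms into VLA policies to resolve this trade-off.
Reference score (independently computed attention measure): 11.066720921275648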
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While vision-language-action (VLA) models have shown great promise for robot manipulation, their deployment on rigid industrial robots remains challenging due to the inherent trade-off between compliance and responsiveness. Standard Behavior Cloning (BC) approaches predict discrete poses at low frequencies, omitting the velocity and acceleration feedforward terms typically used by low-level compliant controllers. This forces reliance on high stiffness for accurate tracking, thereby sacrificing safe contact dynamics. In this paper, we demonstrate the importance of integrating velocity feedforward terms into VLA policies to resolve this trade-off. We propose two methods for extracting velocity targets from VLAs: a time-discrete finite-difference approximation that serves as a highly effective bridge for existing models, and a continuous Cubic B-Spline action space that natively yields $C^2$ continuous trajectories for high-frequency control. Crucially, both approaches are strictly model-agnostic and compatible with any standard action-chunking architecture, requiring modifications only to teleoperation, data processing, and the low-level controller. We fine-tune the $π_{0.5}$ model and evaluate both of our approaches on a demanding, contact-rich cube-in-hole task. Our results indicate that incorporating the velocity feedforward term via finite differences significantly improves task execution speed, while the continuous B-Spline approach maintains high overall success rates and provides a foundation for smoother higher-order derivatives without compromising compliance.
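The time-discrete finite-difference idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact difference scheme or controller law, so everything below is an assumption made for illustration: the function names `finite_difference_velocities` and `impedance_command` are hypothetical, forward differences are one possible choice, and the poses are treated as plain Euclidean coordinates (orientation would need separate handling).

```python
from typing import List, Sequence


def finite_difference_velocities(
    poses: Sequence[Sequence[float]], dt: float
) -> List[List[float]]:
    """Approximate one velocity target per pose in a predicted action chunk.

    Assumption: forward differences v_k = (p_{k+1} - p_k) / dt, with the last
    velocity repeated so the output has the same length as the input.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(poses) < 2:
        raise ValueError("need at least two poses")
    velocities = [
        [(b - a) / dt for a, b in zip(p0, p1)]
        for p0, p1 in zip(poses[:-1], poses[1:])
    ]
    velocities.append(list(velocities[-1]))  # hold the last estimate
    return velocities


def impedance_command(
    p_des: float, v_des: float, p: float, v: float, kp: float, kd: float
) -> float:
    """Generic single-axis spring-damper law with a velocity reference.

    Hypothetical controller for illustration only. With v_des = 0 the damping
    term opposes any motion, which is why tracking without feedforward tends
    to need a larger stiffness kp.
    """
    return kp * (p_des - p) + kd * (v_des - v)


if __name__ == "__main__":
    chunk = [[0.0, 0.0], [1.0, 0.5], [3.0, 0.5]]  # made-up 2-D positions
    print(finite_difference_velocities(chunk, dt=0.5))
    # Perfect tracking at 2 units/s: with and without the velocity reference.
    print(impedance_command(1.0, 2.0, 1.0, 2.0, kp=100.0, kd=10.0))
    print(impedance_command(1.0, 0.0, 1.0, 2.0, kp=100.0, kd=10.0))
```

In this toy setup the command is zero when the velocity reference matches the actual motion, while dropping the reference makes the damper push against the motion even though the position error is zero.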
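Abstract (reference translation): Vision-language-action (VLA) models are highly promising for robot manipulation, but deploying them on rigid industrial robots remains difficult because of the inherent trade-off between compliance and responsiveness. Standard Behavior Cloning (BC) approaches predict discrete poses at low frequencies and omit the velocity and acceleration feedforward terms commonly used by low-level compliant controllers. Accurate tracking then has to rely on high stiffness, which sacrifices safe contact dynamics. This paper shows that integrating velocity feedforward terms into VLA policies is important for resolving this trade-off. It proposes two methods for extracting velocity targets from VLAs: a time-discrete finite-difference approximation that serves as a highly effective bridge for existing models, and a continuous cubic B-spline action space that natively produces $C^2$ continuous trajectories for high-frequency control. Importantly, both approaches are strictly model-agnostic and compatible with standard action-chunking architectures, requiring changes only to teleoperation, data processing, and the low-level controller. The authors fine-tune the $π_{0.5}$ model and evaluate both approaches on a demanding, contact-rich cube-in-hole task. The results indicate that introducing the velocity feedforward term via finite differences significantly improves task execution speed, while the continuous B-spline approach maintains a high overall success rate and provides a foundation for smoother higher-order derivatives without compromising compliance.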
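The continuous cubic B-spline action space can be illustrated in the same spirit. The sketch below evaluates a uniform cubic B-spline and its analytic time derivative for a single coordinate, using the standard uniform cubic basis. The abstract gives no details on knot placement, boundary handling, or how control points map to model outputs, so the uniform knot spacing `h` and the function name `eval_uniform_cubic_bspline` are assumptions, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def eval_uniform_cubic_bspline(
    ctrl: Sequence[float], t: float, h: float
) -> Tuple[float, float]:
    """Return (position, velocity) of a uniform cubic B-spline at time t.

    ctrl: control points of one coordinate (at least 4).
    h:    assumed uniform knot spacing in seconds.
    Valid times are 0 <= t <= (len(ctrl) - 3) * h.
    """
    n = len(ctrl)
    if n < 4:
        raise ValueError("need at least four control points")
    if h <= 0:
        raise ValueError("h must be positive")
    s = t / h
    last_segment = n - 4
    if s < 0 or s > last_segment + 1:
        raise ValueError("t is outside the spline's domain")
    i = min(int(math.floor(s)), last_segment)  # segment index
    u = s - i  # local parameter in [0, 1]
    p0, p1, p2, p3 = ctrl[i : i + 4]

    # Standard uniform cubic B-spline basis functions.
    pos = (
        (1 - u) ** 3 * p0
        + (3 * u**3 - 6 * u**2 + 4) * p1
        + (-3 * u**3 + 3 * u**2 + 3 * u + 1) * p2
        + u**3 * p3
    ) / 6.0

    # Analytic derivative with respect to u, then chain rule du/dt = 1/h.
    dpos_du = (
        -((1 - u) ** 2) * p0
        + (3 * u**2 - 4 * u) * p1
        + (-3 * u**2 + 2 * u + 1) * p2
        + u**2 * p3
    ) / 2.0
    return pos, dpos_du / h


if __name__ == "__main__":
    control_points = [0.0, 1.0, 2.0, 3.0, 4.0]  # made-up, evenly spaced
    for t in (0.0, 0.25, 1.0):
        print(t, eval_uniform_cubic_bspline(control_points, t, h=0.5))
```

Because position and velocity come from the same polynomial, a low-level controller could sample both at any rate it needs, rather than differencing poses that arrive at the policy's lower frequency. Evenly spaced control points give a straight-line motion here, which makes the constant velocity easy to check by hand.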
