Fugu-MT paper translation (abstract): Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs

Paper overview: Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs

arxiv url: http://arxiv.org/abs/2606.10180v1
Date: Mon, 08 Jun 2026 21:16:37 GMT
Status: Translation complete
Last updated in system: 2026-06-10 15:40:58.195095
Title: Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs
Title (reference translation): Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs
Authors: Jonathan C. Kao, Jason Chan, Andy Wang
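Abstract summary: We introduce flow control of vision-language-action (VLA) models, a simple and effective way to steer VLA actions in real time through generic inputs such as a keyboard. It enables relatively crude user inputs to steer a VLA to align with user intent.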
Reference score (independently computed attention score): 4.38616347977332
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We introduce flow control of vision-language-action (VLA) models, a simple and effective way to steer VLA actions in real-time through generic inputs, such as a keyboard. This method can be used out-of-the-box and does not require retraining or fine-tuning VLAs. It enables relatively crude user inputs to steer a VLA to align with user intent. The VLA transforms these inputs into action samples drawn from the VLA expert action distribution learned during training, so that the generated actions are high quality (conformity to the action expert distribution) and high fidelity (reflecting the user's intent). We demonstrate that flow control has many desirable properties: (1) flow control accurately and responsively steers robot actions with user inputs, (2) it is robust to suboptimal user inputs, (3) it enables users to steer VLAs to achieve significantly higher success rates and faster task completion, and (4) fine-tuning a VLA on flow control trajectories improves the autonomous policy. Together, these results provide a simple and intuitive way for users to help steer VLA actions, increasing task performance.
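Abstract (reference translation): We introduce flow control of vision-language-action (VLA) models, a simple and effective way to steer VLA actions in real time through generic inputs such as a keyboard. The method can be used out of the box and does not require retraining or fine-tuning the VLA. It enables relatively crude user inputs to steer a VLA to align with user intent. The VLA transforms these inputs into action samples drawn from the VLA expert action distribution learned during training, so that the generated actions are high quality (conforming to the action expert distribution) and high fidelity (reflecting the user's intent). Flow control is shown to have many desirable properties: (1) it accurately and responsively steers robot actions with user inputs, (2) it is robust to suboptimal user inputs, (3) it enables users to steer VLAs to achieve significantly higher success rates and faster task completion, and (4) fine-tuning a VLA on flow control trajectories improves the autonomous policy. Together, these results provide a simple and intuitive way for users to help steer VLA actions and improve task performance.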


Related papers