Fugu-MT Paper Translation (Summary): Uncertainty Quantification for Flow-Based Vision-Language-Action Models

Paper summary: Uncertainty Quantification for Flow-Based Vision-Language-Action Models

arxiv url: http://arxiv.org/abs/2606.18043v1
Date: Tue, 16 Jun 2026 15:19:09 GMT
Status: Translation complete
In-system update date: 2026-06-17 17:15:32.508679
Title: Uncertainty Quantification for Flow-Based Vision-Language-Action Models
Title (reference translation): Quantifying Uncertainty in Flow-Based Vision-Language-Action Models
Authors: Ralf Römer, Maximilian Seeliger, Saida Liu, Ben Sturgis, Marco Bagatella, Daniel Marta, Andreas Krause, Angela P. Schoellig
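Abstract summary: Vision-language-action models (VLAs) combine a vision-language backbone with an expressive generative action head trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictions and to detect when their actions may be unreliable. This paper proposes SAVE, a framework for uncertainty-guided active fine-tuning.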
Reference score (independently computed attention score): 33.28454469934064
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language-action models (VLAs) combine vision-language backbones with expressive generative action heads trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictions and to detect when their actions may be unreliable. This presents a critical limitation for real-world deployment in non-stationary environments, where models inevitably encounter scenarios outside their pretraining distribution and may fail without warning. To address this, we derive an efficient method for quantifying epistemic uncertainty in flow-matching models by leveraging velocity-field disagreement (VFD) across a small ensemble. We successfully use this uncertainty estimate for failure detection during deployment and active fine-tuning of flow-based VLAs. To this end, we propose SAVE, a framework for uncertainty-guided active multitask fine-tuning that reduces the number of costly expert demonstrations required to adapt VLAs to new tasks. Through extensive experiments on the LIBERO benchmark, we demonstrate that VFD yields better-calibrated uncertainty estimates predictive of downstream performance, that VFD achieves strong performance in detecting failures, and that uncertainty-guided data acquisition with SAVE requires at least 22% fewer samples than baselines. In summary, our work shows that quantifying epistemic uncertainty in flow-based VLAs improves both failure awareness and adaptation. Project website: tum-lsy.github.io/uq_vla/.
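Abstract (reference translation): Vision-language-action models (VLAs) combine a vision-language backbone with an expressive generative action head trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictions and to detect when their actions may be unreliable. This is a critical limitation for real-world deployment in non-stationary environments, where models may encounter scenarios outside their pretraining distribution and fail without warning. This work therefore proposes a method for quantifying epistemic uncertainty in flow-matching models using velocity-field disagreement (VFD) across a small ensemble. We successfully use this uncertainty estimate for failure detection during deployment and for active fine-tuning of flow-based VLAs. To this end, we propose SAVE, a framework for uncertainty-guided active multitask fine-tuning. Extensive experiments on the LIBERO benchmark show that VFD yields better-calibrated uncertainty estimates that are predictive of downstream performance, that VFD achieves strong failure-detection performance, and that uncertainty-guided data acquisition with SAVE requires at least 22% fewer samples than baselines. In summary, quantifying epistemic uncertainty in flow-based VLAs improves both failure awareness and adaptation. Project website: tum-lsy.github.io/uq_vla/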
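The abstract does not spell out how VFD is computed or how SAVE chooses where to collect demonstrations, so the following is only a minimal sketch under explicit assumptions, not the authors' implementation. It assumes each ensemble member is a callable velocity field v(a, t, obs), and it measures disagreement as the variance of the members' predicted velocities, averaged over action dimensions and over the steps of an Euler integration that follows the ensemble-mean velocity from Gaussian noise (t = 0) to an action (t = 1). The names vfd_score, flag_failure, and select_tasks, the fixed-threshold failure rule, and the greedy top-k task selection are all hypothetical illustrations.

```python
import numpy as np
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: velocity(a_t, t, obs) -> predicted da/dt for one ensemble member.
VelocityField = Callable[[np.ndarray, float, np.ndarray], np.ndarray]


def vfd_score(
    fields: Sequence[VelocityField],
    obs: np.ndarray,
    action_dim: int,
    num_steps: int = 10,
    seed: int = 0,
) -> float:
    """Assumed VFD: member-velocity variance averaged along one flow path."""
    if len(fields) < 2:
        raise ValueError("need at least two ensemble members to measure disagreement")
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(action_dim)  # start from the Gaussian base distribution at t = 0
    dt = 1.0 / num_steps
    per_step = []
    for k in range(num_steps):
        t = k * dt
        v = np.stack([f(a, t, obs) for f in fields])  # shape (members, action_dim)
        per_step.append(v.var(axis=0).mean())         # spread across members, averaged over dims
        a = a + dt * v.mean(axis=0)                   # Euler step along the ensemble-mean velocity
    return float(np.mean(per_step))


def flag_failure(score: float, threshold: float) -> bool:
    # Hypothetical failure-detection rule: treat high disagreement as "possibly unreliable".
    return score > threshold


def select_tasks(task_scores: Dict[str, float], budget: int) -> List[str]:
    # Hypothetical acquisition rule: request demonstrations for the most uncertain tasks first.
    return sorted(task_scores, key=task_scores.get, reverse=True)[:budget]


if __name__ == "__main__":
    obs = np.zeros(2)  # toy observation; the toy fields below ignore it
    # Two toy members whose velocities differ by a constant offset of 1 in every dimension.
    shifted = [lambda a, t, obs, c=c: -a + c for c in (0.0, 1.0)]
    score = vfd_score(shifted, obs, action_dim=4)
    print(score, flag_failure(score, threshold=0.1))  # score ≈ 0.25
    print(select_tasks({"open_drawer": 0.1, "stack_bowls": 0.5, "pick_mug": 0.3}, budget=2))
```

Identical ensemble members give a score of zero, while members whose velocities differ by a constant offset of 1 give a per-step variance of 0.25. Averaging over the whole integration path, rather than reading disagreement only at t = 0, is itself an assumption made here for illustration.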

Related papers