Fugu-MT Paper Translation (Summary): Action Draft and Verify: A Self-Verifying Framework for Vision-Language-Action Model

Paper summary: Action Draft and Verify: A Self-Verifying Framework for Vision-Language-Action Model

arxiv url: http://arxiv.org/abs/2603.18091v1
Date: Wed, 18 Mar 2026 09:16:20 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:05.75885
Title: Action Draft and Verify: A Self-Verifying Framework for Vision-Language-Action Model
Authors: Chen Zhao, Zhuoran Wang, Haoyang Li, Shifeng Bao, Guanlin Li, Youhe Feng, Yang Li, Jie Tang, Jing Zhang
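Abstract summary: Vision-Language-Action (VLA) models have recently demonstrated strong performance across embodied tasks. This paper proposes Action-Draft-and-Verify (ADV), in which a diffusion action expert drafts multiple candidate action chunks and the VLM selects one by scoring all candidates in a single forward pass with a perplexity-style metric. Under matched backbones, training data, and action-chunk length, ADV improves success rate over a diffusion-based baseline by +4.3 points in simulation and +19.7 points in the real world.
Reference score (independently computed attention score): 31.013109374489442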
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language-Action (VLA) models have recently demonstrated strong performance across embodied tasks. Modern VLAs commonly employ diffusion action experts to efficiently generate high-precision continuous action chunks, while auto-regressive generation can be slower and less accurate at low-level control. Yet auto-regressive paradigms still provide complementary priors that can improve robustness and generalization in out-of-distribution environments. To leverage both paradigms, we propose Action-Draft-and-Verify (ADV): a diffusion action expert drafts multiple candidate action chunks, and the VLM selects one by scoring all candidates in a single forward pass with a perplexity-style metric. Under matched backbones, training data, and action-chunk length, ADV improves success rate by +4.3 points in simulation and +19.7 points in real-world settings over a diffusion-based baseline, with only a single-pass VLM reranking overhead.
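
The draft-and-verify selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says the metric is "perplexity-style", so the exact formula below (the exponential of the negative mean token log-probability), the function names `perplexity` and `select_candidate`, and the toy inputs are all assumptions made for illustration. The per-token log-probabilities stand in for what a VLM would return when scoring all candidates in one batched forward pass.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: an action chunk is a list of per-step action vectors.
ActionChunk = List[List[float]]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Assumed metric: exp of the negative mean log-probability of the tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_candidate(
    candidates: Sequence[ActionChunk],
    all_logprobs: Sequence[Sequence[float]],
) -> Tuple[int, ActionChunk]:
    """Return the index and chunk of the candidate with the lowest perplexity.

    all_logprobs[k] holds the log-probabilities the verifier assigned to the
    tokens of candidate k (assumed to come from one batched forward pass).
    """
    scores = [perplexity(lp) for lp in all_logprobs]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, candidates[best]


# Toy data: three drafted chunks, each with two steps of a 2-D action.
candidates = [
    [[0.1, 0.0], [0.2, 0.0]],
    [[0.0, 0.1], [0.0, 0.2]],
    [[0.3, 0.3], [0.1, 0.1]],
]
# Made-up verifier log-probabilities (four tokens per candidate).
all_logprobs = [
    [math.log(0.5)] * 4,
    [math.log(0.8)] * 4,
    [math.log(0.25)] * 4,
]

best_index, best_chunk = select_candidate(candidates, all_logprobs)
print(best_index, best_chunk)
```

In this toy case the second candidate is selected because the verifier assigns its tokens the highest probability, which gives it the lowest perplexity. How the paper converts continuous action chunks into tokens for scoring is not stated in the abstract and is left out here.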


Related papers