Fugu-MT Paper Translation (Abstract): DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

Paper overview: DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

arxiv url: http://arxiv.org/abs/2603.26320v1
Date: Fri, 27 Mar 2026 11:38:43 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.472033
Title: DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching
Title (reference translation): DFM-VLA: Iterative Action Correction for Robot Manipulation via Discrete Flow Matching
Authors: Jiayi Chen, Wenxuan Song, Shuai Chen, Jingbo Wang, Zhijun Li, Haoang Li
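Abstract summary: This paper proposes DFM-VLA, a discrete flow matching VLA for iterative refinement of action tokens. DFM-VLA consistently outperforms strong autoregressive, discrete diffusion, and continuous diffusion baselines in manipulation performance.
Reference score (independently computed attention measure): 20.252867273996085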
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language-Action (VLA) models that encode actions using a discrete tokenization scheme are increasingly adopted for robotic manipulation, but existing decoding paradigms remain fundamentally limited. Whether actions are decoded sequentially by autoregressive VLAs or in parallel by discrete diffusion VLAs, once a token is generated, it is typically fixed and cannot be revised in subsequent iterations, so early token errors cannot be effectively corrected later. We propose DFM-VLA, a discrete flow matching VLA for iterative refinement of action tokens. DFM-VLA models a token-level probability velocity field that dynamically updates the full action sequence across refinement iterations. We investigate two ways to construct the velocity field: an auxiliary velocity-head formulation and an action-embedding-guided formulation. Our framework further adopts a two-stage decoding strategy with an iterative refinement stage followed by deterministic validation for stable convergence. Extensive experiments on CALVIN, LIBERO, and real-world manipulation tasks show that DFM-VLA consistently outperforms strong autoregressive, discrete diffusion, and continuous diffusion baselines in manipulation performance while retaining high inference efficiency. In particular, DFM-VLA achieves an average success length of 4.44 on CALVIN and an average success rate of 95.7% on LIBERO, highlighting the value of action refinement via discrete flow matching for robotic manipulation. Our project is available at https://chris1220313648.github.io/DFM-VLA/
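Abstract (reference translation): Vision-Language-Action (VLA) models that encode actions with a discrete tokenization scheme are increasingly adopted for robot manipulation, but existing decoding paradigms are fundamentally limited. Whether actions are decoded sequentially by autoregressive VLAs or in parallel by discrete diffusion VLAs, once a token is generated it is typically fixed and cannot be revised in later iterations, so early token errors cannot be effectively corrected afterward. This paper proposes DFM-VLA, a discrete flow matching VLA for iterative refinement of action tokens. DFM-VLA models a token-level probability velocity field. Two ways of constructing the velocity field are investigated. The framework further adopts a two-stage decoding strategy, with an iterative refinement stage followed by deterministic validation for stable convergence. Extensive experiments on CALVIN, LIBERO, and real-world manipulation tasks show that DFM-VLA consistently outperforms strong autoregressive, discrete diffusion, and continuous diffusion baselines in manipulation performance while retaining high inference efficiency. In particular, DFM-VLA achieves an average success length of 4.44 on CALVIN and an average success rate of 95.7% on LIBERO, highlighting the value of action refinement via discrete flow matching for robot manipulation. The project is available at https://chris1220313648.github.io/DFM-VLA/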
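
Illustrative note: the abstract describes a token-level probability velocity field that can revise every action token at each refinement iteration, followed by a deterministic stage. The toy sketch below shows one common way such a sampler can look. It is not the authors' implementation. The function names (`toy_posterior`, `refine`), the linear schedule, the velocity form u = (p1 - onehot(x_t)) / (1 - t), and the reading of the deterministic stage as a greedy argmax pass are all assumptions made for illustration, and the "model" is a hard-coded stand-in.

```python
import random


def toy_posterior(tokens, position, vocab_size):
    """Hypothetical stand-in for a model's prediction p(x_1 | x_t) at one position.

    It ignores `tokens` and puts 0.9 of the mass on a fixed target token
    (position % vocab_size), purely so the example is self-contained.
    """
    target = position % vocab_size
    probs = [0.1 / (vocab_size - 1)] * vocab_size
    probs[target] = 0.9
    return probs


def refine(seq_len=8, vocab_size=16, steps=10, seed=0):
    """Two-stage decoding sketch: stochastic refinement, then a greedy pass."""
    rng = random.Random(seed)
    # Start from uniformly random tokens (assumed source distribution at t = 0).
    tokens = [rng.randrange(vocab_size) for _ in range(seq_len)]
    h = 1.0 / steps

    # Stage 1: iterative refinement. Every position may change at every step,
    # so a token sampled early is not frozen.
    for k in range(steps - 1):
        t = k * h
        jump = h / (1.0 - t)  # share of probability mass moved toward p1
        new_tokens = []
        for i in range(seq_len):
            p1 = toy_posterior(tokens, i, vocab_size)
            # Euler step on the assumed velocity u = (p1 - onehot(x_t)) / (1 - t):
            # next-token distribution = onehot(x_t) + h * u.
            probs = [jump * p for p in p1]
            probs[tokens[i]] += 1.0 - jump
            new_tokens.append(rng.choices(range(vocab_size), weights=probs)[0])
        tokens = new_tokens

    # Stage 2 (assumed): deterministic pass that takes the argmax prediction.
    final = []
    for i in range(seq_len):
        p1 = toy_posterior(tokens, i, vocab_size)
        final.append(max(range(vocab_size), key=lambda v: p1[v]))
    return final


if __name__ == "__main__":
    print(refine())
```

Because the stand-in predictor always favors the same target per position, the final greedy pass returns those targets regardless of the random seed; with a real model the refinement stage would determine what the final pass sees.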


Related papers