Fugu-MT Paper Translation (Abstract): AnchorVLA: Anchored Diffusion for Efficient End-to-End Mobile Manipulation

Paper overview: AnchorVLA: Anchored Diffusion for Efficient End-to-End Mobile Manipulation

arxiv url: http://arxiv.org/abs/2604.01567v1
Date: Thu, 02 Apr 2026 03:29:04 GMT
Status: Translation complete
System update date: 2026-04-03 14:21:10.207306
Title: AnchorVLA: Anchored Diffusion for Efficient End-to-End Mobile Manipulation
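Title (reference translation): AnchorVLA: Anchored Diffusion for Efficient End-to-End Mobile Manipulation
Authors: Jia Syuen Lim, Zhizhen Zhang, Peter Bohm, Brendan Tidd, Zi Huang, Yadan Luo
Abstract summary: A central challenge in mobile manipulation is preserving multiple plausible action models while remaining reactive during execution. AnchorVLA is a diffusion-based VLA policy for mobile manipulation, built on the core insight that when sampling begins near a plausible solution manifold, extensive denoising is unnecessary.
Reference score (independently computed attention score): 36.801575461006664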
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: A central challenge in mobile manipulation is preserving multiple plausible action models while remaining reactive during execution. A bottle in a cluttered scene can often be approached and grasped in multiple valid ways. Robust behavior depends on preserving this action diversity while remaining reactive as the scene evolves. Diffusion policies are appealing because they model multimodal action distributions rather than collapsing to one solution. But in practice, full iterative denoising is costly at control time. Action chunking helps amortize inference, yet it also creates partially open-loop behavior, allowing small mismatches to accumulate into drift. We present AnchorVLA, a diffusion-based VLA policy for mobile manipulation built on the core insight that when sampling begins near a plausible solution manifold, extensive denoising is unnecessary to recover multimodal, valid actions. AnchorVLA combines a lightweight VLA adaptation backbone with an anchored diffusion action head, which denoises locally around anchor trajectories using a truncated diffusion schedule. This retains multimodal action generation while reducing inference cost for closed-loop control. Crucially, to mitigate chunking-induced drift, we introduce a test-time self-correction mechanism via a lightweight residual correction module that makes high-frequency, per-step adjustments during rollout. Across diverse mobile manipulation tasks, AnchorVLA improves success and stability under disturbances and distribution shifts while maintaining low-latency inference. The source code is made available at https://github.com/jason-lim26/AnchorVLA.
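Abstract (reference translation): A central challenge in mobile manipulation is preserving multiple plausible action models while remaining reactive during execution. A bottle in a cluttered scene can be approached and grasped in multiple valid ways. Robust behavior depends on maintaining this action diversity while staying reactive as the scene evolves. Diffusion policies are attractive because they model multimodal action distributions instead of collapsing to a single solution. In practice, however, full iterative denoising is costly at control time. Action chunking helps amortize inference, but it also produces partially open-loop behavior, in which small mismatches accumulate into drift. AnchorVLA is a diffusion-based VLA policy for mobile manipulation, built on the core insight that when sampling begins near a plausible solution manifold, extensive denoising is unnecessary to recover multimodal, valid actions. AnchorVLA combines a lightweight VLA adaptation backbone with an anchored diffusion action head that denoises locally around anchor trajectories using a truncated diffusion schedule. This reduces inference cost for closed-loop control while retaining multimodal action generation. Importantly, to mitigate chunking-induced drift, a test-time self-correction mechanism is introduced through a lightweight residual correction module that makes high-frequency, per-step adjustments during rollout. Across diverse mobile manipulation tasks, AnchorVLA improves success and stability under disturbances and distribution shifts while maintaining low-latency inference. The source code is available at https://github.com/jason-lim26/AnchorVLA.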
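
The core idea in the abstract (start sampling near an anchor trajectory, then run only a few denoising steps) can be illustrated with a toy sketch. This is not the AnchorVLA implementation: the two hard-coded modes, the `toy_denoise_step` stand-in for a learned denoiser, the step size, and all other names and values are assumptions chosen only for illustration.

```python
import random

# Two hypothetical "modes": valid 1-D action chunks (lateral offsets over four
# waypoints) for approaching an object from the left or from the right.
MODE_LEFT = [0.0, -0.5, -1.0, -1.5]
MODE_RIGHT = [0.0, 0.5, 1.0, 1.5]
MODES = [MODE_LEFT, MODE_RIGHT]


def sq_dist(a, b):
    """Squared Euclidean distance between two action chunks."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def toy_denoise_step(x, step_size=0.5):
    """Stand-in for one call to a learned denoiser (an assumption, not the
    paper's network): move the sample part of the way to the nearest mode."""
    nearest = min(MODES, key=lambda m: sq_dist(x, m))
    return [xi + step_size * (mi - xi) for xi, mi in zip(x, nearest)]


def init_near_anchor(anchor, noise_scale, rng):
    """Start from an anchor trajectory plus small Gaussian noise, instead of
    starting from pure noise."""
    return [a + noise_scale * rng.gauss(0.0, 1.0) for a in anchor]


def truncated_denoise(x, num_steps):
    """Run only a few denoising steps (a truncated schedule)."""
    for _ in range(num_steps):
        x = toy_denoise_step(x)
    return x


rng = random.Random(0)
# Hypothetical anchors that already sit close to one mode each.
anchor_left = [m + 0.05 for m in MODE_LEFT]
anchor_right = [m - 0.05 for m in MODE_RIGHT]

x0 = init_near_anchor(anchor_left, noise_scale=0.1, rng=rng)
sample_left = truncated_denoise(x0, num_steps=2)
sample_right = truncated_denoise(
    init_near_anchor(anchor_right, noise_scale=0.1, rng=rng), num_steps=2
)
```

In this toy setting, two steps are enough because the initial sample is already close to a valid mode, and different anchors yield samples in different modes, so the multimodality is kept. A real policy would replace `toy_denoise_step` with an observation-conditioned network and a proper noise schedule.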

List of related papers