Fugu-MT Paper Translation (Summary): BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models

Paper summary: BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models

arxiv url: http://arxiv.org/abs/2512.11769v1
Date: Fri, 12 Dec 2025 18:30:45 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:40.30966
Title: BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
Authors: Xiaoyu Ma, Zhengqing Yuan, Zheyuan Zhang, Kaiwen Shi, Lichao Sun, Yanfang Ye,
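Abstract summary: Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos. We propose BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints.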
Reference score (site-computed attention score): 34.57464032562792
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints. Instantiated on the pi-zero VLA controller, BLURR keeps the original observation interfaces and accelerates control by combining an instruction-prefix key-value cache, mixed-precision execution, and a single-step rollout schedule that reduces per-step computation. In our SimplerEnv-based evaluation, BLURR maintains task success rates comparable to the original controller while significantly lowering effective FLOPs and wall-clock latency. We also build an interactive web demo that allows users to switch between controllers and toggle inference options in real time while watching manipulation episodes. Together, these results highlight BLURR as a practical approach for deploying modern VLA policies under tight compute budgets.
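The instruction-prefix key-value cache rests on a simple observation: within one episode the language instruction does not change, so the keys and values computed for its tokens can be reused at every control step instead of being recomputed. The sketch below illustrates that idea on a toy single-head, single-layer attention; `ToyAttention`, `PrefixCachedController` and `step_without_cache` are hypothetical names, not BLURR's code. It assumes the instruction tokens come first and that attention is causal, so prefix keys and values never depend on later observation tokens. pi-zero's real token layout and attention mask may differ, and a real multi-layer model would have to cache keys and values at every layer.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, x: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def softmax(xs: Vec) -> Vec:
    mx = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - mx) for x in xs]
    total = sum(es)
    return [e / total for e in es]


class ToyAttention:
    """Hypothetical single-head causal self-attention over token vectors."""

    def __init__(self, wq: Mat, wk: Mat, wv: Mat):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.kv_projections = 0  # number of tokens whose K/V had to be computed

    def project_kv(self, tokens: List[Vec]) -> Tuple[List[Vec], List[Vec]]:
        self.kv_projections += len(tokens)
        return [matvec(self.wk, t) for t in tokens], [matvec(self.wv, t) for t in tokens]

    def attend(self, queries: List[Vec], keys: List[Vec], values: List[Vec], offset: int) -> List[Vec]:
        # queries[i] sits at position offset + i and sees positions 0..offset + i (causal mask)
        scale = 1.0 / math.sqrt(len(keys[0]))
        outputs = []
        for i, tok in enumerate(queries):
            q = matvec(self.wq, tok)
            visible = offset + i + 1
            weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys[:visible]])
            outputs.append([sum(w * v[c] for w, v in zip(weights, values[:visible]))
                            for c in range(len(values[0]))])
        return outputs


def step_without_cache(attn: ToyAttention, instruction: List[Vec], obs: List[Vec]) -> List[Vec]:
    # Baseline: re-project the whole instruction prefix at every control step.
    keys, values = attn.project_kv(instruction + obs)
    return attn.attend(obs, keys, values, offset=len(instruction))


class PrefixCachedController:
    """Projects the instruction prefix once per episode and reuses its K/V."""

    def __init__(self, attn: ToyAttention, instruction: List[Vec]):
        self.attn = attn
        self.offset = len(instruction)
        self.prefix_k, self.prefix_v = attn.project_kv(instruction)

    def step(self, obs: List[Vec]) -> List[Vec]:
        k, v = self.attn.project_kv(obs)  # only the new observation tokens
        return self.attn.attend(obs, self.prefix_k + k, self.prefix_v + v, offset=self.offset)


# Usage: a 3-token instruction and one observation token per control step.
W = ([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 1.0]])
instruction = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
episode = [[[0.1 * s, 1.0]] for s in range(4)]

baseline_attn, cached_attn = ToyAttention(*W), ToyAttention(*W)
controller = PrefixCachedController(cached_attn, instruction)
baseline_out = [step_without_cache(baseline_attn, instruction, obs) for obs in episode]
cached_out = [controller.step(obs) for obs in episode]
print(baseline_attn.kv_projections, cached_attn.kv_projections)  # prints: 16 7
```

Over four steps the baseline projects 16 tokens while the cached controller projects 7, and both produce the same attention outputs. The saving grows with instruction length and episode length.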
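The abstract names mixed-precision execution without stating which numeric formats BLURR uses. As a general illustration of the idea, not BLURR's configuration, the sketch below emulates IEEE half precision (fp16) with the `'e'` format of Python's `struct` module. The hypothetical helper `dot_mixed` stores inputs in fp16 but accumulates in full precision, while `dot_fp16_accum` also keeps the running sum in fp16.

```python
import struct
from typing import List


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_mixed(a: List[float], b: List[float]) -> float:
    """fp16 storage for inputs, full-precision (Python float) accumulation."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_fp16(x) * to_fp16(y)
    return acc


def dot_fp16_accum(a: List[float], b: List[float]) -> float:
    """Everything in fp16, including the running sum."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)  # each partial sum is rounded back to fp16
    return acc


a = [0.01] * 4096
b = [1.0] * 4096
mixed = dot_mixed(a, b)
fp16_sum = dot_fp16_accum(a, b)
print(mixed, fp16_sum)  # prints: 40.96875 32.0
```

The mixed version returns about 40.97 (the small offset from 40.96 comes from storing 0.01 in fp16), while the all-fp16 sum stalls at 32.0: above 32, adjacent fp16 values are 1/32 apart, so adding roughly 0.01 always rounds back to the same value. This is why mixed-precision schemes commonly keep low-precision storage but accumulate in a wider format.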

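The abstract does not define the single-step rollout schedule. One plausible reading, offered here only as an assumption, is that it reduces the number of iterative integration steps the action head runs per control step. pi-zero generates actions with flow matching, integrating a learned velocity field over several steps. The sketch below uses a toy velocity field to show how the number of action-head evaluations scales with `num_steps`; `euler_rollout` and `CountingVelocity` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Vec = List[float]


def euler_rollout(velocity: Callable[[Vec, float], Vec], x0: Vec, num_steps: int) -> Vec:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with num_steps Euler steps."""
    dt = 1.0 / num_steps
    x = list(x0)
    for i in range(num_steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


class CountingVelocity:
    """Hypothetical stand-in for an action head; counts its forward passes."""

    def __init__(self, target: Vec):
        self.target = target
        self.calls = 0

    def __call__(self, x: Vec, t: float) -> Vec:
        self.calls += 1
        # Velocity that moves x along a straight line so it reaches `target` at t = 1.
        return [(g - xi) / (1.0 - t) for g, xi in zip(self.target, x)]


target = [0.2, -0.4, 0.7]  # toy "action"
noise = [1.0, 0.5, -0.3]   # toy starting sample
results: Dict[int, Tuple[int, Vec]] = {}
for n in (10, 1):
    head = CountingVelocity(target)
    action = euler_rollout(head, noise, n)
    results[n] = (head.calls, action)
    print(n, head.calls, [round(a, 6) for a in action])
```

Because the toy field follows a perfectly straight path, one Euler step reaches the target as accurately as ten while making one forward pass instead of ten. Learned velocity fields are generally not straight, so cutting steps usually trades some accuracy for speed; the abstract reports success rates comparable to the original controller, and this sketch does not show how BLURR achieves that.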
