Fugu-MT Paper Translation (Summary): Leave No Observation Behind: Real-time Correction for VLA Action Chunks

Paper summary: Leave No Observation Behind: Real-time Correction for VLA Action Chunks

arxiv url: http://arxiv.org/abs/2509.23224v1
Date: Sat, 27 Sep 2025 10:07:49 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.109309
Title: Leave No Observation Behind: Real-time Correction for VLA Action Chunks
Title (reference translation): Real-time Correction for VLA Action Chunks
Authors: Kohei Sendai, Maxime Alvarez, Tatsuya Matsushima, Yutaka Matsuo, Yusuke Iwasawa
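Abstract summary: Asynchronous Action Chunk Correction (A2C2) is a lightweight real-time chunk correction head that runs at every control step. The results indicate that A2C2 is an effective plug-in mechanism for deploying high-capacity chunking policies in real-time control.
Reference score (attention score computed independently by this site): 36.13271200613596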
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To improve efficiency and temporal coherence, Vision-Language-Action (VLA) models often predict action chunks; however, this action chunking harms reactivity under inference delay and long horizons. We introduce Asynchronous Action Chunk Correction (A2C2), which is a lightweight real-time chunk correction head that runs every control step and adds a time-aware correction to any off-the-shelf VLA's action chunk. The module combines the latest observation, the predicted action from VLA (base action), a positional feature that encodes the index of the base action within the chunk, and some features from the base policy, then outputs a per-step correction. This preserves the base model's competence while restoring closed-loop responsiveness. The approach requires no retraining of the base policy and is orthogonal to asynchronous execution schemes such as Real Time Chunking (RTC). On the dynamic Kinetix task suite (12 tasks) and LIBERO Spatial, our method yields consistent success rate improvements across increasing delays and execution horizons (+23% point and +7% point respectively, compared to RTC), and also improves robustness for long horizons even with zero injected delay. Since the correction head is small and fast, there is minimal overhead compared to the inference of large VLA models. These results indicate that A2C2 is an effective, plug-in mechanism for deploying high-capacity chunking policies in real-time control.
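Abstract (reference translation): To improve efficiency and temporal coherence, Vision-Language-Action (VLA) models often predict action chunks, but this action chunking harms reactivity under inference delay and long horizons. We introduce Asynchronous Action Chunk Correction (A2C2), a lightweight real-time chunk correction head that runs at every control step and adds a time-aware correction to the action chunk of any off-the-shelf VLA. The module combines the latest observation, the action predicted by the VLA (the base action), a positional feature encoding the index of the base action within the chunk, and some features from the base policy, and outputs a per-step correction. This preserves the base model's competence while restoring closed-loop responsiveness. The approach requires no retraining of the base policy and is orthogonal to asynchronous execution schemes such as Real Time Chunking (RTC). On the dynamic Kinetix task suite (12 tasks) and LIBERO Spatial, the method yields consistent success-rate improvements across increasing delays and execution horizons (+23 and +7 percentage points respectively, compared to RTC), and also improves robustness for long horizons even with zero injected delay. Because the correction head is small and fast, its overhead is minimal compared with the inference of large VLA models. These results indicate that A2C2 is an effective plug-in mechanism for deploying high-capacity chunking policies in real-time control.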
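The per-step correction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class `CorrectionHead`, the helpers `positional_feature` and `run_chunk`, the sinusoidal index encoding, the two-layer MLP, the additive form of the correction, and the zero-initialised output layer are all assumptions made for illustration. The abstract only specifies the four inputs (latest observation, base action, index of the base action within the chunk, base-policy features) and that the head outputs a correction at every control step.

```python
import numpy as np


def positional_feature(index: int, horizon: int, dim: int = 8) -> np.ndarray:
    """Sinusoidal encoding of a base action's index within its chunk (assumed form)."""
    phase = index / max(horizon - 1, 1)      # normalise the index to [0, 1]
    freqs = 2.0 ** np.arange(dim // 2)       # geometric ladder of frequencies
    angles = np.pi * phase * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


class CorrectionHead:
    """Hypothetical two-layer MLP that outputs a per-step residual correction."""

    def __init__(self, obs_dim, act_dim, feat_dim, pos_dim=8, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + act_dim + pos_dim + feat_dim
        self.pos_dim = pos_dim
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        # Assumption: a zero-initialised output layer, so an untrained head
        # returns the base action unchanged.
        self.w2 = np.zeros((hidden, act_dim))
        self.b2 = np.zeros(act_dim)

    def __call__(self, obs, base_action, index, horizon, base_features):
        pos = positional_feature(index, horizon, self.pos_dim)
        x = np.concatenate([obs, base_action, pos, base_features])
        h = np.tanh(x @ self.w1 + self.b1)
        delta = h @ self.w2 + self.b2        # per-step correction
        return base_action + delta           # corrected action (additive form assumed)


def run_chunk(head, chunk, base_features, get_observation):
    """Execute one action chunk, correcting each step with the newest observation."""
    horizon = len(chunk)
    executed = []
    for k, base_action in enumerate(chunk):
        obs = get_observation(k)             # observation at step k, not the stale one
        executed.append(head(obs, base_action, k, horizon, base_features))
    return np.stack(executed)
```

In this sketch the base policy is queried once per chunk, while `run_chunk` calls the small head at every step, which mirrors the split between slow chunk prediction and fast correction that the abstract describes. Training the head is out of scope here.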

Related papers

Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy [52.106797722292896]
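We propose DCDP, a dynamic closed-loop diffusion policy framework that integrates chunk-based action generation with real-time correction. In dynamic PushT simulations, DCDP requires only 5% computation and improves adaptability by 19% without retraining.
Paper reference translation (metadata) (2026-03-02T15:04:18Z)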
Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation [95.89924101984566]
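We introduce OptimusVLA, a dual-memory VLA framework with Global Prior Memory (GPM) and Local Consistency Memory (LCM). GPM replaces Gaussian noise with task-level priors retrieved from semantically similar trajectories. LCM injects a learned consistency constraint that enforces temporal coherence and trajectory smoothness.
Paper reference translation (metadata) (2026-02-22T15:39:34Z)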
VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots [5.308743386891208]
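Vision-Language-Action (VLA) models have achieved breakthrough progress in robotics. The strategy for fusing a queue of successive action chunks strongly affects the overall performance of a VLA model. Existing methods suffer from jitter, stalls, or even pauses when executing robot actions. This paper introduces VLA-RAIL, a novel framework designed to run model inference and robot motion control asynchronously.
Paper reference translation (metadata) (2025-12-31T06:59:42Z)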
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
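We propose TACO, a test-time scaling framework that applies a lightweight pseudo-count estimator for high-fidelity verification of action chunks. Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL) and, because it is gradient-free, offers significant computational benefits.
Paper reference translation (metadata) (2025-12-02T14:42:54Z)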
CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation [67.1520483301709]
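CronusVLA is a unified framework that extends single-frame VLA models to a multi-frame paradigm through an efficient post-training stage. CronusVLA achieves state-of-the-art performance on SimplerEnv with a 70.9% success rate and a 12.7% improvement over OpenVLA on LIBERO.
Paper reference translation (metadata) (2025-06-24T17:30:27Z)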
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration [69.54069477520534]
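Vision-Language-Action (VLA) models have attracted attention for their strong control capabilities. Their high computational cost and low execution frequency make them unsuitable for real-time tasks such as robotic manipulation and autonomous navigation. This paper proposes SP-VLA, a unified framework that accelerates VLA models by jointly scheduling models and pruning tokens.
Paper reference translation (metadata) (2025-06-15T05:04:17Z)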
Real-Time Execution of Action Chunking Flow Policies [49.1574468325115]
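This paper proposes a novel inference-time algorithm that enables asynchronous execution of action chunking policies. It can be applied out of the box, with no retraining, to diffusion- or VLA-based systems. The results show that RTC is fast, performant, and uniquely robust to inference delay.
Paper reference translation (metadata) (2025-06-09T01:01:59Z)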
Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding [24.1236728596359]
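Vision-Language-Action (VLA) models show potential for generalizable robotic manipulation. This paper proposes PD-VLA, the first parallel decoding framework for VLA models integrated with action chunking. The framework reformulates autoregressive decoding as a nonlinear system solved by parallel fixed-point iterations.
Paper reference translation (metadata) (2025-03-04T06:12:08Z)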
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling [51.38330727868982]
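We show how action chunking affects the divergence between a learner and a demonstrator. We propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. The method improves the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
Paper reference translation (metadata) (2024-08-30T15:39:34Z)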
From Imitation to Refinement -- Residual RL for Precise Assembly [19.9786629249219]
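Recent advances in behavior cloning (BC) have made it easy to teach robots new tasks. However, this ease of teaching comes at the cost of unreliable performance. We devise ResiP, a simple and effective method that overcomes the reliability problem while retaining BC's ease of teaching and long-horizon capabilities.
Paper reference translation (metadata) (2024-07-23T17:44:54Z)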

The related papers list is generated automatically from the titles and abstracts of papers on this site.

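This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use.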