Leave No Observation Behind: Real-time Correction for VLA Action Chunks
- URL: http://arxiv.org/abs/2509.23224v1
- Date: Sat, 27 Sep 2025 10:07:49 GMT
- Title: Leave No Observation Behind: Real-time Correction for VLA Action Chunks
- Authors: Kohei Sendai, Maxime Alvarez, Tatsuya Matsushima, Yutaka Matsuo, Yusuke Iwasawa
- Abstract summary: Asynchronous Action Chunk Correction (A2C2) is a lightweight real-time chunk correction head that runs every control step. We show that A2C2 is an effective plug-in mechanism for deploying high-capacity chunking policies in real-time control.
- Score: 36.13271200613596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve efficiency and temporal coherence, Vision-Language-Action (VLA) models often predict action chunks; however, action chunking harms reactivity under inference delay and long horizons. We introduce Asynchronous Action Chunk Correction (A2C2), a lightweight real-time chunk correction head that runs at every control step and adds a time-aware correction to any off-the-shelf VLA's action chunk. The module combines the latest observation, the predicted action from the VLA (the base action), a positional feature that encodes the index of the base action within the chunk, and features from the base policy, then outputs a per-step correction. This preserves the base model's competence while restoring closed-loop responsiveness. The approach requires no retraining of the base policy and is orthogonal to asynchronous execution schemes such as Real Time Chunking (RTC). On the dynamic Kinetix task suite (12 tasks) and LIBERO Spatial, our method yields consistent success-rate improvements across increasing delays and execution horizons (+23 and +7 percentage points, respectively, compared to RTC), and also improves robustness over long horizons even with zero injected delay. Since the correction head is small and fast, it adds minimal overhead compared to the inference cost of large VLA models. These results indicate that A2C2 is an effective plug-in mechanism for deploying high-capacity chunking policies in real-time control.
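The abstract describes the correction head's inputs: the latest observation, the base action, a positional feature for the action's index within the chunk, and features from the base policy. A minimal numpy sketch of that interface is below; the dimensions, the sinusoidal index encoding, and the zero-initialized output layer (so the head initially passes the base action through unchanged) are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class CorrectionHead:
    """Tiny MLP mapping (observation, base action, chunk-index feature,
    base-policy features) to a per-step additive correction.
    Sizes and encoding are assumptions for illustration only."""

    def __init__(self, obs_dim, act_dim, feat_dim, hidden=64):
        in_dim = obs_dim + act_dim + feat_dim + 2  # +2 for the index encoding
        self.w1 = rng.normal(0.0, 0.05, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = np.zeros((hidden, act_dim))  # zero-init: correction starts at 0
        self.b2 = np.zeros(act_dim)

    def __call__(self, obs, base_action, index, chunk_len, base_feat):
        # Positional feature: where this action sits inside the chunk.
        phase = index / max(chunk_len - 1, 1)
        pos = np.array([np.sin(np.pi * phase), np.cos(np.pi * phase)])
        x = np.concatenate([obs, base_action, base_feat, pos])
        h = np.tanh(x @ self.w1 + self.b1)
        return h @ self.w2 + self.b2  # additive correction for this step

# Usage: correct each step of a stale chunk against the latest observation.
head = CorrectionHead(obs_dim=4, act_dim=2, feat_dim=3)
obs = rng.normal(size=4)            # latest observation
chunk = rng.normal(size=(8, 2))     # action chunk from the base VLA
feat = rng.normal(size=3)           # features from the base policy
corrected = [a + head(obs, a, i, len(chunk), feat) for i, a in enumerate(chunk)]
```

Because the output layer is zero-initialized, the head is an identity at the start of training, which is one plausible way to "preserve the base model's competence" before any correction has been learned.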
Related papers
- Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy [52.106797722292896]
We present DCDP, a Dynamic Closed-Loop Diffusion Policy framework that integrates chunk-based action generation with real-time correction. In dynamic PushT simulations, DCDP improves adaptability by 19% without retraining while requiring only 5% additional computation.
arXiv Detail & Related papers (2026-03-02T15:04:18Z) - Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation [95.89924101984566]
We introduce OptimusVLA, a dual-memory VLA framework with Global Prior Memory (GPM) and Local Consistency Memory (LCM). GPM replaces Gaussian noise with task-level priors retrieved from semantically similar trajectories. LCM injects a learned consistency constraint that enforces temporal coherence and smoothness of the trajectory.
arXiv Detail & Related papers (2026-02-22T15:39:34Z) - VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots [5.308743386891208]
Vision-Language-Action (VLA) models have achieved remarkable breakthroughs in robotics. The strategies for fusing a queue of successive action chunks have a profound impact on the overall performance of VLA models. Existing methods suffer from jitter, stalling, or even pauses in robotic action execution. This paper introduces VLA-RAIL, a novel framework designed to conduct model inference and robot motion control asynchronously.
arXiv Detail & Related papers (2025-12-31T06:59:42Z) - Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
We propose TACO, a test-time-scaling framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks. Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL), and, being gradient-free, it offers significant computational savings.
arXiv Detail & Related papers (2025-12-02T14:42:54Z) - CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation [67.1520483301709]
CronusVLA is a unified framework that extends single-frame VLA models to the multi-frame paradigm through an efficient post-training stage. CronusVLA achieves state-of-the-art performance on SimplerEnv with a 70.9% success rate, and a 12.7% improvement over OpenVLA on LIBERO.
arXiv Detail & Related papers (2025-06-24T17:30:27Z) - SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration [69.54069477520534]
Vision-Language-Action (VLA) models have attracted increasing attention for their strong control capabilities. Their high computational cost and low execution frequency hinder their suitability for real-time tasks such as robotic manipulation and autonomous navigation. We propose SP-VLA, a unified framework that accelerates VLA models by jointly scheduling models and pruning tokens.
arXiv Detail & Related papers (2025-06-15T05:04:17Z) - Real-Time Execution of Action Chunking Flow Policies [49.1574468325115]
This paper presents a novel inference-time algorithm that enables asynchronous execution of action chunking policies. It is applicable to any diffusion- or flow-based VLA system out of the box, with no retraining. Results show that RTC is fast, performant, and uniquely robust to inference delay.
arXiv Detail & Related papers (2025-06-09T01:01:59Z) - Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding [24.1236728596359]
Vision-Language-Action (VLA) models demonstrate remarkable potential for generalizable robotic manipulation. We propose PD-VLA, the first parallel decoding framework for VLA models integrated with action chunking. Our framework reformulates autoregressive decoding as a nonlinear system solved by parallel fixed-point iterations.
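The fixed-point reformulation above is the classic Jacobi-style idea: guess the whole token sequence, refine every position in parallel, and stop when the sequence no longer changes. A toy sketch under stated assumptions (the `next_token` rule is a hypothetical deterministic stand-in, not a real VLA decoder):

```python
# Toy parallel fixed-point (Jacobi-style) decoding vs. autoregressive decoding.
# `next_token` is a hypothetical deterministic next-token rule for illustration.

def next_token(prefix):
    # Each token depends only on the tokens before it.
    return (sum(prefix) + 1) % 7

def autoregressive_decode(length):
    # Baseline: one token per step, `length` sequential model calls.
    seq = []
    for _ in range(length):
        seq.append(next_token(seq))
    return seq

def jacobi_decode(length, max_iters=50):
    # Start from an arbitrary guess and refine all positions at once.
    seq = [0] * length
    for _ in range(max_iters):
        new = [next_token(seq[:i]) for i in range(length)]  # parallel update
        if new == seq:  # fixed point reached: nothing changed
            return seq
        seq = new
    return seq
```

For a deterministic decoder, each Jacobi iteration fixes at least one more prefix position, so the iteration converges to the autoregressive sequence in at most `length` rounds; the speedup comes from evaluating all positions of one round in parallel.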
arXiv Detail & Related papers (2025-03-04T06:12:08Z) - Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling [51.38330727868982]
We show how action chunking impacts the divergence between a learner and a demonstrator. We propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. Our method boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
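One ingredient of such test-time selection can be sketched as follows: sample several candidate chunks and keep the one whose overlapping steps best agree with the previously committed plan. This is a backward-coherence-only sketch in the spirit of BID, not the full method (which also uses a forward contrast term); the shapes and the L2 criterion are assumptions.

```python
import numpy as np

def select_chunk(candidates, prev_tail):
    """candidates: (K, H, act_dim) sampled chunks.
    prev_tail: (O, act_dim) remaining steps of the previous chunk that
    overlap the new one. Returns the candidate closest to prev_tail
    over the overlap, by L2 distance."""
    overlap = prev_tail.shape[0]
    diffs = candidates[:, :overlap] - prev_tail[None]
    dists = np.linalg.norm(diffs, axis=(1, 2))  # Frobenius norm per candidate
    return candidates[np.argmin(dists)]

# Usage: the candidate whose opening steps match the old plan wins.
rng = np.random.default_rng(1)
prev_tail = np.zeros((3, 2))        # previous plan ended near zero
cands = rng.normal(size=(5, 8, 2))  # five sampled 8-step chunks
cands[2, :3] *= 0.01                # candidate 2 agrees with the old tail
chosen = select_chunk(cands, prev_tail)
```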
arXiv Detail & Related papers (2024-08-30T15:39:34Z) - From Imitation to Refinement -- Residual RL for Precise Assembly [19.9786629249219]
Recent advances in Behavior Cloning (BC) have made it easy to teach robots new tasks. However, we find that the ease of teaching comes at the cost of unreliable performance. We devise a simple yet effective method, ResiP, that overcomes the reliability problem while retaining BC's ease of teaching and long-horizon capabilities.
arXiv Detail & Related papers (2024-07-23T17:44:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.