Related papers: RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models

RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models

URL: http://arxiv.org/abs/2511.01331v1
Date: Mon, 03 Nov 2025 08:30:48 GMT
Title: RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
Authors: Hongyin Zhang, Shuo Zhang, Junxi Jin, Qixin Zeng, Runze Li, Donglin Wang,
Abstract summary: Vision-Language-Action (VLA) models fail to generalize reliably in out-of-distribution deployments.<n>We introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models.<n>Our results highlight the importance of robustness-aware RL post-training as a key step toward improving the principled reliability and robustness of VLA models.
Score: 33.503927352666096
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distribution deployments, where unavoidable disturbances such as observation noise, sensor errors, or actuation perturbations become prevalent. While recent Reinforcement Learning (RL)-based post-training provides a practical means to adapt pre-trained VLA models, existing methods mainly emphasize reward maximization and overlook robustness to environmental uncertainty. In this work, we introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models. Through a systematic robustness analysis, we identify two key regularizations: Jacobian regularization, which mitigates sensitivity to observation noise, and smoothness regularization, which stabilizes policies under action perturbations. Extensive experiments across diverse robotic environments demonstrate that RobustVLA significantly outperforms prior state-of-the-art methods in robustness and reliability. Our results highlight the importance of principled robustness-aware RL post-training as a key step toward improving the reliability and robustness of VLA models.

Related papers

Self-Correcting VLA: Online Action Refinement via Sparse World Imagination [55.982504915794514]
We propose Self-Correcting VLA (SC-VLA), which achieve self-improvement by intrinsically guiding action refinement through sparse imagination.<n>SC-VLA achieve state-of-the-art performance, yielding the highest task throughput with 16% fewer steps and a 9% higher success rate than the best-performing baselines.
arXiv Detail & Related papers (2026-02-25T06:58:06Z)
CRL-VLA: Continual Vision-Language-Action Learning [40.18167835795084]
Continual Reinforcement Learning is a promising pathway for deploying VLA models in lifelong robotic scenarios.<n>We introduce CRL-VLA, a framework for continual post-training of VLA models with rigorous theoretical bounds.<n>We derive a unified performance bound linking the stability-plasticity trade-off to goal-conditioned advantage magnitude, scaled by policy divergence.
arXiv Detail & Related papers (2026-02-03T12:09:53Z)
Learning to be Reproducible: Custom Loss Design for Robust Neural Networks [4.3094059981414405]
We propose a Custom Loss Function (CLF) that balances predictive accuracy with training stability.<n>CLF significantly improves training without sacrificing predictive performance.<n>These results establish CLF as an effective and efficient strategy for developing more stable, reliable and trustworthy neural networks.
arXiv Detail & Related papers (2026-01-02T05:31:08Z)
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models [57.75717492488268]
Vision-Language-Action (VLA) models have advanced robotic manipulation by leveraging large language models.<n>Supervised Finetuning (SFT) requires hundreds of demonstrations per task, rigidly memorizing trajectories, and failing to adapt when deployment conditions deviate from training.<n>We introduce EVOLVE-VLA, a test-time training framework enabling VLAs to continuously adapt through environment interaction with minimal or zero task-specific demonstrations.
arXiv Detail & Related papers (2025-12-16T18:26:38Z)
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
We propose textbfTACO, a test-time-scaling framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks.<n>Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL), and being gradient-free, it incurs significant computational benefits.
arXiv Detail & Related papers (2025-12-02T14:42:54Z)
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices [61.361819972410046]
We show why and under what conditions the true sequence-level reward can be optimized via a surrogate token-level objective in policy gradient methods such as REINFORCE.<n>This insight provides a principled explanation for the crucial role of several widely adopted techniques in stabilizing RL training.
arXiv Detail & Related papers (2025-12-01T07:45:39Z)
Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning.<n>Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training.
arXiv Detail & Related papers (2025-10-30T11:53:08Z)
Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models [33.214586668992965]
Reinforcement learning with verifiable rewards (RLVR) has delivered impressive gains in mathematical and multimodal reasoning.<n>We propose RECAP-a replay strategy with dynamic objective reweighting for general knowledge.<n>Our method is end-to-end and readily applicable to existing RLVR pipelines without training additional models or heavy tuning.
arXiv Detail & Related papers (2025-10-24T19:08:48Z)
VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision [29.848085169124605]
We show that a strong value model is essential for mitigating noise by absorbing unstable signals and enabling more reliable advantage estimation.<n>We propose VRPO, a value-centric framework for robust PPO training under noisy supervision.
arXiv Detail & Related papers (2025-08-05T04:05:15Z)
Enhancing Variational Autoencoders with Smooth Robust Latent Encoding [54.74721202894622]
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models.<n>We introduce Smooth Robust Latent VAE, a novel adversarial training framework that boosts both generation quality and robustness.<n>Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image editing attacks.
arXiv Detail & Related papers (2025-04-24T03:17:57Z)
SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning [2.7075926292355286]
We propose SALSA-RL (Stability Analysis in the Latent Space of Actions), a novel RL framework that models control actions as dynamic, time-dependent variables evolving within a latent space.<n>We demonstrate that SALSA-RL can be deployed in a non-invasive manner for assessing the local stability of actions from pretrained RL agents without compromising on performance across diverse benchmark environments.
arXiv Detail & Related papers (2025-02-21T15:09:39Z)
Improving Vision-Language-Action Model with Online Reinforcement Learning [17.043068379668842]
Recent studies have successfully integrated large vision-language models into low-level robotic control by supervised fine-tuning.<n>We propose iRe-VLA framework, which iterates between Reinforcement Learning and Supervised Learning to effectively improve VLA models.
arXiv Detail & Related papers (2025-01-28T02:53:48Z)
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization. A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR. For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models. AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting. We present a general proxy guided defense framework, LAST' (bf Learn from the Pbf ast)
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks. We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework. TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
Robust Reinforcement Learning using Adversarial Populations [118.73193330231163]
Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness. We show that using a single adversary does not consistently yield robustness to dynamics variations under standard parametrizations of the adversary. We propose a population-based augmentation to the Robust RL formulation in which we randomly initialize a population of adversaries and sample from the population uniformly during training.
arXiv Detail & Related papers (2020-08-04T20:57:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.