RL-VLA$^3$: Reinforcement Learning VLA Accelerating via Full Asynchronism
- URL: http://arxiv.org/abs/2602.05765v1
- Date: Thu, 05 Feb 2026 15:30:23 GMT
- Title: RL-VLA$^3$: Reinforcement Learning VLA Accelerating via Full Asynchronism
- Authors: Zhong Guan, Haoran Sun, Yongjian Guo, Shuai Di, Xiaodong Bai, Jing Long, Tianyun Zhao, Mingxi Luo, Chen Zhou, Yucheng Guo, Qiming Yang, Wanting Xu, Wen Huang, Yunxuan Ma, Hongke Zhao, Likang Wu, Xiaotie Deng, Xi Xiao, Sheng Wen, Yicheng Gong, Junwu Xiong
- Abstract summary: Vision-Language-Action (VLA) models have emerged as a crucial pathway towards general embodied intelligence. This paper proposes and implements a fully-asynchronous policy training framework encompassing the entire pipeline from environment interaction to actor policy updates. On the LIBERO benchmark, the framework achieves throughput improvements of up to 59.25% compared to existing synchronous strategies.
- Score: 42.27384804295299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, Vision-Language-Action (VLA) models have emerged as a crucial pathway towards general embodied intelligence, yet their training efficiency has become a key bottleneck. Although existing reinforcement learning (RL)-based training frameworks like RLinf can enhance model generalization, they still rely on synchronous execution, leading to severe resource underutilization and throughput limitations during the environment interaction, policy generation (rollout), and model update (actor) phases. To overcome this challenge, this paper, for the first time, proposes and implements a fully-asynchronous policy training framework encompassing the entire pipeline, from environment interaction and rollout generation to actor policy updates. Systematically drawing inspiration from asynchronous optimization ideas in large-model RL, our framework designs a multi-level decoupled architecture. This includes asynchronous parallelization of environment interaction and trajectory collection, streaming execution for policy generation, and decoupled scheduling for training updates. We validated the effectiveness of our method across diverse VLA models and environments. On the LIBERO benchmark, the framework achieves throughput improvements of up to 59.25% compared to existing synchronous strategies. When the separation strategies are deeply optimized, throughput can be increased by as much as 126.67%. We verified the effectiveness of each asynchronous component via ablation studies. Scaling-law validation across 8 to 256 GPUs demonstrates our method's excellent scalability under most conditions.
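The decoupled pipeline the abstract describes (environment interaction and trajectory collection running asynchronously from policy updates, connected by streaming handoff) can be illustrated with a minimal producer/consumer sketch. This is a toy illustration, not the paper's framework: all names here (`rollout_worker`, `learner`, `run_async_pipeline`) and the simulated trajectories are hypothetical.

```python
import queue
import threading

def rollout_worker(traj_queue, n_trajectories):
    # Simulated environment interaction: each "trajectory" is a list of
    # per-step rewards. In a real system this would run policy inference.
    for i in range(n_trajectories):
        traj_queue.put([float(i), float(i) + 1.0])
    traj_queue.put(None)  # sentinel: no more trajectories

def learner(traj_queue, results):
    # Consumes trajectories as they stream in, decoupled from generation,
    # so neither side blocks waiting for a full synchronous batch.
    updates = 0
    total_reward = 0.0
    while True:
        traj = traj_queue.get()
        if traj is None:
            break
        total_reward += sum(traj)
        updates += 1
    results["updates"] = updates
    results["total_reward"] = total_reward

def run_async_pipeline(n_trajectories=4, queue_size=2):
    # A bounded queue limits how far rollout can run ahead of training,
    # bounding the staleness of the data the learner sees.
    traj_queue = queue.Queue(maxsize=queue_size)
    results = {}
    producer = threading.Thread(target=rollout_worker,
                                args=(traj_queue, n_trajectories))
    consumer = threading.Thread(target=learner, args=(traj_queue, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

The bounded queue is the key design choice: it is what makes the decoupling "asynchronous but controlled", a common pattern in asynchronous RL systems.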
Related papers
- DiRL: An Efficient Post-Training Framework for Diffusion Language Models [54.405206032785706]
Diffusion Language Models (dLLMs) have emerged as promising alternatives to Auto-Regressive (AR) models. Existing methods suffer from computational inefficiency and objective mismatches between training and inference. We introduce DiRL, an efficient post-training framework that tightly integrates FlexAttention-accelerated blockwise training with LMDeploy-optimized inference.
arXiv Detail & Related papers (2025-12-23T08:33:19Z)
- Periodic Asynchrony: An Effective Method for Accelerating Reinforcement Learning [8.395046547177806]
Reinforcement learning (RL) has attracted increasing attention, with growing efforts to reproduce and apply it. In mainstream RL frameworks, inference and training are typically deployed on the same devices. In this study, we return to the strategy of separating inference and training deployment. We transform the conventional synchronous architecture into a periodically asynchronous framework, which allows demand-driven, independent, and elastic scaling of each component.
arXiv Detail & Related papers (2025-11-24T08:22:50Z)
- AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs [24.96730768606278]
We present AReaL-Hex, a heterogeneity-aware asynchronous RL training system. It effectively schedules how rollout generation and policy model training execute over heterogeneous GPUs. It delivers up to 1.50x higher training throughput and a 1.46x reduction in training cost.
arXiv Detail & Related papers (2025-11-02T04:17:30Z)
- DRL: Discriminative Representation Learning with Parallel Adapters for Class Incremental Learning [63.65467569295623]
We propose the Discriminative Representation Learning (DRL) framework to specifically address these challenges. To conduct incremental learning effectively yet efficiently, DRL's network is built upon a pre-trained model (PTM). Our DRL consistently outperforms other state-of-the-art methods throughout the entire class-incremental learning (CIL) period.
arXiv Detail & Related papers (2025-10-14T03:19:15Z)
- FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning [11.68914161151634]
Group relative policy optimization (GRPO) has demonstrated significant potential in improving the reasoning capabilities of large language models. We propose a speculative decoding framework that adjusts the drafting and verification strategy according to real-time concurrency levels. We show that the proposed method achieves end-to-end speedups of 2.35x to 2.72x, significantly surpassing baseline approaches in efficiency.
arXiv Detail & Related papers (2025-09-26T02:48:41Z)
- DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions [6.723690093335988]
We propose a diffusion-based world model that generates future state-reward trajectories conditioned on the current state, action, and return-to-go. We show that conservative offline RL algorithms such as TD3BC and IQL benefit significantly from training on these augmented trajectories.
arXiv Detail & Related papers (2025-09-23T20:06:26Z)
- Adaptive Policy Synchronization for Scalable Reinforcement Learning [0.0]
ClusterEnv is a lightweight interface for distributed environment execution. It supports both on- and off-policy methods, integrates into existing training code with minimal changes, and runs efficiently on clusters.
arXiv Detail & Related papers (2025-07-15T05:07:12Z)
- AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training [24.60677187852425]
Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges with complex dataflows and the corresponding resource idling and workload imbalance. We propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training.
arXiv Detail & Related papers (2025-07-02T12:45:34Z)
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs). StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks. Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
arXiv Detail & Related papers (2025-04-22T14:19:06Z)
- Efficient Parallel Reinforcement Learning Framework using the Reactor Model [2.190190313041532]
Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We have proposed a solution implementing the reactor model, which enforces a fixed communication pattern among a set of actors.
arXiv Detail & Related papers (2023-12-07T21:19:57Z)
- An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search [76.73477450555046]
We introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) that maximizes the parallel efficiency of ES and integrates it with policy gradient methods.
Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that take full advantage of asynchronism, ES, and DRL.
arXiv Detail & Related papers (2020-12-10T02:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.