Periodic Asynchrony: An Effective Method for Accelerating Reinforcement Learning
- URL: http://arxiv.org/abs/2511.18871v2
- Date: Mon, 01 Dec 2025 09:00:07 GMT
- Title: Periodic Asynchrony: An Effective Method for Accelerating Reinforcement Learning
- Authors: Jian Lu
- Abstract summary: Reinforcement learning (RL) has attracted increasing attention, with growing efforts to reproduce and apply it. In mainstream RL frameworks, inference and training are typically deployed on the same devices. In this study, we return to the strategy of separating inference and training deployment. We transform the conventional synchronous architecture into a periodically asynchronous framework, which allows demand-driven, independent, and elastic scaling of each component.
- Score: 8.395046547177806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the introduction of the GRPO algorithm, reinforcement learning (RL) has attracted increasing attention, with growing efforts to reproduce and apply it. However, training efficiency remains a critical challenge. In mainstream RL frameworks, inference and training are typically deployed on the same devices. While this approach reduces costs through resource consolidation, its synchronous execution imposes a computational coupling that prevents concurrent inference and training. In this study, we return to the strategy of separating inference and training deployment. By introducing improvements in the data loader, we transform the conventional synchronous architecture into a periodically asynchronous framework that allows demand-driven, independent, and elastic scaling of each component, while the accuracy of the algorithm remains completely equivalent to the synchronous method, with both belonging to the on-policy strategy. It is worth emphasizing that we apply a unified tri-model architecture in the training phase, and we also propose a shared-prompt attention mask to reduce repetitive computation. In practice, these techniques have achieved at least a threefold overall performance improvement in RL training on NPU platforms, indicating their potential for widespread application.
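The abstract does not spell out how the shared-prompt attention mask is constructed. A minimal sketch of one plausible variant: several rollouts sampled from the same prompt are packed into one sequence, and the mask lets each response attend to a single shared copy of the prompt and causally to itself, but not to the other responses (the helper name and packing layout below are assumptions, not the paper's implementation):

```python
import numpy as np

def shared_prompt_mask(prompt_len, resp_lens):
    """Boolean attention mask for a packed sequence
    [prompt, resp_1, ..., resp_k]. True = attention allowed.

    Each response attends causally to the shared prompt and to its
    own earlier tokens, so the prompt is computed once instead of
    being repeated for every rollout."""
    total = prompt_len + sum(resp_lens)
    mask = np.zeros((total, total), dtype=bool)
    # Ordinary causal attention within the prompt itself.
    mask[:prompt_len, :prompt_len] = np.tril(
        np.ones((prompt_len, prompt_len), dtype=bool))
    offset = prompt_len
    for r in resp_lens:
        # Every response token sees the full shared prompt...
        mask[offset:offset + r, :prompt_len] = True
        # ...and earlier tokens of its own response only.
        mask[offset:offset + r, offset:offset + r] = np.tril(
            np.ones((r, r), dtype=bool))
        offset += r
    return mask
```

Packing k rollouts this way removes k-1 redundant forward passes over the prompt tokens, which is where the claimed reduction in repetitive computation would come from.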
Related papers
- GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control [16.529035487811267]
We show that naively applying asynchrony to policy-gradient updates can induce qualitatively different training dynamics and lead to severe training instability. We propose Gradient Alignment Control (GAC), a simple dynamics-aware stabilization method that regulates asynchronous RL progress along stale-aligned directions.
arXiv Detail & Related papers (2026-03-02T06:19:43Z) - RL-VLA$^3$: Reinforcement Learning VLA Accelerating via Full Asynchronism [42.27384804295299]
Vision-Language-Action (VLA) models have emerged as a crucial pathway towards general embodied intelligence. This paper proposes and implements a fully asynchronous policy training framework encompassing the entire pipeline from environment interaction to actor policy updates. On the LIBERO benchmark, the framework achieves throughput improvements of up to 59.25% compared to existing synchronous strategies.
arXiv Detail & Related papers (2026-02-05T15:30:23Z) - AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs [24.96730768606278]
We present AReaL-Hex, a heterogeneity-aware asynchronous RL training system. It schedules rollout generation and policy-model training across heterogeneous GPUs, delivering up to 1.50x higher training throughput and a 1.46x reduction in training cost.
arXiv Detail & Related papers (2025-11-02T04:17:30Z) - Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning [55.50683337004406]
We introduce two new algorithms, Rennala NIGT and Malenia NIGT, which implement asynchronous policy gradient aggregation. In the homogeneous setting, Rennala NIGT provably improves the total computational and communication complexity while supporting the AllReduce operation. In the heterogeneous setting, Malenia NIGT simultaneously handles asynchronous computations and heterogeneous environments with strictly better theoretical guarantees.
arXiv Detail & Related papers (2025-09-29T05:38:42Z) - Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z) - RL-DAUNCE: Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles [1.609702184777697]
We develop RL-DAUNCE, a new RL-based method that enhances data assimilation with physical constraints. First, RL-DAUNCE inherits the computational efficiency of machine learning. Second, RL-DAUNCE emphasizes uncertainty by advancing multiple ensemble members. Third, RL-DAUNCE's ensemble-as-agents design facilitates the enforcement of physical constraints.
arXiv Detail & Related papers (2025-05-08T17:43:35Z) - From promise to practice: realizing high-performance decentralized training [8.955918346078935]
Decentralized training of deep neural networks has attracted significant attention for its theoretically superior scalability over synchronous data-parallel methods like All-Reduce.
This paper identifies three key factors that can lead to speedups over All-Reduce training and constructs a runtime model to determine when, how, and to what degree decentralization can yield shorter per-iteration runtimes.
arXiv Detail & Related papers (2024-10-15T19:04:56Z) - Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a 14.6% improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - Federated Learning based on Pruning and Recovery [0.0]
This framework integrates asynchronous learning algorithms and pruning techniques.
It addresses the inefficiencies of traditional federated learning algorithms in scenarios involving heterogeneous devices.
It also tackles the staleness issue and inadequate training of certain clients in asynchronous algorithms.
arXiv Detail & Related papers (2024-03-16T14:35:03Z) - Efficient Parallel Reinforcement Learning Framework using the Reactor Model [2.190190313041532]
Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We propose a solution implementing the reactor model, which enforces a fixed communication pattern among a set of actors.
arXiv Detail & Related papers (2023-12-07T21:19:57Z) - Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks [56.91063444859008]
Federated Learning (FL) is a collaborative machine learning framework that combines on-device training and server-based aggregation.
We propose an asynchronous FL design with periodic aggregation to tackle the straggler issue in FL systems.
We show that an "age-aware" aggregation weighting design can significantly improve the learning performance in an asynchronous FL setting.
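The summary does not give the paper's exact weighting rule. A generic staleness-decay scheme illustrates the idea behind age-aware aggregation (the function name, exponential decay, and scalar parameters below are assumptions for illustration):

```python
def age_aware_aggregate(global_param, updates, decay=0.5):
    """Aggregate asynchronous client updates, down-weighting stale ones.

    `updates` is a list of (delta, staleness) pairs, where staleness is
    the number of global rounds since the client last synchronized with
    the server. Weights decay exponentially with age, so fresh updates
    dominate the weighted average."""
    weights = [decay ** s for _, s in updates]
    weighted_sum = sum(w * d for (d, _), w in zip(updates, weights))
    return global_param + weighted_sum / sum(weights)
```

With two clients contributing deltas 2.0 (fresh) and 0.0 (one round stale) at decay 0.5, the fresh update receives twice the weight of the stale one, which is the straggler-mitigation effect the entry describes.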
arXiv Detail & Related papers (2022-12-14T17:33:01Z) - Parallelized Reverse Curriculum Generation [62.25453821794469]
In reinforcement learning, it is challenging for an agent to master a task that requires a specific series of actions, because rewards are sparse.
Reverse curriculum generation (RCG) provides a reverse-expansion approach that automatically generates a curriculum for the agent to learn.
We propose a parallelized approach that simultaneously trains multiple AC pairs and periodically exchanges their critics.
arXiv Detail & Related papers (2021-08-04T15:58:35Z) - An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search [76.73477450555046]
We introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) that maximizes the parallel efficiency of ES and integrates it with policy gradient methods.
Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that can take all advantages of asynchronism, ES, and DRL.
arXiv Detail & Related papers (2020-12-10T02:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.