High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization
- URL: http://arxiv.org/abs/2507.10990v1
- Date: Tue, 15 Jul 2025 05:07:12 GMT
- Title: High-Throughput Distributed Reinforcement Learning via Adaptive Policy Synchronization
- Authors: Rodney Lafuente-Mercado
- Abstract summary: ClusterEnv is a learner-agnostic interface for distributed environment execution that mirrors the Gymnasium API. ClusterEnv introduces the DETACH pattern, which decouples simulation from training by offloading reset() and step() operations to remote workers while keeping learning centralized. We propose Adaptive Actor Policy Synchronization (AAPS), a divergence-triggered update mechanism that reduces synchronization overhead without sacrificing performance.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scaling reinforcement learning (RL) workloads often requires distributing environment simulation across compute clusters. Existing frameworks entangle simulation, learning logic, and orchestration into monolithic systems, limiting modularity and reusability. We present ClusterEnv, a lightweight, learner-agnostic interface for distributed environment execution that mirrors the Gymnasium API. ClusterEnv introduces the DETACH pattern, which decouples simulation from training by offloading reset() and step() operations to remote workers while keeping learning centralized. To address policy staleness in distributed execution, we propose Adaptive Actor Policy Synchronization (AAPS), a divergence-triggered update mechanism that reduces synchronization overhead without sacrificing performance. ClusterEnv integrates cleanly into existing RL pipelines, supports both on-policy and off-policy methods, and requires minimal code changes. Experiments on discrete control tasks demonstrate that AAPS achieves high sample efficiency with significantly fewer weight updates. Source code is available at https://github.com/rodlaf/ClusterEnv.
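The paper's interface is not reproduced here, so the following is a minimal, self-contained sketch of the two ideas the abstract names: the DETACH pattern (reset() and step() offloaded to remote workers while learning stays centralized) and divergence-triggered synchronization (AAPS). All names (DetachedWorker, policy_divergence), the toy update rule, and the 0.05 threshold are illustrative assumptions, not the actual ClusterEnv API.

```python
import numpy as np

OBS_DIM, N_ACTIONS = 4, 2


class DetachedWorker:
    """Stand-in for a remote worker that simulates the environment and acts
    with a local, possibly stale copy of the policy (an assumption about how
    DETACH divides work; a real deployment would run this over RPC)."""

    def __init__(self, weights, seed):
        self.weights = weights.copy()               # actor-side policy snapshot
        self.rng = np.random.default_rng(seed)
        self.obs = None

    def sync(self, weights):
        self.weights = weights.copy()               # AAPS-triggered refresh

    def reset(self):
        self.obs = self.rng.normal(size=OBS_DIM)    # fake initial observation
        return self.obs

    def step(self):
        action = int(np.argmax(self.obs @ self.weights))  # act with stale weights
        obs, reward = self.obs, float(self.rng.random())  # fake reward
        self.obs = self.rng.normal(size=OBS_DIM)          # fake next observation
        return obs, action, reward


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


def policy_divergence(learner_w, actor_w, probe_obs):
    """Mean KL between learner and actor action distributions on probe states."""
    kl = 0.0
    for obs in probe_obs:
        p, q = softmax(obs @ learner_w), softmax(obs @ actor_w)
        kl += float(np.sum(p * np.log((p + 1e-8) / (q + 1e-8))))
    return kl / len(probe_obs)


# Centralized learner loop: workers simulate, the learner updates its policy,
# and weights are pushed to a worker only when divergence crosses a threshold
# (AAPS) rather than after every update.
learner_w = np.zeros((OBS_DIM, N_ACTIONS))
workers = [DetachedWorker(learner_w, seed=i) for i in range(4)]
probe_obs = np.random.default_rng(0).normal(size=(8, OBS_DIM))
for w in workers:
    w.reset()

syncs = 0
for _ in range(200):
    for w in workers:
        obs, action, reward = w.step()
        learner_w[:, action] += 0.01 * reward * obs   # toy centralized update
    for w in workers:
        if policy_divergence(learner_w, w.weights, probe_obs) > 0.05:
            w.sync(learner_w)
            syncs += 1

print(f"weight pushes: {syncs} (vs. {200 * len(workers)} with per-step sync)")
```

In this toy run, most iterations skip the weight push entirely; counting pushes against the per-step-sync baseline is one way to see the overhead reduction AAPS is aimed at.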
Related papers
- Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms [4.127488674019288]
Post-training for large language models co-locates trajectory sampling and policy optimisation on the same GPU cluster. We present Echo, the RL system that cleanly decouples these two phases across heterogeneous "inference" and "training" swarms.
arXiv Detail & Related papers (2025-08-07T13:37:04Z)
- AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training [24.60677187852425]
Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs). Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks, while task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance. We propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training.
arXiv Detail & Related papers (2025-07-02T12:45:34Z)
- StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs). StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks. Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
arXiv Detail & Related papers (2025-04-22T14:19:06Z)
- OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters [1.4131700241686853]
We develop an adaptive batch-scaling framework called OmniLearn to mitigate the effects of heterogeneous resources. Our approach is inspired by proportional controllers to balance load across heterogeneous servers, and works under varying resource availability.
arXiv Detail & Related papers (2025-03-21T18:26:24Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL. We find that REBEL provides a unified approach to language modeling and image generation, with performance stronger than or comparable to PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Efficient Parallel Reinforcement Learning Framework using the Reactor Model [2.190190313041532]
Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We propose a solution implementing the reactor model, which enforces a fixed communication pattern among a set of actors.
arXiv Detail & Related papers (2023-12-07T21:19:57Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching [1.047192732651018]
Current techniques for distributed model training mostly assume that clusters are composed of servers with constant resource availability.
We develop a dynamic technique for distributed data-parallel training that adjusts the mini-batch sizes on each worker based on availability and throughput.
arXiv Detail & Related papers (2023-05-20T15:33:06Z)
- Device Scheduling and Update Aggregation Policies for Asynchronous Federated Learning [72.78668894576515]
Federated Learning (FL) is a recently emerged decentralized machine learning (ML) framework.
We propose an asynchronous FL framework with periodic aggregation to eliminate the straggler issue in FL systems.
arXiv Detail & Related papers (2021-07-23T18:57:08Z)
- Online Deep Clustering for Unsupervised Representation Learning [108.33534231219464]
Online Deep Clustering (ODC) performs clustering and network update simultaneously rather than alternatingly.
We design and maintain two dynamic memory modules: a samples memory to store sample labels and features, and a centroids memory for centroid evolution.
In this way, labels and the network evolve shoulder-to-shoulder rather than alternatingly.
arXiv Detail & Related papers (2020-06-18T16:15:46Z)