Efficient Parallel Reinforcement Learning Framework using the Reactor Model
- URL: http://arxiv.org/abs/2312.04704v2
- Date: Fri, 2 Feb 2024 19:41:29 GMT
- Title: Efficient Parallel Reinforcement Learning Framework using the Reactor Model
- Authors: Jacky Kwok, Marten Lohstroh, Edward A. Lee
- Abstract summary: Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We propose a solution implementing the reactor model, which enforces a fixed communication pattern among a set of actors.
- Score: 2.190190313041532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL
workloads to multiple computational resources, allowing for faster generation
of samples, estimation of values, and policy improvement. These computational
paradigms require a seamless integration of training, serving, and simulation
workloads. Existing frameworks, such as Ray, do not manage this orchestration
efficiently, especially in RL tasks that demand intensive input/output and
synchronization between actors on a single node. In this study, we propose a
solution implementing the reactor model, which enforces a fixed communication
pattern among a set of actors. This allows the scheduler to eliminate the work
needed for synchronization, such as acquiring and releasing locks for each
actor or sending and processing coordination-related messages. Our framework,
Lingua Franca (LF), a coordination language based on the reactor model, also
supports true parallelism in Python and provides a unified interface that
allows users to automatically generate dataflow graphs for RL tasks. Compared
to Ray on a single-node multi-core compute platform, LF achieves 1.21x and
11.62x higher simulation throughput in OpenAI Gym and Atari environments,
respectively, reduces the average training time of synchronized parallel
Q-learning by 31.2%, and accelerates multi-agent RL inference by 5.12x.
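To make the reactor model's advantage concrete, here is a minimal conceptual sketch in plain Python; the Simulator/Learner reactors and the run_step scheduler are illustrative stand-ins, not Lingua Franca's actual API. Because every reactor declares its connections before execution, the scheduler can fire reactions in a fixed topological order with no locks or coordination messages.

```python
from dataclasses import dataclass, field

@dataclass
class Reactor:
    name: str
    inputs: list = field(default_factory=list)  # statically declared upstream reactors

    def react(self, values):
        raise NotImplementedError

class Simulator(Reactor):
    def react(self, values):
        return {"observation": 0.0}  # placeholder for one environment step

class Learner(Reactor):
    def react(self, values):
        obs = values["Simulator"]["observation"]
        return {"action": obs + 1.0}  # placeholder for a policy update

def run_step(reactors):
    # One logical tick: fire reactors in topological order. Because the
    # graph is fixed before execution, no locks or coordination messages
    # are needed to route values between reactors.
    outputs = {}
    for r in reactors:  # assumed already topologically sorted
        values = {up.name: outputs[up.name] for up in r.inputs}
        outputs[r.name] = r.react(values)
    return outputs

sim = Simulator("Simulator")
learner = Learner("Learner", inputs=[sim])
print(run_step([sim, learner]))
```

The sketch mirrors the property the abstract attributes to LF: a statically known dataflow graph removes the need for per-actor synchronization at runtime.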
Related papers
- Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models [11.624678008637623]
We propose separating generation and learning in RLHF.
Asynchronous training relies on an underexplored regime: online but off-policy RLHF.
We study further compute optimizations for asynchronous RLHF but find that they come at a performance cost.
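As a rough illustration of the separation described above, the sketch below (plain Python threads and a queue; all names are illustrative, not the paper's implementation) runs generation and learning concurrently, so the learner consumes samples produced by a slightly stale policy, i.e., online but off-policy.

```python
import queue
import threading

samples = queue.Queue(maxsize=64)
policy_version = 0  # stands in for the learner's latest weights

def generator():
    while True:
        stale_version = policy_version           # may lag behind the learner
        samples.put(("prompt, response", stale_version))

def learner(num_updates):
    global policy_version
    for _ in range(num_updates):
        batch, version_used = samples.get()      # off-policy if version lags
        policy_version += 1                      # gradient step (placeholder)

threading.Thread(target=generator, daemon=True).start()
learner(num_updates=100)
```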
arXiv Detail & Related papers (2024-10-23T19:59:50Z)
- HybridFlow: A Flexible and Efficient RLHF Framework [13.80577212781375]
Reinforcement Learning from Human Feedback is widely used in Large Language Model (LLM) alignment.
Traditional RL can be modeled as a dataflow, where each node represents the computation of a neural network (NN).
We propose HybridFlow, which combines single-controller and multi-controller paradigms in a hybrid manner to enable flexible representation and efficient execution of the RLHF dataflow.
arXiv Detail & Related papers (2024-09-28T06:20:03Z)
- Spreeze: High-Throughput Parallel Reinforcement Learning Framework [19.3019166138232]
Spreeze is a lightweight parallel framework for reinforcement learning.
It efficiently utilizes a single desktop hardware resource to approach the throughput limit.
It can achieve up to 15,000 Hz experience sampling and a 370,000 Hz network update frame rate.
arXiv Detail & Related papers (2023-12-11T05:25:01Z)
- Retentive Network: A Successor to Transformer for Large Language Models [91.6652200825638]
We propose Retentive Network (RetNet) as a foundation architecture for large language models.
We theoretically derive the connection between recurrence and attention.
Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference.
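The recurrence/attention connection mentioned above can be checked numerically: retention has an attention-like parallel form and an equivalent recurrent form. The sketch below follows the published retention equations with made-up sizes; it is an illustration, not the official implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, gamma = 5, 4, 0.9
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel form: O = (Q K^T * D) V with decay mask D[n, m] = gamma^(n-m) for m <= n.
n, m = np.indices((T, T))
D = np.where(n >= m, gamma ** (n - m), 0.0)
O_parallel = (Q @ K.T * D) @ V

# Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n ; o_n = q_n S_n.
S = np.zeros((d, d))
O_recurrent = np.empty((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])
    O_recurrent[t] = Q[t] @ S

assert np.allclose(O_parallel, O_recurrent)  # the two forms agree
```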
arXiv Detail & Related papers (2023-07-17T16:40:01Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- Flexible Parallel Learning in Edge Scenarios: Communication, Computational and Energy Cost [20.508003076947848]
Fog- and IoT-based scenarios often require combining both data and model parallelism.
We present a framework for flexible parallel learning (FPL), achieving both data and model parallelism.
Our experiments, carried out using state-of-the-art deep-network architectures and large-scale datasets, confirm that FPL allows for an excellent trade-off among computational (hence energy) cost, communication overhead, and learning performance.
arXiv Detail & Related papers (2022-01-19T03:47:04Z)
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to a family of methods that nest reinforcement learning (RL) algorithms within population-based training.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
- TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [60.23234205219347]
TeraPipe is a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
We show that TeraPipe can speed up the training by 5.0x for the largest GPT-3 model with 175 billion parameters on an AWS cluster.
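A rough way to see why token-level pipelining helps: causal attention only looks backward, so a later pipeline stage can start on a slice of tokens as soon as the earlier stage finishes that slice. The toy schedule below (stage and slice counts are made up for illustration, not taken from the paper) prints the resulting overlap.

```python
num_stages, num_slices = 3, 4

# step -> list of (stage, slice) pairs that can run concurrently
schedule = {}
for stage in range(num_stages):
    for sl in range(num_slices):
        schedule.setdefault(stage + sl, []).append((stage, sl))

for step in sorted(schedule):
    work = ", ".join(f"stage{s} slice{t}" for s, t in schedule[step])
    print(f"step {step}: {work}")
# Token-level slicing finishes in num_stages + num_slices - 1 steps instead
# of num_stages * num_slices, so pipeline bubbles shrink.
```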
arXiv Detail & Related papers (2021-02-16T07:34:32Z)
- Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
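As a sketch of the idea, the snippet below (written against PyTorch; layer sizes and the auxiliary heads are illustrative) trains each layer with its own local loss and detaches activations between layers, so no gradient crosses a layer boundary.

```python
import torch
import torch.nn as nn

layers = [nn.Linear(8, 8), nn.Linear(8, 8)]
heads = [nn.Linear(8, 2) for _ in layers]  # local auxiliary classifiers
opts = [torch.optim.SGD(list(l.parameters()) + list(h.parameters()), lr=0.1)
        for l, h in zip(layers, heads)]

x = torch.randn(4, 8)
y = torch.randint(0, 2, (4,))

h = x
for layer, head, opt in zip(layers, heads, opts):
    h = torch.relu(layer(h))
    loss = nn.functional.cross_entropy(head(h), y)  # purely local objective
    opt.zero_grad()
    loss.backward()  # gradients stay within this layer and its head
    opt.step()
    h = h.detach()   # truncate: no backprop into earlier layers
```

Because each layer's update depends only on its own activations, the layers can in principle be trained in parallel, which is the point the summary makes about the high-compute regime.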
arXiv Detail & Related papers (2020-12-07T16:38:45Z)
- Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference [15.720414948573753]
We consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers).
We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model.
We show that, compared to the existing methods, RePurpose significantly improves the efficiency of the distributed inference via parallel implementation.
arXiv Detail & Related papers (2020-08-19T06:44:41Z)