Efficient Parallel Reinforcement Learning Framework using the Reactor Model
- URL: http://arxiv.org/abs/2312.04704v2
- Date: Fri, 2 Feb 2024 19:41:29 GMT
- Title: Efficient Parallel Reinforcement Learning Framework using the Reactor Model
- Authors: Jacky Kwok, Marten Lohstroh, Edward A. Lee
- Abstract summary: Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We propose a solution implementing the reactor model, which enforces a fixed communication pattern among a set of actors.
- Score: 2.190190313041532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parallel Reinforcement Learning (RL) frameworks are essential for mapping RL
workloads to multiple computational resources, allowing for faster generation
of samples, estimation of values, and policy improvement. These computational
paradigms require a seamless integration of training, serving, and simulation
workloads. Existing frameworks, such as Ray, do not manage this orchestration
efficiently, especially in RL tasks that demand intensive input/output and
synchronization between actors on a single node. In this study, we propose a
solution implementing the reactor model, which enforces a fixed communication
pattern among a set of actors. This allows the scheduler to eliminate the work
needed for synchronization, such as acquiring and releasing locks for each
actor or sending and processing coordination-related messages. Our framework,
Lingua Franca (LF), a coordination language based on the reactor model, also
supports true parallelism in Python and provides a unified interface that
allows users to automatically generate dataflow graphs for RL tasks. Compared
to Ray on a single-node multi-core compute platform, LF achieves 1.21x and
11.62x higher simulation throughput in OpenAI Gym and Atari environments,
respectively, reduces the average training time of synchronized parallel
Q-learning by 31.2%, and accelerates multi-agent RL inference by 5.12x.
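To make the reactor model's advantage concrete, here is a minimal conceptual sketch in plain Python; the Simulator/Learner reactors and the run_step scheduler are illustrative stand-ins, not Lingua Franca's actual API. Because every reactor declares its connections before execution, the scheduler can fire reactions in a fixed topological order with no locks or coordination messages.

```python
from dataclasses import dataclass, field

@dataclass
class Reactor:
    name: str
    inputs: list = field(default_factory=list)  # statically declared upstream reactors

    def react(self, values):
        raise NotImplementedError

class Simulator(Reactor):
    def react(self, values):
        return {"observation": 0.0}  # placeholder for one environment step

class Learner(Reactor):
    def react(self, values):
        obs = values["Simulator"]["observation"]
        return {"action": obs + 1.0}  # placeholder for a policy update

def run_step(reactors):
    # One logical tick: fire reactors in topological order. Because the
    # graph is fixed before execution, no locks or coordination messages
    # are needed to route values between reactors.
    outputs = {}
    for r in reactors:  # assumed already topologically sorted
        values = {up.name: outputs[up.name] for up in r.inputs}
        outputs[r.name] = r.react(values)
    return outputs

sim = Simulator("Simulator")
learner = Learner("Learner", inputs=[sim])
print(run_step([sim, learner]))
```

The sketch mirrors the property the abstract attributes to LF: a statically known dataflow graph removes the need for per-actor synchronization at runtime.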
Related papers
- Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models [11.624678008637623]
We propose separating generation and learning in RLHF.
Asynchronous training relies on an underexplored regime: online but off-policy RLHF.
We study further compute optimizations for asynchronous RLHF but find that they come at a performance cost.
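As a rough illustration of the separation described above, the sketch below (plain Python threads and a queue; all names are illustrative, not the paper's implementation) runs generation and learning concurrently, so the learner consumes samples produced by a slightly stale policy, i.e., online but off-policy.

```python
import queue
import threading

samples = queue.Queue(maxsize=64)
policy_version = 0  # stands in for the learner's latest weights

def generator():
    while True:
        stale_version = policy_version           # may lag behind the learner
        samples.put(("prompt, response", stale_version))

def learner(num_updates):
    global policy_version
    for _ in range(num_updates):
        batch, version_used = samples.get()      # off-policy if version lags
        policy_version += 1                      # gradient step (placeholder)

threading.Thread(target=generator, daemon=True).start()
learner(num_updates=100)
```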
arXiv Detail & Related papers (2024-10-23T19:59:50Z)
- HybridFlow: A Flexible and Efficient RLHF Framework [13.80577212781375]
Reinforcement Learning from Human Feedback is widely used in Large Language Model (LLM) alignment.
Traditional RL can be modeled as a dataflow, where each node represents the computation of a neural network (NN).
We propose HybridFlow, which combines single-controller and multi-controller paradigms in a hybrid manner to enable flexible representation and efficient execution of the RLHF dataflow.
arXiv Detail & Related papers (2024-09-28T06:20:03Z)
- Spreeze: High-Throughput Parallel Reinforcement Learning Framework [19.3019166138232]
Spreeze is a lightweight parallel framework for reinforcement learning.
It efficiently utilizes a single desktop hardware resource to approach the throughput limit.
It can achieve up to 15,000 Hz experience sampling and a 370,000 Hz network update frame rate.
arXiv Detail & Related papers (2023-12-11T05:25:01Z)
- Retentive Network: A Successor to Transformer for Large Language Models [91.6652200825638]
We propose Retentive Network (RetNet) as a foundation architecture for large language models.
We theoretically derive the connection between recurrence and attention.
Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference.
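The recurrence/attention connection mentioned above can be checked numerically: retention has an attention-like parallel form and an equivalent recurrent form. The sketch below follows the published retention equations with made-up sizes; it is an illustration, not the official implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, gamma = 5, 4, 0.9
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel form: O = (Q K^T * D) V with decay mask D[n, m] = gamma^(n-m) for m <= n.
n, m = np.indices((T, T))
D = np.where(n >= m, gamma ** (n - m), 0.0)
O_parallel = (Q @ K.T * D) @ V

# Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n ; o_n = q_n S_n.
S = np.zeros((d, d))
O_recurrent = np.empty((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])
    O_recurrent[t] = Q[t] @ S

assert np.allclose(O_parallel, O_recurrent)  # the two forms agree
```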
arXiv Detail & Related papers (2023-07-17T16:40:01Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- Flexible Parallel Learning in Edge Scenarios: Communication, Computational and Energy Cost [20.508003076947848]
Fog- and IoT-based scenarios often require combining both data and model parallelism.
We present a framework for flexible parallel learning (FPL), achieving both data and model parallelism.
Our experiments, carried out using state-of-the-art deep-network architectures and large-scale datasets, confirm that FPL allows for an excellent trade-off among computational (hence energy) cost, communication overhead, and learning performance.
arXiv Detail & Related papers (2022-01-19T03:47:04Z)
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to a family of methods that nest reinforcement learning (RL) algorithms within population-based training.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
- TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [60.23234205219347]
TeraPipe is a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
We show that TeraPipe can speed up the training by 5.0x for the largest GPT-3 model with 175 billion parameters on an AWS cluster.
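A rough way to see why token-level pipelining helps: causal attention only looks backward, so a later pipeline stage can start on a slice of tokens as soon as the earlier stage finishes that slice. The toy schedule below (stage and slice counts are made up for illustration, not taken from the paper) prints the resulting overlap.

```python
num_stages, num_slices = 3, 4

# step -> list of (stage, slice) pairs that can run concurrently
schedule = {}
for stage in range(num_stages):
    for sl in range(num_slices):
        schedule.setdefault(stage + sl, []).append((stage, sl))

for step in sorted(schedule):
    work = ", ".join(f"stage{s} slice{t}" for s, t in schedule[step])
    print(f"step {step}: {work}")
# Token-level slicing finishes in num_stages + num_slices - 1 steps instead
# of num_stages * num_slices, so pipeline bubbles shrink.
```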
arXiv Detail & Related papers (2021-02-16T07:34:32Z)
- Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
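As a sketch of the idea, the snippet below (written against PyTorch; layer sizes and the auxiliary heads are illustrative) trains each layer with its own local loss and detaches activations between layers, so no gradient crosses a layer boundary.

```python
import torch
import torch.nn as nn

layers = [nn.Linear(8, 8), nn.Linear(8, 8)]
heads = [nn.Linear(8, 2) for _ in layers]  # local auxiliary classifiers
opts = [torch.optim.SGD(list(l.parameters()) + list(h.parameters()), lr=0.1)
        for l, h in zip(layers, heads)]

x = torch.randn(4, 8)
y = torch.randint(0, 2, (4,))

h = x
for layer, head, opt in zip(layers, heads, opts):
    h = torch.relu(layer(h))
    loss = nn.functional.cross_entropy(head(h), y)  # purely local objective
    opt.zero_grad()
    loss.backward()  # gradients stay within this layer and its head
    opt.step()
    h = h.detach()   # truncate: no backprop into earlier layers
```

Because each layer's update depends only on its own activations, the layers can in principle be trained in parallel, which is the point the summary makes about the high-compute regime.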
arXiv Detail & Related papers (2020-12-07T16:38:45Z)
- Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference [15.720414948573753]
We consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers).
We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model.
We show that, compared to the existing methods, RePurpose significantly improves the efficiency of the distributed inference via parallel implementation.
arXiv Detail & Related papers (2020-08-19T06:44:41Z)