MSRL: Distributed Reinforcement Learning with Dataflow Fragments
- URL: http://arxiv.org/abs/2210.00882v1
- Date: Mon, 3 Oct 2022 12:34:58 GMT
- Title: MSRL: Distributed Reinforcement Learning with Dataflow Fragments
- Authors: Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi,
Peter Pietzuch and Lei Chen
- Abstract summary: Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters.
We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training is parallelised and distributed on cluster resources.
MSRL introduces the new abstraction of a fragmented dataflow graph, which maps functions from an RL algorithm's training loop to parallel computational fragments.
- Score: 16.867322708270116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) trains many agents, which is resource-intensive
and must scale to large GPU clusters. Different RL training algorithms offer
different opportunities for distributing and parallelising the computation.
Yet, current distributed RL systems tie the definition of RL algorithms to
their distributed execution: they hard-code particular distribution strategies
and only accelerate specific parts of the computation (e.g. policy network
updates) on GPU workers. Fundamentally, current systems lack abstractions that
decouple RL algorithms from their execution.
We describe MindSpore Reinforcement Learning (MSRL), a distributed RL
training system that supports distribution policies that govern how RL training
computation is parallelised and distributed on cluster resources, without
requiring changes to the algorithm implementation. MSRL introduces the new
abstraction of a fragmented dataflow graph, which maps Python functions from an
RL algorithm's training loop to parallel computational fragments. Fragments are
executed on different devices by translating them to low-level dataflow
representations, e.g. computational graphs as supported by deep learning
engines, CUDA implementations or multi-threaded CPU processes. We show that
MSRL subsumes the distribution strategies of existing systems, while scaling RL
training to 64 GPUs.
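As a rough illustration of the fragmented dataflow graph idea, the sketch below decomposes a toy training loop into actor, environment and learner fragments that a separate distribution policy maps onto devices. The `Fragment` class, `apply_distribution_policy` helper and the placement strings are hypothetical assumptions and do not reflect MSRL's actual API.

```python
# Hypothetical sketch of a fragmented-dataflow style decomposition (not MSRL's real API).
# Each step of the RL training loop is a plain Python function; a "distribution policy"
# decides which worker/device runs each fragment, without editing the algorithm code.
from dataclasses import dataclass
from typing import Callable, Dict, List
import random

@dataclass
class Fragment:
    name: str
    fn: Callable            # the Python function taken from the training loop
    placement: str = "cpu"  # filled in by the distribution policy

def act(policy_params, env_state):
    # Actor fragment: pick an action (placeholder logic).
    return random.choice([0, 1]), env_state

def step_env(action, env_state):
    # Environment fragment: advance the simulation and emit a transition.
    reward = 1.0 if action == 1 else 0.0
    return {"action": action, "reward": reward, "next_state": env_state}

def learn(policy_params, batch):
    # Learner fragment: apply a (fake) update from a batch of transitions.
    lr = 0.01
    return policy_params + lr * sum(t["reward"] for t in batch)

def apply_distribution_policy(fragments: List[Fragment], policy: Dict[str, str]):
    # The same algorithm can be mapped to different resources by swapping this
    # mapping, without changing the fragments themselves.
    for f in fragments:
        f.placement = policy.get(f.name, "cpu")
    return fragments

if __name__ == "__main__":
    fragments = apply_distribution_policy(
        [Fragment("actor", act), Fragment("env", step_env), Fragment("learner", learn)],
        {"actor": "gpu:0", "env": "cpu", "learner": "gpu:1"},  # one possible policy
    )
    params, state = 0.0, None
    for _ in range(3):  # a tiny synchronous loop over the fragments
        action, state = fragments[0].fn(params, state)
        transition = fragments[1].fn(action, state)
        params = fragments[2].fn(params, [transition])
    print("updated params:", params)
```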
Related papers
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
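A toy sketch of the hierarchical, two-level idea (not ArCHer's actual code): a high-level critic values whole dialogue turns, and the resulting turn-level advantages are reused as the learning signal for a low-level, token-level update. All function names and constants below are illustrative assumptions.

```python
# Hypothetical two-level update: a turn-level critic trained with TD, whose advantages
# feed the token-level policy update running alongside it.
GAMMA, LR = 0.99, 0.1

def high_level_td_update(turn_values, rewards):
    # TD update over turns: V(turn_t) <- r_t + gamma * V(turn_{t+1}).
    new_values = list(turn_values)
    for t in range(len(rewards)):
        bootstrap = turn_values[t + 1] if t + 1 < len(turn_values) else 0.0
        new_values[t] += LR * (rewards[t] + GAMMA * bootstrap - turn_values[t])
    return new_values

def low_level_policy_signal(turn_values, rewards):
    # The advantage of each turn is shared by every token generated within that turn.
    signals = []
    for t in range(len(rewards)):
        bootstrap = turn_values[t + 1] if t + 1 < len(turn_values) else 0.0
        signals.append(rewards[t] + GAMMA * bootstrap - turn_values[t])
    return signals

if __name__ == "__main__":
    values = [0.0, 0.0, 0.0]   # one value estimate per turn of a 3-turn dialogue
    rewards = [0.0, 0.0, 1.0]  # sparse task reward at the final turn
    values = high_level_td_update(values, rewards)
    print("turn values:", values)
    print("per-turn advantages for token-level updates:", low_level_policy_signal(values, rewards))
```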
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- Efficient Parallel Reinforcement Learning Framework using the Reactor Model [2.190190313041532]
Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We propose a solution implementing the reactor model, which enforces a fixed communication pattern among a set of actors.
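A minimal sketch of the reactor idea under stated assumptions: actors are wired together through statically declared queues, so the communication pattern is fixed and the runtime only chooses when each reactor fires. The `Reactor` class and the wiring below are hypothetical, not the framework's API.

```python
# Hypothetical reactor-style pipeline: connections are declared once, then a
# deterministic schedule fires the reactors over the fixed topology.
from collections import deque

class Reactor:
    def __init__(self, name, handler, inbox, outbox=None):
        self.name, self.handler = name, handler
        self.inbox, self.outbox = inbox, outbox   # connections are fixed at construction

    def fire(self):
        # Process at most one message; the wiring itself never changes at runtime.
        if self.inbox:
            out = self.handler(self.inbox.popleft())
            if self.outbox is not None and out is not None:
                self.outbox.append(out)

seeds, rollouts, updates = deque([0, 1, 2]), deque(), deque()
rollout_reactor = Reactor("rollout", lambda seed: {"traj": [seed, seed + 1]}, inbox=seeds, outbox=rollouts)
learner_reactor = Reactor("learner", lambda batch: sum(batch["traj"]), inbox=rollouts, outbox=updates)

# A deterministic schedule over the fixed topology.
for _ in range(4):
    rollout_reactor.fire()
    learner_reactor.fire()
print("learner outputs:", list(updates))
```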
arXiv Detail & Related papers (2023-12-07T21:19:57Z)
- SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework.
We develop a scalable, efficient, and distributed RL system called ReaLly Scalable RL (SRL), which allows efficient and massively parallelized training.
SRL is the first system from the academic community to perform large-scale RL experiments with over 15k CPU cores.
arXiv Detail & Related papers (2023-06-29T05:16:25Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term, compared to RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
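The core mechanism can be illustrated with a small sketch: per-task action-values are estimated with ordinary tabular Q-learning and appended to the input that an RL$^2$-style meta-RL policy would otherwise receive. All names and constants below are illustrative, not the authors' implementation.

```python
# Hedged sketch: traditional Q-learning runs per task, and its action-values are
# concatenated onto the meta-RL observation vector.
import random

N_STATES, N_ACTIONS, ALPHA, GAMMA = 3, 2, 0.5, 0.9

def tabular_q_update(Q, s, a, r, s_next):
    # Traditional, task-specific Q-learning update.
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])

def meta_rl_input(obs_one_hot, Q, s, prev_action, prev_reward):
    # RL$^2$ would feed (obs, prev_action, prev_reward); RL$^3$ additionally
    # appends the current task's action-values for the observed state.
    return obs_one_hot + [prev_action, prev_reward] + list(Q[s])

if __name__ == "__main__":
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s, prev_a, prev_r = 0, 0, 0.0
    for _ in range(20):                          # interact with one sampled task
        a = random.randrange(N_ACTIONS)
        s_next = random.randrange(N_STATES)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        tabular_q_update(Q, s, a, r, s_next)
        s, prev_a, prev_r = s_next, a, r
    obs = [1.0 if i == s else 0.0 for i in range(N_STATES)]
    print("meta-RL input vector:", meta_rl_input(obs, Q, s, prev_a, prev_r))
```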
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
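A hedged sketch of that casting, using toy linear encoders: the InfoNCE logit between a state-action embedding and a goal (future-state) embedding plays the role of a goal-conditioned critic, and actions are scored against it. This illustrates the idea only and is not the paper's implementation.

```python
# Toy contrastive critic: classify the true future state among negatives; the same
# logit is then used as a goal-conditioned Q-value for action selection.
import numpy as np

rng = np.random.default_rng(0)
W_sa = rng.normal(size=(4, 3))   # toy state-action encoder phi(s, a)
W_g = rng.normal(size=(2, 3))    # toy goal encoder psi(g)

def phi(s, a):
    return np.concatenate([s, a]) @ W_sa     # s in R^2, a in R^2 -> R^3

def psi(g):
    return g @ W_g                           # g in R^2 -> R^3

def infonce_loss(states, actions, goals):
    # Positive pairs are (s_i, a_i) with the future state g_i from the same
    # trajectory; the other goals in the batch act as negatives.
    logits = np.array([[phi(s, a) @ psi(g) for g in goals]
                       for s, a in zip(states, actions)])
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

def greedy_action(s, goal, candidate_actions):
    # The learned critic doubles as a goal-conditioned Q-function.
    scores = [phi(s, a) @ psi(goal) for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]

states = rng.normal(size=(5, 2)); actions = rng.normal(size=(5, 2)); goals = rng.normal(size=(5, 2))
print("InfoNCE loss:", infonce_loss(states, actions, goals))
print("greedy action:", greedy_action(states[0], goals[0], [np.array([1.0, 0.0]), np.array([0.0, 1.0])]))
```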
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Transferred Q-learning [79.79659145328856]
We consider $Q$-learning with knowledge transfer, using samples from a target reinforcement learning (RL) task as well as source samples from different but related RL tasks.
We propose transfer learning algorithms for both batch and online $Q$-learning with offline source studies.
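One way to picture the transfer setting is a Q-learning update that mixes target-task samples with down-weighted offline samples from a related source task, as in the sketch below; the weighting scheme is a placeholder assumption rather than the paper's algorithm.

```python
# Hypothetical transferred Q-learning sketch: source-task transitions contribute to the
# same tabular update as target-task transitions, but with a reduced weight.
GAMMA, ALPHA, SOURCE_WEIGHT = 0.9, 0.5, 0.3
N_STATES, N_ACTIONS = 3, 2

def q_update(Q, transition, weight=1.0):
    s, a, r, s_next = transition
    td_error = r + GAMMA * max(Q[s_next]) - Q[s][a]
    Q[s][a] += ALPHA * weight * td_error

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
source_batch = [(0, 1, 1.0, 2), (1, 0, 0.0, 0)]   # offline samples from a related task
target_batch = [(0, 1, 0.5, 1), (1, 1, 1.0, 2)]   # samples from the target task

for tr in source_batch:                 # batch transfer: source data, reduced weight
    q_update(Q, tr, weight=SOURCE_WEIGHT)
for tr in target_batch:                 # target-task updates at full weight
    q_update(Q, tr)
print(Q)
```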
arXiv Detail & Related papers (2022-02-09T20:08:19Z)
- ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning [141.58588761593955]
We present a library ElegantRL-podracer for cloud-native deep reinforcement learning.
It efficiently supports millions of cores to carry out massively parallel training at multiple levels.
At a low level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU cores in a single GPU.
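A rough sketch of the massively parallel rollout pattern (not ElegantRL-podracer's API): thousands of environment instances are advanced by one batched array operation, so a single device steps all of them at once; numpy stands in here for GPU tensors.

```python
# Hypothetical batched-environment rollout: every instance is stepped in one vectorised op.
import numpy as np

NUM_ENVS = 4096          # in the real system these would occupy GPU cores

class BatchedPointEnv:
    """Toy 1-D environments, all simulated with vectorised array math."""
    def __init__(self, n):
        self.pos = np.zeros(n)

    def step(self, actions):
        self.pos += actions                   # one operation advances every instance
        rewards = -np.abs(self.pos - 1.0)     # reward for being near x = 1
        return self.pos.copy(), rewards

env = BatchedPointEnv(NUM_ENVS)
actions = np.random.uniform(-1, 1, size=NUM_ENVS)
obs, rew = env.step(actions)
print(obs.shape, rew.mean())
```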
arXiv Detail & Related papers (2021-12-11T06:31:21Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures which can be highly competitive against manually designed policies, but also verify previous design choices for RL policies.
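The underlying DARTS relaxation can be shown in a few lines: each supernet edge computes a softmax-weighted mixture of candidate operations, with the architecture parameters learned alongside the policy. This is a generic sketch of the relaxation, not RL-DARTS's code.

```python
# Generic DARTS mixed operation: a convex combination of candidate ops on one edge.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

CANDIDATE_OPS = [
    lambda x: x,                 # identity / skip connection
    lambda x: np.maximum(x, 0),  # ReLU
    lambda x: np.zeros_like(x),  # "none" operation
]

def mixed_op(x, alpha):
    # The supernet edge: architecture weights alpha select a soft mixture of ops.
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS))

alpha = np.array([0.1, 2.0, -1.0])   # architecture parameters for one edge
x = np.array([-1.0, 0.5, 2.0])       # a toy feature map
print(mixed_op(x, alpha))            # dominated by ReLU as alpha[1] grows
```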
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads [4.575381867242508]
We propose RL-Scope, a cross-stack profiler that scopes low-level CPU/GPU resource usage to high-level algorithmic operations.
We demonstrate RL-Scope's utility through in-depth case studies.
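The scoping idea can be sketched with a simple annotation mechanism: high-level algorithmic operations are wrapped in a context manager, and any time measured while an annotation is active is attributed to that operation. The interface below is hypothetical, not RL-Scope's actual API.

```python
# Hypothetical operation-scoping sketch: attribute elapsed wall-clock time to the
# high-level algorithmic operation that was active when it was measured.
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)

@contextmanager
def operation(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start   # scope the elapsed time to the op

with operation("inference"):
    time.sleep(0.01)        # stand-in for forward passes / simulator calls
with operation("backpropagation"):
    time.sleep(0.02)        # stand-in for gradient computation

print({k: round(v, 3) for k, v in totals.items()})
```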
arXiv Detail & Related papers (2021-02-08T15:42:48Z)
- RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem [37.38316954355031]
We re-examine the challenges posed by distributed reinforcement learning.
We show that viewing RL as a dataflow problem leads to highly composable and performant implementations.
We propose RLlib Flow, a hybrid actor-dataflow programming model for distributed RL.
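A small sketch of the actor-dataflow style under assumptions: rollout workers are exposed as iterators, merged into one stream, and piped through a training step, so the distributed loop reads as a dataflow. The function names below are illustrative, not RLlib Flow's API.

```python
# Hypothetical dataflow pipeline: rollout iterators -> gather -> train, composed as generators.
import itertools
import random

def rollout_source(worker_id):
    # Each (logical) rollout worker is an infinite iterator of sample batches.
    while True:
        yield {"worker": worker_id, "reward": random.random()}

def gather(*sources):
    # Round-robin union of the per-worker streams.
    for batch in itertools.chain.from_iterable(zip(*sources)):
        yield batch

def train(stream, num_updates):
    params = 0.0
    for batch in itertools.islice(stream, num_updates):
        params += 0.01 * batch["reward"]       # stand-in for an SGD step
        yield {"step_reward": batch["reward"], "params": params}

pipeline = train(gather(rollout_source(0), rollout_source(1)), num_updates=4)
for metrics in pipeline:
    print(metrics)
```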
arXiv Detail & Related papers (2020-11-25T13:28:16Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)