Related papers: RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem

RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem

URL: http://arxiv.org/abs/2011.12719v4
Date: Thu, 28 Oct 2021 18:31:07 GMT
Title: RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem
Authors: Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, Joseph E. Gonzalez, Ion Stoica
Abstract summary: We re-examine the challenges posed by distributed reinforcement learning. We show that viewing RL as a dataflow problem leads to highly composable and performant implementations. We propose RLlib Flow, a hybrid actor-dataflow programming model for distributed RL.
Score: 37.38316954355031
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years. In this paper, we re-examine the challenges posed by distributed RL and try to view it through the lens of an old idea: distributed dataflow. We show that viewing RL as a dataflow problem leads to highly composable and performant implementations. We propose RLlib Flow, a hybrid actor-dataflow programming model for distributed RL, and validate its practicality by porting the full suite of algorithms in RLlib, a widely adopted distributed RL library. Concretely, RLlib Flow provides 2-9 code savings in real production code and enables the composition of multi-agent algorithms not possible by end users before. The open-source code is available as part of RLlib at https://github.com/ray-project/ray/tree/master/rllib.

Related papers

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation [55.75008325187133]
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs) StreamRL is designed with disaggregation from first principles to address two types of performance bottlenecks. Experiments show that StreamRL improves throughput by up to 2.66x compared to existing state-of-the-art systems.
arXiv Detail & Related papers (2025-04-22T14:19:06Z)
RLHF Workflow: From Reward Modeling to Online RLHF [79.83927049253924]
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report. RLHF is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. We show that supervised fine-tuning (SFT) and iterative RLHF can obtain state-of-the-art performance with fully open-source datasets.
arXiv Detail & Related papers (2024-05-13T15:50:39Z)
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models. Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel. Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework. We develop a scalable, efficient, and distributed RL system called ReaLly scalableRL, which allows efficient and massively parallelized training. SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
arXiv Detail & Related papers (2023-06-29T05:16:25Z)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL. We show that RL$3$ earns greater cumulative reward in the long term, compared to RL$2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
MSRL: Distributed Reinforcement Learning with Dataflow Fragments [16.867322708270116]
Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training is parallelised and distributed on cluster resources. MSRL introduces the new abstraction of a fragmented dataflow graph, which maps functions from an RL algorithm's training loop to parallel computational fragments.
arXiv Detail & Related papers (2022-10-03T12:34:58Z)
LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning [78.2286146954051]
LCRL implements model-free Reinforcement Learning (RL) algorithms over unknown Decision Processes (MDPs) We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL.
arXiv Detail & Related papers (2022-09-21T13:21:00Z)
ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives [11.675763847424786]
We present ShinRL, an open-source library for evaluation of reinforcement learning (RL) algorithms. ShinRL provides an RL environment interface that can compute metrics for delving into the behaviors of RL algorithms. We show how combining these two features of ShinRL makes it easier to analyze the behavior of deep Q learning.
arXiv Detail & Related papers (2021-12-08T05:34:46Z)
Recurrent Model-Free RL is a Strong Baseline for Many POMDPs [73.39666827525782]
Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. Prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs.
arXiv Detail & Related papers (2021-10-11T07:09:14Z)
RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads [4.575381867242508]
We propose RL-Scope, a cross-stack profiler that scopes low-level CPU/GPU resource usage to high-level algorithmic operations. We demonstrate RL-Scope's utility through in-depth case studies.
arXiv Detail & Related papers (2021-02-08T15:42:48Z)
MushroomRL: Simplifying Reinforcement Learning Research [60.70556446270147]
MushroomRL is an open-source Python library developed to simplify the process of implementing and running Reinforcement Learning (RL) experiments. Compared to other available libraries, MushroomRL has been created with the purpose of providing a comprehensive and flexible framework to minimize the effort in implementing and testing novel RL methodologies.
arXiv Detail & Related papers (2020-01-04T17:23:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.