High-Throughput Synchronous Deep RL
- URL: http://arxiv.org/abs/2012.09849v1
- Date: Thu, 17 Dec 2020 18:59:01 GMT
- Title: High-Throughput Synchronous Deep RL
- Authors: Iou-Jen Liu and Raymond A. Yeh and Alexander G. Schwing
- Abstract summary: We propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL).
We perform learning and rollouts concurrently and devise a system design that avoids `stale policies'.
We evaluate our approach on Atari games and the Google Research Football environment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (RL) is computationally demanding and requires
processing of many data points. Synchronous methods enjoy training stability
while having lower data throughput. In contrast, asynchronous methods achieve
high throughput but suffer from stability issues and lower sample efficiency
due to `stale policies.' To combine the advantages of both methods we propose
High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL). In HTS-RL, we
perform learning and rollouts concurrently, devise a system design which avoids
`stale policies' and ensure that actors interact with environment replicas in
an asynchronous manner while maintaining full determinism. We evaluate our
approach on Atari games and the Google Research Football environment. Compared
to synchronous baselines, HTS-RL is 2-6$\times$ faster. Compared to
state-of-the-art asynchronous methods, HTS-RL has competitive throughput and
consistently achieves higher average episode rewards.
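The design described in the abstract can be made concrete with a small sketch: learning and rollouts overlap by exactly one batch, every actor reads a fixed, fully published policy version (so policy age is constant rather than unpredictable), and per-actor seeding keeps rollouts deterministic. Everything below, including the toy Policy, the scalar observations, and the thread layout, is a hypothetical illustration of our reading of the abstract, not the authors' implementation.

```python
import queue
import random
import threading

NUM_ACTORS = 4
STEPS_PER_BATCH = 16
NUM_BATCHES = 5

class Policy:
    """Toy stand-in for a policy network; `version` counts learner updates."""
    def __init__(self, version):
        self.version = version
    def act(self, obs, rng):
        return rng.choice([0, 1])  # stand-in for a forward pass

cond = threading.Condition()
published = [Policy(version=0)]  # complete policy versions, appended by the learner
batch_queue = queue.Queue()      # trajectories flowing actors -> learner

def actor(actor_id):
    rng = random.Random(actor_id)        # fixed per-actor seed => reproducible rollouts
    for b in range(NUM_BATCHES):
        want = max(0, b - 1)             # fixed one-batch policy lag enables overlap
        with cond:
            cond.wait_for(lambda: len(published) > want)
            policy = published[want]     # always a complete, deterministic version
        traj = [(obs, policy.act(obs, rng)) for obs in range(STEPS_PER_BATCH)]
        batch_queue.put((b, actor_id, traj))

def learner():
    pending = {}                         # batches can interleave; bucket by index
    for b in range(NUM_BATCHES):
        while len(pending.setdefault(b, [])) < NUM_ACTORS:
            bb, aid, traj = batch_queue.get()
            pending.setdefault(bb, []).append(traj)
        # A real gradient step on batch b would go here; then publish the new policy.
        with cond:
            published.append(Policy(version=len(published)))
            cond.notify_all()

threads = [threading.Thread(target=actor, args=(i,)) for i in range(NUM_ACTORS)]
threads.append(threading.Thread(target=learner))
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final policy version:", published[-1].version)  # equals NUM_BATCHES
```

The fixed one-batch lag is what buys concurrency: while the learner consumes batch b, actors already collect batch b+1 with the published version b, yet each batch still maps to exactly one policy version, so there is no variable-age staleness and runs are repeatable.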
Related papers
- Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
We propose separating generation and learning in RLHF.
Asynchronous training relies on an underexplored regime: online but off-policy RLHF.
We study further compute optimizations for asynchronous RLHF but find that they come at a performance cost.
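A minimal sketch of the separation the summary describes: a generation worker samples completions with whatever weights are currently available while the learner updates on slightly older samples, correcting off-policy drift with an importance ratio. All names, the bounded queue, and the toy log-probabilities are our assumptions, not the paper's implementation.

```python
import math
import queue
import random
import threading

sample_q = queue.Queue(maxsize=8)   # bounded queue gives the learner backpressure
weights_version = 0                 # stand-in for the policy parameters
lock = threading.Lock()

def generator(n):
    """Samples completions with whatever weights are currently available."""
    for i in range(n):
        with lock:
            gen_version = weights_version      # generation may lag the learner
        logp_behavior = -1.0 - (i % 3)         # toy log-prob under the sampling policy
        sample_q.put((f"completion-{i}", gen_version, logp_behavior))

def learner(n):
    """Consumes samples off-policy, correcting with an importance ratio."""
    global weights_version
    total_loss = 0.0
    for _ in range(n):
        _text, gen_version, logp_behavior = sample_q.get()
        logp_current = logp_behavior - 0.1               # toy current-policy log-prob
        ratio = math.exp(logp_current - logp_behavior)   # off-policy correction
        reward = random.random()                         # stand-in reward-model score
        total_loss += -ratio * reward                    # IS-weighted policy-gradient loss
        with lock:
            weights_version += 1                         # "apply the gradient step"
    print(f"applied {weights_version} updates, total loss {total_loss:.2f}")

g = threading.Thread(target=generator, args=(20,))
l = threading.Thread(target=learner, args=(20,))
g.start(); l.start()
g.join(); l.join()
```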
arXiv Detail & Related papers (2024-10-23T19:59:50Z)
- Efficient Parallel Reinforcement Learning Framework using the Reactor Model
Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We propose a solution implementing the reactor model, which constrains a set of actors to a fixed communication pattern.
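A sketch of the reactor-style idea (the API below is illustrative, not the paper's): each actor declares fixed input and output ports up front, so the scheduler knows the full communication pattern statically and can execute reactions in a deterministic order at each logical step.

```python
class Reactor:
    """An actor with statically declared ports and a reaction function."""
    def __init__(self, name, inputs, outputs, reaction):
        self.name, self.inputs, self.outputs = name, inputs, outputs
        self.reaction = reaction

def step(reactors, values):
    # Fixed pattern: run reactors whose inputs are ready, in declaration order,
    # so the same inputs always yield the same schedule and the same outputs.
    for r in reactors:
        if all(p in values for p in r.inputs):
            results = r.reaction(*(values[p] for p in r.inputs))
            values.update(zip(r.outputs, results))
    return values

rollout = Reactor("rollout", inputs=["policy"], outputs=["batch"],
                  reaction=lambda pol: ([f"traj@{pol}"],))
train = Reactor("train", inputs=["batch"], outputs=["policy_next"],
                reaction=lambda batch: (f"updated-from-{len(batch)}-trajs",))

state = step([rollout, train], {"policy": "v0"})
print(state["policy_next"])  # deterministic given the declared wiring
```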
arXiv Detail & Related papers (2023-12-07T21:19:57Z)
- A Quadratic Synchronization Rule for Distributed Deep Learning
This work proposes a theory-grounded method for determining the synchronization period $H$ in local gradient methods, named the Quadratic Synchronization Rule (QSR).
Experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies.
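A sketch of the rule's shape as its name suggests: the number of local steps $H$ between synchronizations grows quadratically as the learning rate decays. The constant `alpha` and the floor `h_base` below are hypothetical knobs, not values from the paper.

```python
def qsr_sync_period(eta: float, alpha: float = 0.01, h_base: int = 8) -> int:
    """Return H, the local steps to run before the next synchronization.

    Hypothetical instantiation of a quadratic rule: H scales with (1/eta)^2,
    clipped below by a minimum period h_base.
    """
    return max(h_base, int((alpha / eta) ** 2))

# As the schedule decays eta, workers synchronize less and less often:
for eta in [1e-1, 3e-2, 1e-2, 3e-3]:
    print(f"eta={eta:.0e} -> H={qsr_sync_period(eta)}")
```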
arXiv Detail & Related papers (2023-10-22T21:38:57Z)
- Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
We introduce temporally-coupled perturbations, presenting a novel challenge for existing robust reinforcement learning methods.
We propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
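To make the threat model concrete, here is a toy rollout in which the adversary's perturbation at step t may move only a bounded distance from its previous perturbation, with the protagonist and adversary receiving opposite rewards. The constraint form and all constants are our assumptions from the abstract, not GRAD's formulation.

```python
import random

def temporally_coupled_perturbation(prev_delta, epsilon_bar=0.05, budget=0.5):
    """Adversary move constrained to stay within epsilon_bar of its last move."""
    step = random.uniform(-epsilon_bar, epsilon_bar)
    return max(-budget, min(budget, prev_delta + step))

# Zero-sum rollout skeleton: the protagonist maximizes return, the adversary
# (whose moves are temporally coupled) receives the negated reward.
delta, ret = 0.0, 0.0
for t in range(20):
    delta = temporally_coupled_perturbation(delta)
    obs = 1.0 + delta                 # perturbed observation (toy)
    action = 0.8 * obs                # protagonist's policy (toy)
    ret += -(obs - action) ** 2       # protagonist reward; adversary gets -reward
print("protagonist return:", round(ret, 3))
```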
arXiv Detail & Related papers (2023-07-22T12:10:04Z)
- Accelerating Distributed ML Training via Selective Synchronization
SelSync is a practical, low-overhead method for DNN training that dynamically chooses to incur or avoid communication at each step.
Our system converges to the same or better accuracy than BSP while reducing training time by up to 14$\times$.
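A sketch of the per-step choice the summary describes: each step either averages gradients across workers (communicate) or applies the local gradient alone (skip). The significance test and threshold below are stand-ins, not SelSync's actual criterion.

```python
def train_step(local_grads, threshold=0.1):
    """Decide per step whether to synchronize. local_grads: toy scalars, one per worker."""
    significance = max(abs(g) for g in local_grads)  # stand-in significance measure
    if significance >= threshold:
        avg = sum(local_grads) / len(local_grads)    # synchronize: all-reduce
        return [avg] * len(local_grads), True
    return list(local_grads), False                  # skip: purely local updates

updates, synced = train_step([0.02, -0.01, 0.03])
print(synced)  # False: small gradients, communication avoided this step
updates, synced = train_step([0.4, -0.2, 0.3])
print(synced)  # True: significant step, workers synchronize
```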
arXiv Detail & Related papers (2023-07-16T05:28:59Z)
- Offline Reinforcement Learning at Multiple Frequencies
We study how well offline reinforcement learning algorithms can accommodate data with a mixture of frequencies during training.
We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning.
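One plausible reading of "consistency in the rate of $Q$-value updates", purely illustrative since the paper's actual mechanism may differ: when mixing data logged at different control frequencies, scale the number of backup steps so every target looks the same wall-clock horizon ahead, with discounting per second rather than per step.

```python
GAMMA_PER_SECOND = 0.99

def n_step_target(rewards, dt, bootstrap_value, horizon_seconds=0.4):
    """n-step TD target over a fixed wall-clock horizon.

    rewards: per-transition rewards logged at timestep dt (seconds).
    High-frequency data gets more backup steps so value information
    propagates at a matched rate across frequencies.
    """
    n = max(1, round(horizon_seconds / dt))
    target, discount = 0.0, 1.0
    for r in rewards[:n]:
        target += discount * r
        discount *= GAMMA_PER_SECOND ** dt   # discount per second, not per step
    return target + discount * bootstrap_value

# 10 Hz data (dt=0.1) backs up 4 steps; 2.5 Hz data (dt=0.4) backs up 1 step,
# so both targets look 0.4 s ahead and yield comparable values.
print(n_step_target([1.0] * 8, dt=0.1, bootstrap_value=5.0))
print(n_step_target([4.0] * 2, dt=0.4, bootstrap_value=5.0))
```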
arXiv Detail & Related papers (2022-07-26T17:54:49Z)
- Hierarchical Reinforcement Learning with Optimal Level Synchronization based on a Deep Generative Model
A key issue in HRL is how to train each level's policy with optimal data collected from its experience.
We propose a novel HRL model that supports optimal level synchronization using an off-policy correction technique with a deep generative model.
arXiv Detail & Related papers (2021-07-17T05:02:25Z)
- Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning
Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters.
A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization protocol.
In this paper, we design a hybrid synchronization approach that exploits the benefits of both BSP and ASP.
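A sketch of one hybrid in this spirit: run bulk-synchronous (BSP) steps for stable early progress, then switch to asynchronous (ASP) steps for throughput. The fixed switch epoch below is a stand-in; determining the right switch timing is the paper's contribution.

```python
def choose_protocol(epoch: int, switch_epoch: int = 10) -> str:
    """Hypothetical policy: BSP early for accuracy, ASP later for throughput."""
    return "BSP" if epoch < switch_epoch else "ASP"

def apply_step(protocol: str, worker_grads):
    if protocol == "BSP":
        # Bulk-synchronous: barrier, then everyone applies the averaged gradient.
        avg = sum(worker_grads) / len(worker_grads)
        return [avg] * len(worker_grads)
    # Asynchronous: each worker applies its own gradient without waiting.
    return list(worker_grads)

for epoch in (0, 9, 10, 25):
    print(epoch, choose_protocol(epoch))
```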
arXiv Detail & Related papers (2021-04-16T20:49:28Z)
- Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup
This paper revisits the A3C algorithm with TD(0) for the critic update, termed A3C-TD(0), with provable convergence guarantees.
Under i.i.d. sampling, A3C-TD(0) obtains a sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy, where $N$ is the number of workers.
Compared to the best-known sample complexity of $\mathcal{O}(\epsilon^{-2.5})$ for two-timescale actor-critic, this amounts to a linear speedup in the number of workers.
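A quick sanity check of the linear-speedup claim, using only the quantities in the summary: aggregating over workers gives
$$N \cdot \mathcal{O}\!\left(\epsilon^{-2.5}/N\right) = \mathcal{O}\!\left(\epsilon^{-2.5}\right),$$
so the total sample count matches the single-worker rate while the per-worker load, and hence wall-clock time, shrinks by a factor of $N$.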
arXiv Detail & Related papers (2020-12-31T09:07:09Z)
- An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search
We introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) that maximizes the parallel efficiency of ES and integrates it with policy gradient methods.
Specifically, we propose 1) a novel framework that merges ES and DRL asynchronously and 2) various asynchronous update methods that take full advantage of asynchronism, ES, and DRL.
arXiv Detail & Related papers (2020-12-10T02:30:48Z)