High-Throughput Synchronous Deep RL
- URL: http://arxiv.org/abs/2012.09849v1
- Date: Thu, 17 Dec 2020 18:59:01 GMT
- Title: High-Throughput Synchronous Deep RL
- Authors: Iou-Jen Liu and Raymond A. Yeh and Alexander G. Schwing
- Abstract summary: We propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL).
We perform learning and rollouts concurrently and devise a system design that avoids `stale policies'.
We evaluate our approach on Atari games and the Google Research Football environment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (RL) is computationally demanding and requires
processing of many data points. Synchronous methods enjoy training stability
while having lower data throughput. In contrast, asynchronous methods achieve
high throughput but suffer from stability issues and lower sample efficiency
due to `stale policies.' To combine the advantages of both methods we propose
High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL). In HTS-RL, we
perform learning and rollouts concurrently, devise a system design which avoids
`stale policies' and ensure that actors interact with environment replicas in
an asynchronous manner while maintaining full determinism. We evaluate our
approach on Atari games and the Google Research Football environment. Compared
to synchronous baselines, HTS-RL is 2-6$\times$ faster. Compared to
state-of-the-art asynchronous methods, HTS-RL has competitive throughput and
consistently achieves higher average episode rewards.
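The design described in the abstract can be made concrete with a small sketch: learning and rollouts overlap by exactly one batch, every actor reads a fixed, fully published policy version (so policy age is constant rather than unpredictable), and per-actor seeding keeps rollouts deterministic. Everything below, including the toy Policy, the scalar observations, and the thread layout, is a hypothetical illustration of our reading of the abstract, not the authors' implementation.

```python
import queue
import random
import threading

NUM_ACTORS = 4
STEPS_PER_BATCH = 16
NUM_BATCHES = 5

class Policy:
    """Toy stand-in for a policy network; `version` counts learner updates."""
    def __init__(self, version):
        self.version = version
    def act(self, obs, rng):
        return rng.choice([0, 1])  # stand-in for a forward pass

cond = threading.Condition()
published = [Policy(version=0)]  # complete policy versions, appended by the learner
batch_queue = queue.Queue()      # trajectories flowing actors -> learner

def actor(actor_id):
    rng = random.Random(actor_id)        # fixed per-actor seed => reproducible rollouts
    for b in range(NUM_BATCHES):
        want = max(0, b - 1)             # fixed one-batch policy lag enables overlap
        with cond:
            cond.wait_for(lambda: len(published) > want)
            policy = published[want]     # always a complete, deterministic version
        traj = [(obs, policy.act(obs, rng)) for obs in range(STEPS_PER_BATCH)]
        batch_queue.put((b, actor_id, traj))

def learner():
    pending = {}                         # batches can interleave; bucket by index
    for b in range(NUM_BATCHES):
        while len(pending.setdefault(b, [])) < NUM_ACTORS:
            bb, aid, traj = batch_queue.get()
            pending.setdefault(bb, []).append(traj)
        # A real gradient step on batch b would go here; then publish the new policy.
        with cond:
            published.append(Policy(version=len(published)))
            cond.notify_all()

threads = [threading.Thread(target=actor, args=(i,)) for i in range(NUM_ACTORS)]
threads.append(threading.Thread(target=learner))
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final policy version:", published[-1].version)  # equals NUM_BATCHES
```

The fixed one-batch lag is what buys concurrency: while the learner consumes batch b, actors already collect batch b+1 with the published version b, yet each batch still maps to exactly one policy version, so there is no variable-age staleness and runs are repeatable.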
Related papers
- Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
We propose separating generation and learning in RLHF.
Asynchronous training relies on an underexplored regime: online but off-policy RLHF.
We study further compute optimizations for asynchronous RLHF but find that they come at a performance cost.
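A minimal sketch of the separation the summary describes: a generation worker samples completions with whatever weights are currently available while the learner updates on slightly older samples, correcting off-policy drift with an importance ratio. All names, the bounded queue, and the toy log-probabilities are our assumptions, not the paper's implementation.

```python
import math
import queue
import random
import threading

sample_q = queue.Queue(maxsize=8)   # bounded queue gives the learner backpressure
weights_version = 0                 # stand-in for the policy parameters
lock = threading.Lock()

def generator(n):
    """Samples completions with whatever weights are currently available."""
    for i in range(n):
        with lock:
            gen_version = weights_version      # generation may lag the learner
        logp_behavior = -1.0 - (i % 3)         # toy log-prob under the sampling policy
        sample_q.put((f"completion-{i}", gen_version, logp_behavior))

def learner(n):
    """Consumes samples off-policy, correcting with an importance ratio."""
    global weights_version
    total_loss = 0.0
    for _ in range(n):
        _text, gen_version, logp_behavior = sample_q.get()
        logp_current = logp_behavior - 0.1               # toy current-policy log-prob
        ratio = math.exp(logp_current - logp_behavior)   # off-policy correction
        reward = random.random()                         # stand-in reward-model score
        total_loss += -ratio * reward                    # IS-weighted policy-gradient loss
        with lock:
            weights_version += 1                         # "apply the gradient step"
    print(f"applied {weights_version} updates, total loss {total_loss:.2f}")

g = threading.Thread(target=generator, args=(20,))
l = threading.Thread(target=learner, args=(20,))
g.start(); l.start()
g.join(); l.join()
```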
arXiv Detail & Related papers (2024-10-23T19:59:50Z)
- Efficient Parallel Reinforcement Learning Framework using the Reactor Model
Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We propose a solution implementing the reactor model, which constrains a set of actors to a fixed communication pattern.
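A sketch of the reactor-style idea (the API below is illustrative, not the paper's): each actor declares fixed input and output ports up front, so the scheduler knows the full communication pattern statically and can execute reactions in a deterministic order at each logical step.

```python
class Reactor:
    """An actor with statically declared ports and a reaction function."""
    def __init__(self, name, inputs, outputs, reaction):
        self.name, self.inputs, self.outputs = name, inputs, outputs
        self.reaction = reaction

def step(reactors, values):
    # Fixed pattern: run reactors whose inputs are ready, in declaration order,
    # so the same inputs always yield the same schedule and the same outputs.
    for r in reactors:
        if all(p in values for p in r.inputs):
            results = r.reaction(*(values[p] for p in r.inputs))
            values.update(zip(r.outputs, results))
    return values

rollout = Reactor("rollout", inputs=["policy"], outputs=["batch"],
                  reaction=lambda pol: ([f"traj@{pol}"],))
train = Reactor("train", inputs=["batch"], outputs=["policy_next"],
                reaction=lambda batch: (f"updated-from-{len(batch)}-trajs",))

state = step([rollout, train], {"policy": "v0"})
print(state["policy_next"])  # deterministic given the declared wiring
```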
arXiv Detail & Related papers (2023-12-07T21:19:57Z)
- A Quadratic Synchronization Rule for Distributed Deep Learning
This work proposes a theory-grounded method for determining the synchronization period $H$ in local gradient methods, named the Quadratic Synchronization Rule (QSR).
Experiments on ResNet and ViT show that local gradient methods with QSR consistently improve the test accuracy over other synchronization strategies.
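A sketch of the rule's shape as its name suggests: the number of local steps $H$ between synchronizations grows quadratically as the learning rate decays. The constant `alpha` and the floor `h_base` below are hypothetical knobs, not values from the paper.

```python
def qsr_sync_period(eta: float, alpha: float = 0.01, h_base: int = 8) -> int:
    """Return H, the local steps to run before the next synchronization.

    Hypothetical instantiation of a quadratic rule: H scales with (1/eta)^2,
    clipped below by a minimum period h_base.
    """
    return max(h_base, int((alpha / eta) ** 2))

# As the schedule decays eta, workers synchronize less and less often:
for eta in [1e-1, 3e-2, 1e-2, 3e-3]:
    print(f"eta={eta:.0e} -> H={qsr_sync_period(eta)}")
```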
arXiv Detail & Related papers (2023-10-22T21:38:57Z)
- Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
We introduce temporally-coupled perturbations, presenting a novel challenge for existing robust reinforcement learning methods.
We propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
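To make the threat model concrete, here is a toy rollout in which the adversary's perturbation at step t may move only a bounded distance from its previous perturbation, with the protagonist and adversary receiving opposite rewards. The constraint form and all constants are our assumptions from the abstract, not GRAD's formulation.

```python
import random

def temporally_coupled_perturbation(prev_delta, epsilon_bar=0.05, budget=0.5):
    """Adversary move constrained to stay within epsilon_bar of its last move."""
    step = random.uniform(-epsilon_bar, epsilon_bar)
    return max(-budget, min(budget, prev_delta + step))

# Zero-sum rollout skeleton: the protagonist maximizes return, the adversary
# (whose moves are temporally coupled) receives the negated reward.
delta, ret = 0.0, 0.0
for t in range(20):
    delta = temporally_coupled_perturbation(delta)
    obs = 1.0 + delta                 # perturbed observation (toy)
    action = 0.8 * obs                # protagonist's policy (toy)
    ret += -(obs - action) ** 2       # protagonist reward; adversary gets -reward
print("protagonist return:", round(ret, 3))
```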
arXiv Detail & Related papers (2023-07-22T12:10:04Z)
- Accelerating Distributed ML Training via Selective Synchronization
SelSync is a practical, low-overhead method for DNN training that dynamically chooses to incur or avoid communication at each step.
Our system converges to the same or better accuracy than BSP while reducing training time by up to 14$\times$.
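A sketch of the per-step choice the summary describes: each step either averages gradients across workers (communicate) or applies the local gradient alone (skip). The significance test and threshold below are stand-ins, not SelSync's actual criterion.

```python
def train_step(local_grads, threshold=0.1):
    """Decide per step whether to synchronize. local_grads: toy scalars, one per worker."""
    significance = max(abs(g) for g in local_grads)  # stand-in significance measure
    if significance >= threshold:
        avg = sum(local_grads) / len(local_grads)    # synchronize: all-reduce
        return [avg] * len(local_grads), True
    return list(local_grads), False                  # skip: purely local updates

updates, synced = train_step([0.02, -0.01, 0.03])
print(synced)  # False: small gradients, communication avoided this step
updates, synced = train_step([0.4, -0.2, 0.3])
print(synced)  # True: significant step, workers synchronize
```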
arXiv Detail & Related papers (2023-07-16T05:28:59Z)
- Offline Reinforcement Learning at Multiple Frequencies
We study how well offline reinforcement learning algorithms can accommodate data with a mixture of frequencies during training.
We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning.
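One plausible reading of "consistency in the rate of $Q$-value updates", purely illustrative since the paper's actual mechanism may differ: when mixing data logged at different control frequencies, scale the number of backup steps so every target looks the same wall-clock horizon ahead, with discounting per second rather than per step.

```python
GAMMA_PER_SECOND = 0.99

def n_step_target(rewards, dt, bootstrap_value, horizon_seconds=0.4):
    """n-step TD target over a fixed wall-clock horizon.

    rewards: per-transition rewards logged at timestep dt (seconds).
    High-frequency data gets more backup steps so value information
    propagates at a matched rate across frequencies.
    """
    n = max(1, round(horizon_seconds / dt))
    target, discount = 0.0, 1.0
    for r in rewards[:n]:
        target += discount * r
        discount *= GAMMA_PER_SECOND ** dt   # discount per second, not per step
    return target + discount * bootstrap_value

# 10 Hz data (dt=0.1) backs up 4 steps; 2.5 Hz data (dt=0.4) backs up 1 step,
# so both targets look 0.4 s ahead and yield comparable values.
print(n_step_target([1.0] * 8, dt=0.1, bootstrap_value=5.0))
print(n_step_target([4.0] * 2, dt=0.4, bootstrap_value=5.0))
```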
arXiv Detail & Related papers (2022-07-26T17:54:49Z)
- Hierarchical Reinforcement Learning with Optimal Level Synchronization based on a Deep Generative Model
A key issue in HRL is how to train each level's policy with optimal data collected from its experience.
We propose a novel HRL model that supports optimal level synchronization using an off-policy correction technique with a deep generative model.
arXiv Detail & Related papers (2021-07-17T05:02:25Z)
- Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning
Stochastic Gradient Descent (SGD) has become the de facto way to train deep neural networks in distributed clusters.
A critical factor in determining the training throughput and model accuracy is the choice of the parameter synchronization protocol.
In this paper, we design a hybrid synchronization approach that exploits the benefits of both BSP and ASP.
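A sketch of one hybrid in this spirit: run bulk-synchronous (BSP) steps for stable early progress, then switch to asynchronous (ASP) steps for throughput. The fixed switch epoch below is a stand-in; determining the right switch timing is the paper's contribution.

```python
def choose_protocol(epoch: int, switch_epoch: int = 10) -> str:
    """Hypothetical policy: BSP early for accuracy, ASP later for throughput."""
    return "BSP" if epoch < switch_epoch else "ASP"

def apply_step(protocol: str, worker_grads):
    if protocol == "BSP":
        # Bulk-synchronous: barrier, then everyone applies the averaged gradient.
        avg = sum(worker_grads) / len(worker_grads)
        return [avg] * len(worker_grads)
    # Asynchronous: each worker applies its own gradient without waiting.
    return list(worker_grads)

for epoch in (0, 9, 10, 25):
    print(epoch, choose_protocol(epoch))
```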
arXiv Detail & Related papers (2021-04-16T20:49:28Z)
- Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear Speedup
This paper revisits the A3C algorithm with TD(0) for the critic update, termed A3C-TD(0), with provable convergence guarantees.
Under i.i.d. sampling, A3C-TD(0) obtains a sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy, where $N$ is the number of workers.
Compared to the best-known sample complexity of $\mathcal{O}(\epsilon^{-2.5})$ for two-timescale actor-critic, this amounts to a linear speedup in the number of workers.
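A quick sanity check of the linear-speedup claim, using only the quantities in the summary: aggregating over workers gives
$$N \cdot \mathcal{O}\!\left(\epsilon^{-2.5}/N\right) = \mathcal{O}\!\left(\epsilon^{-2.5}\right),$$
so the total sample count matches the single-worker rate while the per-worker load, and hence wall-clock time, shrinks by a factor of $N$.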
arXiv Detail & Related papers (2020-12-31T09:07:09Z)
- An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search
We introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) that maximizes the parallel efficiency of ES and integrates it with policy gradient methods.
Specifically, we propose 1) a novel framework that merges ES and DRL asynchronously and 2) various asynchronous update methods that take full advantage of asynchronism, ES, and DRL.
arXiv Detail & Related papers (2020-12-10T02:30:48Z)