Parallel Actors and Learners: A Framework for Generating Scalable RL
Implementations
- URL: http://arxiv.org/abs/2110.01101v1
- Date: Sun, 3 Oct 2021 21:00:53 GMT
- Title: Parallel Actors and Learners: A Framework for Generating Scalable RL
Implementations
- Authors: Chi Zhang, Sanmukh Rao Kuppannagari, Viktor K Prasanna
- Abstract summary: Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, health care and others.
Current implementations exhibit poor performance due to challenges such as irregular memory accesses and synchronization overheads.
We propose a framework for generating scalable reinforcement learning implementations on multicore systems.
- Score: 14.432131909590824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) has achieved significant success in application
domains such as robotics, games, health care, and others. However, training RL
agents is very time-consuming. Current implementations exhibit poor performance
due to challenges such as irregular memory accesses and synchronization
overheads.
In this work, we propose a framework for generating scalable reinforcement
learning implementations on multicore systems. The replay buffer is a key
component of RL algorithms: it stores samples obtained from environment
interactions and serves them, via sampling, to the learning process. We
define a new data structure for the prioritized replay buffer based on a $K$-ary sum
tree that supports asynchronous parallel insertions, sampling, and priority
updates. To address the challenge of irregular memory accesses, we propose a
novel data layout to store the nodes of the sum tree that reduces the number of
cache misses. Additionally, we propose a \textit{lazy writing} mechanism to
reduce synchronization overheads of the replay buffer. Our framework employs
parallel actors to concurrently collect data via environmental interactions,
and parallel learners to perform stochastic gradient descent using the
collected data. Our framework supports a wide range of reinforcement learning
algorithms, including DQN, DDPG, TD3, and SAC. We demonstrate the effectiveness
of our framework in accelerating RL algorithms by performing experiments on a
CPU + GPU platform using OpenAI benchmarks. Our results show that the performance
of our approach scales linearly with the number of cores. Compared with the
baseline approaches, we reduce the convergence time by 3.1x$\sim$10.8x. By
plugging our replay buffer implementation into existing open source
reinforcement learning frameworks, we achieve 1.1x$\sim$2.1x speedup for
sequential executions.
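
As a rough illustration of the replay-buffer data structure the abstract describes, below is a minimal, single-threaded Python sketch of a $K$-ary sum tree supporting insertion, priority updates, and sampling proportional to priority. The class and method names are ours, and the contributions the paper actually emphasizes (the cache-aware node layout, asynchronous parallel operations, and lazy writing) are deliberately omitted; this is a sketch of the underlying idea, not the authors' implementation.

```python
import random


class KarySumTree:
    """Minimal single-threaded sketch of a K-ary sum tree for prioritized replay.

    A wider branching factor K makes the tree shallower, so sampling and
    priority updates traverse fewer levels than in a binary sum tree.
    Assumes at least one transition has been inserted before sampling.
    """

    def __init__(self, capacity: int, k: int = 8):
        self.k = k
        self.capacity = capacity
        # Grow the leaf level until it can hold `capacity` entries.
        leaves = 1
        while leaves < capacity:
            leaves *= k
        # Number of internal nodes in a complete K-ary tree: 1 + k + ... + leaves/k.
        self.leaf_start = (leaves - 1) // (k - 1)
        self.tree = [0.0] * (self.leaf_start + leaves)   # partial priority sums
        self.data = [None] * capacity                    # stored transitions
        self.write = 0                                   # ring-buffer cursor

    def total(self) -> float:
        return self.tree[0]                              # sum of all priorities

    def update(self, index: int, priority: float) -> None:
        """Set the priority of slot `index` and propagate the change to the root."""
        node = self.leaf_start + index
        delta = priority - self.tree[node]
        while True:
            self.tree[node] += delta
            if node == 0:
                break
            node = (node - 1) // self.k                  # parent in the array layout

    def insert(self, transition, priority: float) -> None:
        """Overwrite the oldest slot (ring-buffer behavior) with a new transition."""
        self.data[self.write] = transition
        self.update(self.write, priority)
        self.write = (self.write + 1) % self.capacity

    def sample(self):
        """Draw one transition with probability proportional to its priority."""
        u = random.uniform(0.0, self.total())
        node = 0
        while node < self.leaf_start:                    # descend until a leaf
            first_child = node * self.k + 1
            node = first_child + self.k - 1              # fallback: last child (guards FP rounding)
            for child in range(first_child, first_child + self.k):
                if u < self.tree[child]:
                    node = child
                    break
                u -= self.tree[child]
        index = node - self.leaf_start
        return index, self.data[index], self.tree[node]


if __name__ == "__main__":
    buf = KarySumTree(capacity=1000, k=8)
    for _ in range(1000):
        buf.insert(("state", "action", "reward", "next_state"), priority=1.0)
    idx, transition, priority = buf.sample()
    buf.update(idx, priority=0.5)        # e.g. after recomputing a TD error
```

The array layout (children of node i at positions k*i+1 through k*i+k) keeps each node's children contiguous, which hints at why a cache-conscious node layout matters for the traversal; the paper's actual layout and synchronization machinery are beyond this sketch.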
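The actor/learner split in the abstract can likewise be sketched at a very high level. The toy below (plain Python threads, a bounded queue standing in for the replay buffer, and placeholder environment and gradient steps, all of which are our assumptions rather than the paper's design) only shows the shape of the concurrency: several actors push experience while learners concurrently consume batches.

```python
import queue
import random
import threading
import time

STOP = threading.Event()
buffer = queue.Queue(maxsize=10_000)    # stand-in for the prioritized replay buffer


def actor(actor_id: int) -> None:
    """Collect placeholder transitions and push them into the shared buffer."""
    while not STOP.is_set():
        transition = (actor_id, random.random())          # placeholder for (s, a, r, s')
        buffer.put(transition)


def learner(learner_id: int, batch_size: int = 32) -> None:
    """Repeatedly pull a batch and pretend to run one gradient step on it."""
    while not STOP.is_set():
        batch = [buffer.get() for _ in range(batch_size)]
        _ = sum(x for _, x in batch)                      # placeholder for SGD


threads = [threading.Thread(target=actor, args=(i,), daemon=True) for i in range(4)]
threads += [threading.Thread(target=learner, args=(i,), daemon=True) for i in range(2)]
for t in threads:
    t.start()
time.sleep(1.0)                                           # let actors and learners overlap
STOP.set()
```

A real implementation along the lines of the paper would replace the queue with the parallel prioritized replay buffer, the placeholder transition with environment interaction, and the dummy reduction with stochastic gradient descent on the chosen algorithm (DQN, DDPG, TD3, SAC).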
Related papers
- Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling [53.58854856174773]
Speculative decoding is an approach to accelerate inference through a guess-and-verify paradigm.
Token Recycling stores candidate tokens in an adjacency matrix and employs a breadth-first search algorithm.
It significantly outperforms existing train-free methods by 30% and even a training method by 25%.
arXiv Detail & Related papers (2024-08-16T12:20:56Z) - No Need to Look Back: An Efficient and Scalable Approach for Temporal
Network Representation Learning [9.218415145210715]
This paper introduces a novel efficient TGRL framework, No-Looking-Back (NLB)
NLB employs a "forward recent sampling" strategy, which bypasses the need for backtracking historical interactions.
Empirical evaluations demonstrate that NLB matches or surpasses state-of-the-art methods in accuracy for link prediction and node classification.
arXiv Detail & Related papers (2024-02-03T00:12:36Z) - Spreeze: High-Throughput Parallel Reinforcement Learning Framework [19.3019166138232]
Spreeze is a lightweight parallel framework for reinforcement learning.
It efficiently utilizes a single desktop hardware resource to approach the throughput limit.
It can achieve up to 15,000Hz experience sampling and 370,000Hz network update frame rate.
arXiv Detail & Related papers (2023-12-11T05:25:01Z) - Efficient Parallel Reinforcement Learning Framework using the Reactor
Model [2.190190313041532]
Reinforcement Learning (RL) frameworks are essential for mapping RL workloads to multiple computational resources.
Existing frameworks, such as Ray, do not manage this orchestration efficiently.
We propose a solution implementing the reactor model, which enforces a fixed communication pattern among a set of actors.
arXiv Detail & Related papers (2023-12-07T21:19:57Z) - Retentive Network: A Successor to Transformer for Large Language Models [91.6652200825638]
We propose Retentive Network (RetNet) as a foundation architecture for large language models.
We theoretically derive the connection between recurrence and attention.
Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference.
arXiv Detail & Related papers (2023-07-17T16:40:01Z) - In Situ Framework for Coupling Simulation and Machine Learning with
Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - FNAS: Uncertainty-Aware Fast Neural Architecture Search [54.49650267859032]
Reinforcement learning (RL)-based neural architecture search (NAS) generally guarantees better convergence yet suffers from the requirement of huge computational resources.
We propose a general pipeline to accelerate the convergence of the rollout process as well as the RL process in NAS.
Experiments on the Mobile Neural Architecture Search (MNAS) search space show the proposed Fast Neural Architecture Search (FNAS) accelerates standard RL-based NAS process by 10x.
arXiv Detail & Related papers (2021-05-25T06:32:52Z) - Improving Computational Efficiency in Visual Reinforcement Learning via
Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER)
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z) - On the Utility of Gradient Compression in Distributed Training Systems [9.017890174185872]
We evaluate the efficacy of gradient compression methods and compare their scalability with optimized implementations of synchronous data-parallel SGD.
Surprisingly, we observe that due to computation overheads introduced by gradient compression, the net speedup over vanilla data-parallel training is marginal, if not negative.
arXiv Detail & Related papers (2021-02-28T15:58:45Z) - Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs.
It is still challenging to perform training in an efficient and scalable way.
We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z) - Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of
Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.