Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
- URL: http://arxiv.org/abs/2307.12983v1
- Date: Mon, 24 Jul 2023 17:59:37 GMT
- Title: Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
- Authors: Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
- Abstract summary: Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data.
Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU.
This paper presents a Parallel $Q$-Learning scheme that outperforms PPO in wall-clock time.
- Score: 17.827002299991285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning is time-consuming for complex tasks due to the need
for large amounts of training data. Recent advances in GPU-based simulation,
such as Isaac Gym, have sped up data collection thousands of times on a
commodity GPU. Most prior works used on-policy methods like PPO due to their
simplicity and ease of scaling. Off-policy methods are more data efficient but
challenging to scale, resulting in a longer wall-clock training time. This
paper presents a Parallel $Q$-Learning (PQL) scheme that outperforms PPO in
wall-clock time while maintaining the superior sample efficiency of off-policy
learning. PQL achieves this by parallelizing data collection, policy learning,
and value learning. Different from prior works on distributed off-policy
learning, such as Ape-X, our scheme is designed specifically for massively
parallel GPU-based simulation and optimized to work on a single workstation. In
experiments, we demonstrate that $Q$-learning can be scaled to tens of
thousands of parallel environments and investigate important factors affecting
learning speed. The code is available at https://github.com/Improbable-AI/pql.
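To make the data flow concrete, below is a minimal single-process sketch of the scheme the abstract describes: a batched GPU simulator fills a GPU-resident replay buffer while $Q$-learning updates run on the same device. PQL itself goes further and runs data collection, policy learning, and value learning in parallel; the ToyBatchedEnv stand-in, network sizes, and hyperparameters here are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Sketch only: batched GPU collection + off-policy Q update on one device.
import copy
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
num_envs, obs_dim, num_actions = 4096, 8, 4  # paper scales to tens of thousands of envs

class ToyBatchedEnv:
    """Stand-in for an Isaac-Gym-style simulator: every env steps at once on the GPU."""
    def __init__(self):
        self.obs = torch.randn(num_envs, obs_dim, device=device)
    def step(self, actions):
        self.obs = self.obs + 0.01 * torch.randn_like(self.obs)  # fake dynamics
        reward = -self.obs.square().mean(dim=1)                  # fake reward
        return self.obs, reward

q_net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                      nn.Linear(256, num_actions)).to(device)
target_net = copy.deepcopy(q_net)
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)

# The replay buffer lives on the GPU, so collection and learning never touch the CPU.
cap, ptr, filled = 200_000, 0, 0
buf = {"obs": torch.zeros(cap, obs_dim, device=device),
       "act": torch.zeros(cap, dtype=torch.long, device=device),
       "rew": torch.zeros(cap, device=device),
       "next_obs": torch.zeros(cap, obs_dim, device=device)}

env = ToyBatchedEnv()
obs = env.obs
for it in range(100):
    # --- batched collection: one call advances all num_envs environments ---
    with torch.no_grad():
        greedy = q_net(obs).argmax(dim=1)
        explore = torch.randint(num_actions, (num_envs,), device=device)
        act = torch.where(torch.rand(num_envs, device=device) < 0.1, explore, greedy)
    next_obs, rew = env.step(act)
    idx = (ptr + torch.arange(num_envs, device=device)) % cap
    buf["obs"][idx], buf["act"][idx] = obs, act
    buf["rew"][idx], buf["next_obs"][idx] = rew, next_obs
    ptr, filled = (ptr + num_envs) % cap, min(filled + num_envs, cap)
    obs = next_obs

    # --- off-policy Q update from a random minibatch of stored transitions ---
    j = torch.randint(filled, (1024,), device=device)
    with torch.no_grad():
        target = buf["rew"][j] + 0.99 * target_net(buf["next_obs"][j]).max(dim=1).values
    pred = q_net(buf["obs"][j]).gather(1, buf["act"][j].unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
    if it % 10 == 0:  # periodic target-network sync
        target_net.load_state_dict(q_net.state_dict())
```

Keeping the buffer and simulator on the same GPU is what makes the batched collection step cheap; in the full PQL scheme the collection loop and the two learning loops above would run concurrently instead of in sequence.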
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- SAPG: Split and Aggregate Policy Gradients [37.433915947580076]
We propose a new on-policy RL algorithm that can effectively leverage large-scale environments by splitting them into chunks and fusing them back together via importance sampling.
Our algorithm, termed SAPG, shows significantly higher performance across a variety of challenging environments where vanilla PPO and other strong baselines fall short (a hedged sketch of the chunk-and-fuse aggregation appears after this list).
arXiv Detail & Related papers (2024-07-29T17:59:50Z)
- Automatic Task Parallelization of Dataflow Graphs in ML/DL models [0.0]
We present a Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs.
We generate readable and executable parallel PyTorch+Python code from input ML models in ONNX format.
Preliminary results on several ML graphs demonstrate up to 1.9× speedup over serial execution.
arXiv Detail & Related papers (2023-08-22T04:54:30Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample as soon as it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
- A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs [38.92672037891692]
AnalySIM is a cluster simulator that allows efficient design explorations for multi-tenant machine learning services.
It can easily test and analyze various scheduling policies in a number of performance metrics such as GPU resource utilization.
We find that preemption and migration are able to significantly reduce average job completion time.
arXiv Detail & Related papers (2022-01-10T06:00:11Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Using basic cross-platform tensor frameworks and script language engines alone, however, does not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all these requirements while still relying on such basic engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Efficient Device Scheduling with Multi-Job Federated Learning [64.21733164243781]
We propose a novel multi-job Federated Learning framework to enable the parallel training process of multiple jobs.
We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost.
Our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).
arXiv Detail & Related papers (2021-12-11T08:05:11Z)
- WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU [15.337470862838794]
We present WarpDrive, a flexible, lightweight, and easy-to-use open-source RL framework that implements end-to-end multi-agent RL on a single GPU.
Our design runs simulations and the agents in each simulation in parallel. It also uses a single simulation data store on the GPU that is safely updated in-place.
WarpDrive yields 2.9 million environment steps/second with 2000 environments and 1000 agents (at least 100x higher throughput compared to a CPU implementation) in a benchmark Tag simulation.
arXiv Detail & Related papers (2021-08-31T16:59:27Z)
- Large Batch Simulation for Deep Reinforcement Learning [101.01408262583378]
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that point-navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)
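As a companion to the SAPG entry above, here is a hedged sketch of what fusing data from several environment chunks via importance sampling can look like: a PPO-style clipped surrogate in which samples gathered by per-chunk behavior policies are reweighted toward a single leader policy. The function name, shapes, and the toy batch are illustrative assumptions, not the authors' implementation.

```python
# Sketch of chunk-and-fuse policy-gradient aggregation via importance sampling.
import torch

def aggregated_pg_loss(logp_leader, logp_behavior, advantages, clip=0.2):
    """All inputs are 1-D tensors over the fused batch: log pi_leader(a|s) for
    the actions actually taken, log-probs under whichever chunk policy
    collected each sample, and the estimated advantages."""
    ratio = torch.exp(logp_leader - logp_behavior)            # importance weight
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip, 1 + clip) * advantages
    return -torch.min(unclipped, clipped).mean()              # maximize surrogate

# Toy usage: a fused batch of 2048 samples drawn from several chunks.
logp_leader = torch.randn(2048, requires_grad=True)
logp_behavior = logp_leader.detach() + 0.1 * torch.randn(2048)
advantages = torch.randn(2048)
aggregated_pg_loss(logp_leader, logp_behavior, advantages).backward()
```

The clipping plays the same role as in PPO: it bounds how far the off-policy correction can push the leader's update when a chunk's behavior policy has drifted.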
This list is automatically generated from the titles and abstracts of the papers on this site.