WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement
Learning on a GPU
- URL: http://arxiv.org/abs/2108.13976v1
- Date: Tue, 31 Aug 2021 16:59:27 GMT
- Title: WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement
Learning on a GPU
- Authors: Tian Lan, Sunil Srinivasa, Stephan Zheng
- Abstract summary: We present WarpDrive, a flexible, lightweight, and easy-to-use open-source RL framework that implements end-to-end multi-agent RL on a single GPU.
Our design runs simulations and the agents in each simulation in parallel. It also uses a single simulation data store on the GPU that is safely updated in-place.
WarpDrive yields 2.9 million environment steps/second with 2000 environments and 1000 agents (at least 100x higher throughput compared to a CPU implementation) in a benchmark Tag simulation.
- Score: 15.337470862838794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (RL) is a powerful framework to train
decision-making models in complex dynamical environments. However, RL can be
slow as it learns through repeated interaction with a simulation of the
environment. Accelerating RL requires both algorithmic and engineering
innovations. In particular, there are key systems engineering bottlenecks when
using RL in complex environments that feature multiple agents or
high-dimensional state, observation, or action spaces, for example. We present
WarpDrive, a flexible, lightweight, and easy-to-use open-source RL framework
that implements end-to-end multi-agent RL on a single GPU (Graphics Processing
Unit), building on PyCUDA and PyTorch. Using the extreme parallelization
capability of GPUs, WarpDrive enables orders-of-magnitude faster RL compared to
common implementations that blend CPU simulations and GPU models. Our design
runs simulations and the agents in each simulation in parallel. It eliminates
data copying between CPU and GPU. It also uses a single simulation data store
on the GPU that is safely updated in-place. Together, this allows the user to
run thousands of concurrent multi-agent simulations and train on extremely
large batches of experience. For example, WarpDrive yields 2.9 million
environment steps/second with 2000 environments and 1000 agents (at least 100x
higher throughput compared to a CPU implementation) in a benchmark Tag
simulation. WarpDrive provides a lightweight Python interface and environment
wrappers to simplify usage and promote flexibility and extensions. As such,
WarpDrive provides a framework for building high-throughput RL systems.
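To make the design above concrete, the following is a minimal sketch, in the spirit of the abstract, of end-to-end GPU stepping with PyCUDA: one CUDA block per environment, one thread per agent, and GPU-resident buffers updated in place. The toy dynamics, buffer names, and the env_step helper are illustrative assumptions, not WarpDrive's actual API.

# Minimal PyCUDA sketch of the pattern described above: one CUDA block per
# environment, one thread per agent, and GPU-resident buffers updated in place.
# Toy dynamics and names are illustrative; this is not WarpDrive's actual API.
import numpy as np
import pycuda.autoinit                      # creates a CUDA context
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

NUM_ENVS, NUM_AGENTS = 2000, 1000           # matches the benchmark scale above

kernel_src = r"""
__global__ void step(float *positions, const float *actions,
                     float *rewards, int num_agents)
{
    int env = blockIdx.x;                       // one block per environment
    int agent = threadIdx.x;                    // one thread per agent
    if (agent < num_agents) {
        int idx = env * num_agents + agent;
        positions[idx] += actions[idx];         // in-place state update
        rewards[idx] = -fabsf(positions[idx]);  // toy reward: stay near zero
    }
}
"""
step_kernel = SourceModule(kernel_src).get_function("step")

# Single simulation data store, allocated once on the GPU; nothing is copied
# back to the CPU inside the rollout loop.
positions = gpuarray.zeros((NUM_ENVS, NUM_AGENTS), dtype=np.float32)
rewards = gpuarray.zeros((NUM_ENVS, NUM_AGENTS), dtype=np.float32)

def env_step(actions_gpu):
    """Advance every agent in every environment with one kernel launch."""
    step_kernel(positions.gpudata, actions_gpu.gpudata, rewards.gpudata,
                np.int32(NUM_AGENTS),
                block=(NUM_AGENTS, 1, 1), grid=(NUM_ENVS, 1))
    return rewards

actions = gpuarray.to_gpu(
    np.random.uniform(-1, 1, (NUM_ENVS, NUM_AGENTS)).astype(np.float32))
env_step(actions)

In WarpDrive itself, the same GPU buffers are also exposed to the PyTorch model, so observations are read and actions are written without host-device copies.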
Related papers
- GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS [4.172988187048097]
GPUDrive is a GPU-accelerated, multi-agent simulator built on top of the Madrona Game Engine.
We show that, using GPUDrive, we can effectively train reinforcement learning agents over many scenes from the Waymo Open Motion Dataset.
arXiv Detail & Related papers (2024-08-02T21:37:46Z)
- JaxMARL: Multi-Agent RL Environments and Algorithms in JAX [105.343918678781]
We present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments.
Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches.
We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2023-11-16T18:58:43Z)
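To illustrate the JAX idiom behind JaxMARL-style pipelines, the sketch below vectorizes a single-environment step function over a batch of environments with jax.vmap and compiles it with jax.jit; the scalar toy environment is an assumption for illustration, not JaxMARL's API.

# Minimal sketch (toy environment, not JaxMARL's API) of how a JAX pipeline
# vectorizes environment stepping: write step() for a single environment,
# then vmap it over a batch of environment states and jit the result.
import jax
import jax.numpy as jnp

def step(state, action):
    """Single-environment transition: state and action are scalars here."""
    new_state = state + action
    reward = -jnp.abs(new_state)          # toy reward: stay near the origin
    return new_state, reward

# Vectorize over 4096 environments and compile the batched step once.
batched_step = jax.jit(jax.vmap(step))

num_envs = 4096
states = jnp.zeros(num_envs)
key = jax.random.PRNGKey(0)
actions = jax.random.uniform(key, (num_envs,), minval=-1.0, maxval=1.0)

states, rewards = batched_step(states, actions)   # one fused device call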
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
- Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation [17.827002299991285]
Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data.
Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU.
This paper presents a Parallel $Q$-Learning scheme that outperforms PPO in wall-clock time.
arXiv Detail & Related papers (2023-07-24T17:59:37Z)
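The Parallel $Q$-Learning summary above combines off-policy updates with massively parallel simulation; the following PyTorch sketch shows the general shape of a single $Q$-learning update over one large batch of transitions gathered from many simulated environments. The network sizes, placeholder transitions, and q_update helper are illustrative assumptions, not the paper's scheme.

# Illustrative PyTorch sketch (not the paper's exact scheme) of a Q-learning
# update applied to a large batch of transitions gathered from thousands of
# GPU-simulated environments in a single step.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
obs_dim, num_actions, num_envs = 8, 4, 4096

q_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                      nn.Linear(128, num_actions)).to(device)
target_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                           nn.Linear(128, num_actions)).to(device)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)
gamma = 0.99

def q_update(obs, actions, rewards, next_obs, dones):
    """One TD update over a (num_envs,)-sized batch of transitions."""
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    q = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Placeholder transitions; a GPU simulator would produce these on-device.
obs = torch.randn(num_envs, obs_dim, device=device)
actions = torch.randint(num_actions, (num_envs,), device=device)
rewards = torch.randn(num_envs, device=device)
next_obs = torch.randn(num_envs, obs_dim, device=device)
dones = torch.zeros(num_envs, device=device)
q_update(obs, actions, rewards, next_obs, dones)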
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Optimizing Data Collection in Deep Reinforcement Learning [4.9709347068704455]
GPU vectorization can achieve up to $1024\times$ speedup over commonly used CPU simulators.
We show that kernel fusion yields an $11.3\times$ speedup for a simple simulator, growing to $1024\times$ as simulator complexity increases in terms of memory bandwidth requirements.
arXiv Detail & Related papers (2022-07-15T20:22:31Z)
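The two ideas in the summary above, GPU vectorization and kernel fusion, can be sketched as follows: a toy simulator step written as batched tensor operations, optionally fused with torch.compile to reduce the number of launched kernels and memory traffic. The dynamics are illustrative assumptions, not the paper's simulators.

# Illustrative sketch (toy dynamics, not the paper's simulators) of the two
# ideas above: vectorizing a simulator across environments on the GPU, and
# fusing the per-step elementwise kernels to cut memory-bandwidth cost.
import torch

num_envs, state_dim = 8192, 16
device = "cuda" if torch.cuda.is_available() else "cpu"
states = torch.zeros(num_envs, state_dim, device=device)

def step(states, actions):
    """One simulator step for all environments as batched tensor ops."""
    velocity = torch.tanh(actions)            # elementwise op 1
    states = states + 0.05 * velocity         # elementwise op 2
    rewards = -(states ** 2).sum(dim=1)       # per-environment reduction
    return states, rewards

# Optional kernel fusion: torch.compile (PyTorch >= 2.0) can merge the
# elementwise ops above into fewer kernels; fall back to eager otherwise.
fused_step = torch.compile(step) if hasattr(torch, "compile") else step

actions = torch.randn(num_envs, state_dim, device=device)
states, rewards = fused_step(states, actions)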
- EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine [69.47822647770542]
Parallel environment execution is often the slowest part of the whole system but receives little attention.
With a curated design for parallelizing RL environments, we have improved the RL environment simulation speed across different hardware setups.
On a high-end machine, EnvPool achieves 1 million frames per second for the environment execution on Atari environments and 3 million frames per second on MuJoCo environments.
arXiv Detail & Related papers (2022-06-21T17:36:15Z)
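For reference, batched stepping with EnvPool's synchronous, gym-style interface looks roughly like the sketch below; exact reset/step return signatures vary across EnvPool and gym versions, so treat this as an approximation rather than the definitive API.

# Rough sketch of EnvPool's synchronous, batched gym-style interface; exact
# reset/step return signatures vary across EnvPool and gym versions.
import numpy as np
import envpool

# One C++ executor that owns and steps 64 Atari environments in worker threads.
env = envpool.make("Pong-v5", env_type="gym", num_envs=64)

reset_out = env.reset()
obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out

for _ in range(100):
    actions = np.random.randint(env.action_space.n, size=64)
    step_out = env.step(actions)              # one call advances all 64 envs
    obs, rewards = step_out[0], step_out[1]   # remaining fields depend on version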
- ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning [141.58588761593955]
We present a library ElegantRL-podracer for cloud-native deep reinforcement learning.
It efficiently supports millions of cores to carry out massively parallel training at multiple levels.
At a low level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU cores in a single GPU.
arXiv Detail & Related papers (2021-12-11T06:31:21Z)
- Accelerating GAN training using highly parallel hardware on public cloud [0.3694429692322631]
This work explores different types of cloud services to train a Generative Adversarial Network (GAN) in a parallel environment.
We parallelize the training process on multiple GPUs and Google Tensor Processing Units (TPUs).
Linear speed-up of the training process is obtained, while retaining most of the performance in terms of physics results.
arXiv Detail & Related papers (2021-11-08T16:59:15Z)
- Large Batch Simulation for Deep Reinforcement Learning [101.01408262583378]
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU, and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)