EnvPool: A Highly Parallel Reinforcement Learning Environment Execution
Engine
- URL: http://arxiv.org/abs/2206.10558v1
- Date: Tue, 21 Jun 2022 17:36:15 GMT
- Title: EnvPool: A Highly Parallel Reinforcement Learning Environment Execution
Engine
- Authors: Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor
Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu,
Shuicheng Yan
- Abstract summary: Parallel environment execution is often the slowest part of the whole system but receives little attention.
With a curated design for parallelizing RL environments, we have improved the RL environment simulation speed across different hardware setups.
On a high-end machine, EnvPool achieves 1 million frames per second for the environment execution on Atari environments and 3 million frames per second on MuJoCo environments.
- Score: 69.47822647770542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been significant progress in developing reinforcement learning (RL)
training systems. Past works such as IMPALA, Apex, Seed RL, Sample Factory, and
others aim to improve the system's overall throughput. In this paper, we try to
address a common bottleneck in the RL training system, i.e., parallel
environment execution, which is often the slowest part of the whole system but
receives little attention. With a curated design for parallelizing RL
environments, we have improved the RL environment simulation speed across
different hardware setups, ranging from a laptop and a modest workstation to
a high-end machine like NVIDIA DGX-A100. On a high-end machine, EnvPool
achieves 1 million frames per second for the environment execution on Atari
environments and 3 million frames per second on MuJoCo environments. When
running on a laptop, EnvPool is 2.8 times faster than the Python subprocess
baseline. Moreover, strong compatibility with existing RL training libraries
has been demonstrated in the open-source community, including CleanRL,
rl_games, and DeepMind Acme. Finally, EnvPool allows researchers to iterate on
their ideas at a much faster pace and has great potential to become the de
facto RL environment execution engine. Example runs show that it takes only 5
minutes to train Atari Pong and MuJoCo Ant, both on a laptop. EnvPool has
already been open-sourced at https://github.com/sail-sg/envpool.
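The throughput figures above are obtained through EnvPool's batched, gym-compatible interface. The snippet below is a minimal usage sketch adapted from the project's README; the Pong-v5 task id, the batch size of 64, and the 4-tuple return of step() are assumptions that depend on the installed EnvPool and gym versions, and the no-op policy merely stands in for a real agent.

    import numpy as np
    import envpool

    # Create 64 Atari Pong environments backed by EnvPool's C++ thread pool.
    # env_type="gym" requests the gym-style interface; "dm" (dm_env) is also supported.
    env = envpool.make("Pong-v5", env_type="gym", num_envs=64)

    obs = env.reset()  # batched observations, e.g. shape (64, 4, 84, 84)
    for _ in range(1000):
        act = np.zeros(64, dtype=int)  # no-op action for every environment (placeholder policy)
        # Older gym-style builds return a 4-tuple; recent gymnasium-style builds
        # may return (obs, rew, terminated, truncated, info) instead.
        obs, rew, done, info = env.step(act)

Because all environments live inside a single process and are stepped by a C++ thread pool, each batch requires only one Python call, which is where the speed-up over subprocess-based vectorization comes from.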
Related papers
- NAVIX: Scaling MiniGrid Environments with JAX [17.944645332888335]
We introduce NAVIX, a re-implementation of MiniGrid in JAX.
NAVIX achieves over 200,000x speed improvements in batch mode, supporting up to 2048 agents in parallel on a single Nvidia A100 80 GB.
This reduces experiment times from one week to 15 minutes, promoting faster design and more scalable RL model development.
arXiv Detail & Related papers (2024-07-28T04:39:18Z)
- A Benchmark Environment for Offline Reinforcement Learning in Racing Games [54.83171948184851]
Offline Reinforcement Learning (ORL) is a promising approach to reduce the high sample complexity of traditional Reinforcement Learning (RL).
This paper introduces OfflineMania, a novel environment for ORL research.
It is inspired by the iconic TrackMania series and developed using the Unity 3D game engine.
arXiv Detail & Related papers (2024-07-12T16:44:03Z)
- JaxMARL: Multi-Agent RL Environments and Algorithms in JAX [105.343918678781]
We present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments.
Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches.
We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2023-11-16T18:58:43Z)
- SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework.
We develop a scalable, efficient, and distributed RL system called ReaLly Scalable RL (SRL), which allows massively parallelized training.
SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
arXiv Detail & Related papers (2023-06-29T05:16:25Z)
- ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning [141.58588761593955]
We present ElegantRL-Podracer, a library for cloud-native deep reinforcement learning.
It efficiently supports millions of cores to carry out massively parallel training at multiple levels.
At a low level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU cores in a single GPU.
arXiv Detail & Related papers (2021-12-11T06:31:21Z)
- WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU [15.337470862838794]
We present WarpDrive, a flexible, lightweight, and easy-to-use open-source RL framework that implements end-to-end multi-agent RL on a single GPU.
Our design runs simulations and the agents in each simulation in parallel. It also uses a single simulation data store on the GPU that is safely updated in-place.
WarpDrive yields 2.9 million environment steps/second with 2000 environments and 1000 agents (at least 100x higher throughput compared to a CPU implementation) in a benchmark Tag simulation.
arXiv Detail & Related papers (2021-08-31T16:59:27Z)
- The NetHack Learning Environment [79.06395964379107]
We present the NetHack Learning Environment (NLE), a procedurally generated rogue-like environment for Reinforcement Learning research.
We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL.
We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration.
arXiv Detail & Related papers (2020-06-24T14:12:56Z)
- Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning [68.2099740607854]
"Sample Factory" is a high- throughput training system optimized for a single-machine setting.
Our architecture combines a highly efficient, asynchronous, GPU-based sampler with off-policy correction techniques.
We extend Sample Factory to support self-play and population-based training and apply these techniques to train highly capable agents for a multiplayer first-person shooter game.
arXiv Detail & Related papers (2020-06-21T10:00:23Z)