Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement
Learning
- URL: http://arxiv.org/abs/2303.17503v4
- Date: Mon, 15 Jan 2024 13:12:36 GMT
- Title: Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement
Learning
- Authors: Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, Yu Murata, Keigo
Habara, Haruka Kita, Shin Ishii
- Abstract summary: Pgx is a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators.
Pgx can simulate RL environments 10-100x faster than existing implementations available in Python.
Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go.
- Score: 0.6670498055582528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Pgx, a suite of board game reinforcement learning (RL)
environments written in JAX and optimized for GPU/TPU accelerators. By
leveraging JAX's auto-vectorization and parallelization over accelerators, Pgx
can efficiently scale to thousands of simultaneous simulations over
accelerators. In our experiments on a DGX-A100 workstation, we discovered that
Pgx can simulate RL environments 10-100x faster than existing implementations
available in Python. Pgx includes RL environments commonly used as benchmarks
in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx
offers miniature game sets and baseline models to facilitate rapid research
cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm
with Pgx environments. Overall, Pgx provides high-performance environment
simulators for researchers to accelerate their RL experiments. Pgx is available
at http://github.com/sotetsuk/pgx.
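The speedup described above comes from JAX's auto-vectorization: writing a single-environment `init`/`step` and lifting it to thousands of parallel environments with `jax.vmap`, then JIT-compiling for the accelerator. The sketch below illustrates that pattern with a deliberately trivial toy environment; it is not Pgx's actual API, and the `init`/`step` functions are invented for illustration.

```python
# Illustrative sketch (not Pgx's actual API): a toy environment stepped in
# parallel with jax.vmap, the auto-vectorization pattern the abstract
# credits for Pgx's throughput on GPU/TPU.
import jax
import jax.numpy as jnp

def init(seed):
    # Each environment's state is just an integer counter here.
    return jnp.zeros((), dtype=jnp.int32) + seed

def step(state, action):
    # A trivial transition: add the action to the counter.
    new_state = state + action
    # Reward 1.0 when the counter is even, else 0.0.
    reward = jnp.where(new_state % 2 == 0, 1.0, 0.0)
    return new_state, reward

# Vectorize over a batch of environments, then JIT-compile the batched
# functions so the whole batch runs as one accelerator kernel.
batched_init = jax.jit(jax.vmap(init))
batched_step = jax.jit(jax.vmap(step))

n_envs = 1024
states = batched_init(jnp.arange(n_envs))
actions = jnp.ones(n_envs, dtype=jnp.int32)
states, rewards = batched_step(states, actions)
print(states.shape, rewards.shape)  # (1024,) (1024,)
```

Because `vmap` adds the batch dimension mechanically, the per-environment logic stays scalar and readable while the batch size scales to whatever the accelerator's memory allows.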
Related papers
- Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX [0.0]
Reinforcement learning (RL) research requires diverse, challenging environments that are both tractable and scalable.
We introduce Octax, a high-performance suite of classic arcade game environments implemented in JAX.
arXiv Detail & Related papers (2025-10-02T07:56:47Z) - Ludax: A GPU-Accelerated Domain Specific Language for Board Games [44.45953630612019]
Ludax is a domain-specific language for board games which automatically compiles into hardware-accelerated code.
We envision Ludax as a tool to help accelerate games research generally, from RL to cognitive science.
arXiv Detail & Related papers (2025-06-27T20:15:53Z) - NAVIX: Scaling MiniGrid Environments with JAX [17.944645332888335]
We introduce NAVIX, a re-implementation of MiniGrid in JAX.
NAVIX achieves over 200,000x speed improvements in batch mode, supporting up to 2048 agents in parallel on a single Nvidia A100 80 GB.
This reduces experiment times from one week to 15 minutes, promoting faster design and more scalable RL model development.
arXiv Detail & Related papers (2024-07-28T04:39:18Z) - JaxMARL: Multi-Agent RL Environments and Algorithms in JAX [105.343918678781]
We present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments.
Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches.
We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2023-11-16T18:58:43Z) - RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup [0.0]
RL-X provides a flexible, easy-to-extend codebase with self-contained single-directory algorithms.
RL-X can reach up to 4.5x speedups compared to well-known frameworks like Stable-Baselines3.
arXiv Detail & Related papers (2023-10-20T10:06:03Z) - Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous
Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z) - EnvPool: A Highly Parallel Reinforcement Learning Environment Execution
Engine [69.47822647770542]
Parallel environment execution is often the slowest part of the whole system but receives little attention.
With a curated design for parallelizing RL environments, we have improved RL environment simulation speed across different hardware setups.
On a high-end machine, EnvPool achieves 1 million frames per second for the environment execution on Atari environments and 3 million frames per second on MuJoCo environments.
arXiv Detail & Related papers (2022-06-21T17:36:15Z) - ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep
Reinforcement Learning [141.58588761593955]
We present a library ElegantRL-podracer for cloud-native deep reinforcement learning.
It efficiently supports millions of cores to carry out massively parallel training at multiple levels.
At a low-level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU cores in a single GPU.
arXiv Detail & Related papers (2021-12-11T06:31:21Z) - Mastering Atari Games with Limited Data [73.6189496825209]
We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero.
Our method achieves 190.4% mean human performance on the Atari 100k benchmark with only two hours of real-time game experience.
This is the first time an algorithm has achieved super-human performance on Atari games with so little data.
arXiv Detail & Related papers (2021-10-30T09:13:39Z) - WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement
Learning on a GPU [15.337470862838794]
We present WarpDrive, a flexible, lightweight, and easy-to-use open-source RL framework that implements end-to-end multi-agent RL on a single GPU.
Our design runs simulations and the agents in each simulation in parallel. It also uses a single simulation data store on the GPU that is safely updated in-place.
WarpDrive yields 2.9 million environment steps/second with 2000 environments and 1000 agents (at least 100x higher throughput compared to a CPU implementation) in a benchmark Tag simulation.
arXiv Detail & Related papers (2021-08-31T16:59:27Z) - Megaverse: Simulating Embodied Agents at One Million Experiences per
Second [75.1191260838366]
We present Megaverse, a new 3D simulation platform for reinforcement learning and embodied AI research.
Megaverse is up to 70x faster than DeepMind Lab in fully-shaded 3D scenes with interactive objects.
We use Megaverse to build a new benchmark that consists of several single-agent and multi-agent tasks.
arXiv Detail & Related papers (2021-07-17T03:16:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.