minimax: Efficient Baselines for Autocurricula in JAX
- URL: http://arxiv.org/abs/2311.12716v2
- Date: Thu, 23 Nov 2023 19:12:07 GMT
- Title: minimax: Efficient Baselines for Autocurricula in JAX
- Authors: Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel
- Abstract summary: This work introduces the minimax library for UED training on accelerated hardware.
Using JAX to implement fully-tensorized environments and autocurriculum algorithms, minimax allows the entire training loop to be compiled for hardware acceleration.
- Score: 30.664874531580594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised environment design (UED) is a form of automatic curriculum
learning for training robust decision-making agents to zero-shot transfer into
unseen environments. Such autocurricula have received much interest from the RL
community. However, UED experiments, based on CPU rollouts and GPU model
updates, have often required several weeks of training. This compute
requirement is a major obstacle to rapid innovation for the field. This work
introduces the minimax library for UED training on accelerated hardware. Using
JAX to implement fully-tensorized environments and autocurriculum algorithms,
minimax allows the entire training loop to be compiled for hardware
acceleration. To provide a petri dish for rapid experimentation, minimax
includes a tensorized grid-world based on MiniGrid, in addition to reusable
abstractions for conducting autocurricula in procedurally-generated
environments. With these components, minimax provides strong UED baselines,
including new parallelized variants, which achieve over 120$\times$ speedups in
wall time compared to previous implementations when training with equal batch
sizes. The minimax library is available under the Apache 2.0 license at
https://github.com/facebookresearch/minimax.
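The abstract's key idea, writing both the environment and the training logic as pure JAX so the whole loop compiles onto the accelerator, can be sketched with a toy example. Everything below (the scalar environment, the linear "policy", and the `rollout_return` helper) is invented for illustration and is not minimax's actual API:

```python
import jax
import jax.numpy as jnp

# Toy fully-tensorized environment: the state is a scalar position and
# the "policy" is a single gain parameter.
def env_step(state, action):
    new_state = state + action
    reward = -jnp.abs(new_state)  # reward for staying near the origin
    return new_state, reward

def rollout_return(gain, init_state, num_steps=32):
    # lax.scan keeps the whole rollout inside one compiled computation,
    # avoiding a Python-level loop (and host<->device trips) per step.
    def step(state, _):
        action = -gain * state  # linear "policy"
        new_state, reward = env_step(state, action)
        return new_state, reward
    _, rewards = jax.lax.scan(step, init_state, None, length=num_steps)
    return rewards.sum()

# jit compiles the entire loss (environment rollout included), and grad
# differentiates through it, because everything is pure JAX.
loss = lambda gain, s0: -rollout_return(gain, s0)
grad_fn = jax.jit(jax.grad(loss))

g = grad_fn(0.1, jnp.array(3.0))  # gradient of the loss w.r.t. the gain
```

Because the rollout is an ordinary traced computation, the same pattern extends to batching many environments and even whole autocurriculum updates inside one compiled step, which is what makes the reported wall-time speedups possible.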
Related papers
- JaxMARL: Multi-Agent RL Environments in JAX [107.7560737385902]
We present JaxMARL, the first open-source code base that combines ease-of-use with GPU-enabled efficiency.
Our experiments show that, per run, our JAX-based training pipeline is up to 12500x faster than existing approaches.
We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge.
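The "GPU-enabled efficiency" these JAX libraries describe typically comes from vectorising the environment step over a leading batch axis. A hedged sketch, where `step` and its scalar state layout are invented for this example and are not JaxMARL's real interface:

```python
import jax
import jax.numpy as jnp

# Toy environment step: scalar state, scalar action.
def step(state, action):
    new_state = state + action
    reward = -jnp.abs(new_state)
    return new_state, reward

# vmap maps `step` over a leading batch axis, so one compiled call
# advances every parallel environment at once on the accelerator.
batched_step = jax.jit(jax.vmap(step))

num_envs = 4096
states = jnp.zeros(num_envs)
actions = jnp.ones(num_envs)
next_states, rewards = batched_step(states, actions)
```

The same two-line change (`jax.vmap` plus `jax.jit`) scales a single-environment step function to thousands of simultaneous rollouts without any per-environment Python overhead.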
arXiv Detail & Related papers (2023-11-16T18:58:43Z)
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
- Cramming: Training a Language Model on a Single GPU in One Day [64.18297923419627]
Recent trends in language modeling have focused on increasing performance through scaling.
We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU.
We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings.
arXiv Detail & Related papers (2022-12-28T18:59:28Z)
- Parallel Reinforcement Learning Simulation for Visual Quadrotor Navigation [4.597465975849579]
Reinforcement learning (RL) is an agent-based approach for teaching robots to navigate within the physical world.
We present a simulation framework, built on AirSim, which provides efficient parallel training.
Building on this framework, Ape-X is modified to incorporate decentralised training of AirSim environments.
arXiv Detail & Related papers (2022-09-22T15:27:42Z)
- ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning [141.58588761593955]
We present a library ElegantRL-podracer for cloud-native deep reinforcement learning.
It efficiently supports millions of cores to carry out massively parallel training at multiple levels.
At the lowest level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 GPU cores in a single GPU.
arXiv Detail & Related papers (2021-12-11T06:31:21Z)
- Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization [15.215372246434413]
In reinforcement learning (RL)-driven approaches, the goal of continuous control is to synthesize desired behaviors.
In this paper, we introduce braxlines, a toolkit for fast and interactive RL-driven behavior generation beyond simple reward maximization.
Our implementations build on the hardware-accelerated Brax simulator in JAX with minimal modifications, enabling behavior generation within minutes of training.
arXiv Detail & Related papers (2021-10-10T02:41:01Z)
- Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation [33.36244621210259]
We present Brax, an open source library for rigid body simulation written in JAX.
We present results on a suite of tasks inspired by the existing reinforcement learning literature, but remade in our engine.
arXiv Detail & Related papers (2021-06-24T19:09:12Z)
- Large Batch Simulation for Deep Reinforcement Learning [101.01408262583378]
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU, and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that point navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)
- Multi-node Bert-pretraining: Cost-efficient Approach [6.5998084177955425]
Large scale Transformer-based language models have brought about exciting leaps in state-of-the-art results for many Natural Language Processing (NLP) tasks.
With the advent of large-scale unsupervised datasets, training time is further extended due to the increased amount of data samples within a single training epoch.
We show that we are able to perform pre-training on BERT within a reasonable time budget (12 days) in an academic setting.
arXiv Detail & Related papers (2020-08-01T05:49:20Z)
- An Active Learning Framework for Constructing High-fidelity Mobility Maps [0.0]
We introduce an active learning paradigm that substantially reduces the number of simulations needed to train a machine learning classifier without sacrificing accuracy.
Experimental results suggest that our sampling algorithm can train a neural network, with higher accuracy, using less than half the number of simulations when compared to random sampling.
arXiv Detail & Related papers (2020-03-07T04:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.