Podracer architectures for scalable Reinforcement Learning
- URL: http://arxiv.org/abs/2104.06272v1
- Date: Tue, 13 Apr 2021 15:05:35 GMT
- Title: Podracer architectures for scalable Reinforcement Learning
- Authors: Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan,
Thomas Keck, Fabio Viola and Hado van Hasselt
- Abstract summary: How to best train reinforcement learning (RL) agents at scale is still an active research area.
In this report we argue that TPUs are particularly well suited for training RL agents in a scalable, efficient and reproducible way.
- Score: 23.369001500657028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supporting state-of-the-art AI research requires balancing rapid prototyping,
ease of use, and quick iteration, with the ability to deploy experiments at a
scale traditionally associated with production systems. Deep learning frameworks
such as TensorFlow, PyTorch and JAX allow users to transparently make use of
accelerators, such as TPUs and GPUs, to offload the more computationally
intensive parts of training and inference in modern deep learning systems.
Popular training pipelines that use these frameworks for deep learning
typically focus on (un-)supervised learning. How to best train reinforcement
learning (RL) agents at scale is still an active research area. In this report
we argue that TPUs are particularly well suited for training RL agents in a
scalable, efficient and reproducible way. Specifically we describe two
architectures designed to make the best use of the resources available on a TPU
Pod (a special configuration in a Google data center that features multiple TPU
devices connected to each other by extremely low latency communication
channels).
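The report's two architectures are not reproduced here, but the core pattern they build on — stepping many environment copies in lockstep and feeding the resulting batch of transitions straight into a learner update on the same device — can be sketched in plain Python. Everything below (the one-step bandit environment, the Bernoulli policy, the REINFORCE-style update) is a toy stand-in for illustration, not code from the paper.

```python
# Toy sketch of batched, synchronous acting-plus-learning:
# many environment copies step in lockstep, and the batch of
# transitions immediately feeds a learner update.
import random

class ToyEnv:
    """One-step bandit: reward 1.0 if action == 1, else 0.0."""
    def step(self, action):
        return 1.0 if action == 1 else 0.0

def act(theta, n_envs):
    """Sample one action per environment from a Bernoulli(theta) policy."""
    return [1 if random.random() < theta else 0 for _ in range(n_envs)]

def learn(theta, actions, rewards, lr=0.1):
    """REINFORCE-style update of the Bernoulli parameter."""
    grad = 0.0
    for a, r in zip(actions, rewards):
        # r * d/dtheta log pi(a) for a Bernoulli(theta) policy
        grad += r * ((a - theta) / (theta * (1.0 - theta)))
    theta += lr * grad / len(actions)
    return min(max(theta, 0.01), 0.99)  # keep theta inside (0, 1)

def train(n_envs=64, n_steps=200, seed=0):
    random.seed(seed)
    envs = [ToyEnv() for _ in range(n_envs)]
    theta = 0.5
    for _ in range(n_steps):
        actions = act(theta, n_envs)                      # batched acting
        rewards = [e.step(a) for e, a in zip(envs, actions)]
        theta = learn(theta, actions, rewards)            # colocated learning
    return theta

final_theta = train()  # theta should move toward 1.0
```

On a TPU Pod the acting and learning steps would be compiled, batched functions replicated across devices; the toy loop only illustrates the synchronous control flow.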
Related papers
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
- Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL)
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z)
- Parallel Reinforcement Learning Simulation for Visual Quadrotor Navigation [4.597465975849579]
Reinforcement learning (RL) is an agent-based approach for teaching robots to navigate within the physical world.
We present a simulation framework, built on AirSim, which provides efficient parallel training.
Building on this framework, Ape-X is modified to incorporate decentralised training of AirSim environments.
arXiv Detail & Related papers (2022-09-22T15:27:42Z)
- Bayesian Generational Population-Based Training [35.70338636901159]
Population-Based Training (PBT) has led to impressive performance in several large scale settings.
We introduce two new innovations in PBT-style methods.
We show that these innovations lead to large performance gains.
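The PBT family these innovations extend alternates ordinary training with an exploit-and-explore step: poor performers copy the weights of strong performers, then perturb the copied hyperparameters. A minimal sketch of that generic loop (not the paper's Bayesian generational variant, and with a toy objective standing in for real training):

```python
# Generic Population-Based Training loop with a toy objective.
import random

def score(weights, lr):
    """Toy fitness: more 'training' helps, a learning rate near 0.1 helps."""
    return weights - (lr - 0.1) ** 2

def pbt(pop_size=8, rounds=20, seed=0):
    random.seed(seed)
    # Each member: [weights, learning rate]
    pop = [[0.0, random.uniform(0.001, 1.0)] for _ in range(pop_size)]
    for _ in range(rounds):
        for member in pop:
            member[0] += 0.01 * member[1]              # a short burst of "training"
        pop.sort(key=lambda m: score(*m), reverse=True)
        top = pop[: pop_size // 4]
        for loser in pop[-(pop_size // 4):]:
            winner = random.choice(top)
            loser[0] = winner[0]                       # exploit: copy weights
            loser[1] = winner[1] * random.choice([0.8, 1.2])  # explore: perturb lr
    pop.sort(key=lambda m: score(*m), reverse=True)
    return pop[0]

best_weights, best_lr = pbt()
```

In a real system each member is a full training run and the exploit step copies checkpoints between workers; the loop above only shows the selection-and-perturbation control flow.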
arXiv Detail & Related papers (2022-07-19T16:57:38Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Basic cross-platform tensor frameworks and script language engines alone do not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all such requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Improving Generalization of Deep Reinforcement Learning-based TSP Solvers [19.29028564568974]
We propose a novel approach named MAGIC that includes a deep learning architecture and a DRL training method.
Our architecture, which integrates a multilayer perceptron, a graph neural network, and an attention model, defines a policy that sequentially generates a traveling salesman solution.
Our training method includes several innovations: (1) we interleave DRL policy updates with local search (using a new local search technique), (2) we use a novel simple baseline, and (3) we apply gradient learning.
arXiv Detail & Related papers (2021-10-06T15:16:19Z)
- Reinforcement Learning for Control of Valves [0.0]
This paper is a study of reinforcement learning (RL) as an optimal-control strategy for control of nonlinear valves.
It is evaluated against the PID (proportional-integral-derivative) strategy, using a unified framework.
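The PID baseline in such comparisons is a standard algorithm; a minimal discrete-time implementation, with arbitrary gains and a toy first-order plant (both assumptions for illustration, not details from the paper), might look like:

```python
# Discrete-time PID controller driving a toy first-order plant.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                     # I term state
        derivative = (error - self.prev_error) / self.dt     # D term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a first-order plant x' = -x + u toward setpoint 1.0.
pid = PID(kp=2.0, ki=1.0, kd=0.05, dt=0.01)
x = 0.0
for _ in range(2000):
    u = pid.control(1.0, x)
    x += 0.01 * (-x + u)   # Euler step of the plant dynamics
```

The RL controller in the paper replaces the fixed-gain control law with a learned policy; the plant and tuning here are purely illustrative.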
arXiv Detail & Related papers (2020-12-29T09:01:47Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
- Benchmarking network fabrics for data distributed training of deep neural networks [10.067102343753643]
Large computational requirements for training deep models have necessitated the development of new methods for faster training.
One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes.
In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning.
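The data-parallel pattern being benchmarked reduces to: shard the batch across workers, compute a gradient per worker, average the gradients (the all-reduce that stresses the network fabric), and apply one shared update. A toy single-process sketch, where the linear model, the data, and the `allreduce_mean` stand-in are illustrative assumptions:

```python
# Single-process sketch of synchronous data-parallel training.

def local_gradient(w, shard):
    """Gradient of mean squared error for y = w * x on one data shard."""
    g = 0.0
    for x, y in shard:
        g += 2.0 * (w * x - y) * x
    return g / len(shard)

def allreduce_mean(grads):
    """Stand-in for a network all-reduce: average the worker gradients."""
    return sum(grads) / len(grads)

def data_parallel_step(w, shards, lr):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel in practice
    return w - lr * allreduce_mean(grads)           # one synchronized update

# Data generated from y = 3 * x, sharded across 4 "workers".
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w = 0.0
for _ in range(100):
    w = data_parallel_step(w, shards, lr=0.01)
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so the recovered weight converges to 3.0; the interconnects the paper studies determine how fast the real all-reduce completes.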
arXiv Detail & Related papers (2020-08-18T17:38:30Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.