Mastering Visual Continuous Control: Improved Data-Augmented
Reinforcement Learning
- URL: http://arxiv.org/abs/2107.09645v1
- Date: Tue, 20 Jul 2021 17:29:13 GMT
- Title: Mastering Visual Continuous Control: Improved Data-Augmented
Reinforcement Learning
- Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto
- Abstract summary: We present DrQ-v2, a model-free reinforcement learning algorithm for visual continuous control.
DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels.
Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations.
- Score: 114.35801511501639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for
visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic
approach that uses data augmentation to learn directly from pixels. We
introduce several improvements that yield state-of-the-art results on the
DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid
locomotion tasks directly from pixel observations, previously unattained by
model-free RL. DrQ-v2 is conceptually simple, easy to implement, and provides
a significantly better computational footprint compared to prior work, with the
majority of tasks taking just 8 hours to train on a single GPU. Finally, we
publicly release DrQ-v2's implementation to provide RL practitioners with a
strong and computationally efficient baseline.
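
DrQ-v2's core ingredient is a simple image augmentation: each pixel observation is padded by a few pixels and then cropped back to its original size at a random offset before being fed to the actor and critic. The snippet below is a minimal sketch of such a random-shift augmentation in PyTorch; the pad size, resolution, and function name are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def random_shift(obs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Randomly shift a batch of image observations by up to `pad` pixels.

    obs: (batch, channels, height, width) tensor of stacked frames.
    Sketch of a DrQ-style augmentation: replicate-pad the images, then
    crop each one back to (height, width) at a random offset.
    """
    n, _, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    tops = torch.randint(0, 2 * pad + 1, (n,)).tolist()
    lefts = torch.randint(0, 2 * pad + 1, (n,)).tolist()
    return torch.stack(
        [padded[i, :, t:t + h, l:l + w]
         for i, (t, l) in enumerate(zip(tops, lefts))]
    )

# Usage: augment observations sampled from the replay buffer before both
# the critic and the actor updates.
obs = torch.rand(32, 9, 84, 84)        # e.g. 3 stacked 84x84 RGB frames
aug_obs = random_shift(obs, pad=4)     # same shape, content shifted by <= 4 px
```
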
Related papers
- Pretrained Visual Representations in Reinforcement Learning [0.0]
This paper compares the performance of visual reinforcement learning algorithms that train a convolutional neural network (CNN) from scratch with those that utilize pre-trained visual representations (PVRs).
We evaluate the Dormant Ratio Minimization (DRM) algorithm, a state-of-the-art visual RL method, against three PVRs: ResNet18, DINOv2, and Visual Cortex (VC).
arXiv Detail & Related papers (2024-07-24T12:53:26Z)
- Extreme Q-Learning: MaxEnt RL without Entropy [88.97516083146371]
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains.
We introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT).
Using EVT, we derive our Extreme Q-Learning framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms.
arXiv Detail & Related papers (2023-01-05T23:14:38Z)
- Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL).
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction [29.040991149922615]
We study the challenging task of neural network quantization without end-to-end retraining, called Post-training Quantization (PTQ).
We propose a novel PTQ framework, dubbed BRECQ, which pushes the limits of bitwidth in PTQ down to INT2 for the first time.
For the first time, we prove that, without bells and whistles, PTQ can produce 4-bit ResNet and MobileNetV2 models with accuracy comparable to QAT, while generating quantized models 240 times faster.
arXiv Detail & Related papers (2021-02-10T13:46:16Z)
- Decoupling Representation Learning from Reinforcement Learning [89.82834016009461]
We introduce an unsupervised learning task called Augmented Temporal Contrast (ATC).
ATC trains a convolutional encoder to associate pairs of observations separated by a short time difference.
In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL.
arXiv Detail & Related papers (2020-09-14T19:11:13Z)
- Reinforcement Learning with Augmented Data [97.42819506719191]
We present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms.
We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods.
arXiv Detail & Related papers (2020-04-30T17:35:32Z)
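
The RAD entry above lists simple, plug-and-play augmentations (random translate, crop, color jitter, patch cutout, random convolutions, amplitude scale) applied to observations without changing the underlying RL algorithm. Below is a minimal sketch of one of them, a per-image random crop over a batch of observations; the array shapes and function name are illustrative assumptions rather than RAD's released code.

```python
import numpy as np

def random_crop(obs_batch: np.ndarray, out_size: int = 84) -> np.ndarray:
    """Randomly crop each image in a batch of (N, C, H, W) observations.

    Sketch of a RAD-style plug-and-play augmentation: the agent is trained
    on random crops of slightly larger frames, with no change to the RL loss.
    """
    n, c, h, w = obs_batch.shape
    assert h >= out_size and w >= out_size
    tops = np.random.randint(0, h - out_size + 1, size=n)
    lefts = np.random.randint(0, w - out_size + 1, size=n)
    out = np.empty((n, c, out_size, out_size), dtype=obs_batch.dtype)
    for i in range(n):
        out[i] = obs_batch[i, :, tops[i]:tops[i] + out_size,
                           lefts[i]:lefts[i] + out_size]
    return out

# Usage: augment a replay-buffer batch before computing actor/critic losses.
batch = np.random.randint(0, 256, size=(32, 9, 100, 100), dtype=np.uint8)
aug_batch = random_crop(batch)      # shape (32, 9, 84, 84)
```
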
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.