Decoupling Representation Learning from Reinforcement Learning
- URL: http://arxiv.org/abs/2009.08319v3
- Date: Sun, 16 May 2021 20:44:18 GMT
- Title: Decoupling Representation Learning from Reinforcement Learning
- Authors: Adam Stooke, Kimin Lee, Pieter Abbeel, and Michael Laskin
- Abstract summary: We introduce an unsupervised learning task called Augmented Temporal Contrast (ATC).
ATC trains a convolutional encoder to associate pairs of observations separated by a short time difference.
In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL.
- Score: 89.82834016009461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In an effort to overcome limitations of reward-driven feature learning in
deep reinforcement learning (RL) from images, we propose decoupling
representation learning from policy learning. To this end, we introduce a new
unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC),
which trains a convolutional encoder to associate pairs of observations
separated by a short time difference, under image augmentations and using a
contrastive loss. In online RL experiments, we show that training the encoder
exclusively using ATC matches or outperforms end-to-end RL in most
environments. Additionally, we benchmark several leading UL algorithms by
pre-training encoders on expert demonstrations and using them, with weights
frozen, in RL agents; we find that agents using ATC-trained encoders outperform
all others. We also train multi-task encoders on data from multiple
environments and show generalization to different downstream RL tasks. Finally,
we ablate components of ATC, and introduce a new data augmentation to enable
replay of (compressed) latent images from pre-trained encoders when RL requires
augmentation. Our experiments span visually diverse RL benchmarks in DeepMind
Control, DeepMind Lab, and Atari, and our complete code is available at
https://github.com/astooke/rlpyt/tree/master/rlpyt/ul.
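To make the objective described in the abstract concrete, here is a minimal sketch of an ATC-style contrastive update. It is not the authors' rlpyt implementation (linked above): the encoder modules, the pad-and-crop augmentation, the bilinear head, and all hyperparameters are illustrative assumptions, and the paper's momentum-averaged target encoder and anchor predictor are simplified away.

```python
# Minimal, illustrative sketch of an ATC-style contrastive update.
# Not the rlpyt implementation: the encoders, augmentation, and bilinear
# head below are assumptions, and the momentum target encoder and anchor
# predictor MLP used in the paper are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_shift(imgs, pad=4):
    """Pad-and-random-crop augmentation (one shared shift per batch)."""
    _, _, h, w = imgs.shape
    imgs = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    top = torch.randint(0, 2 * pad + 1, (1,)).item()
    left = torch.randint(0, 2 * pad + 1, (1,)).item()
    return imgs[:, :, top:top + h, left:left + w]

class BilinearContrastHead(nn.Module):
    """Scores anchor latents against target latents with a learned matrix W."""
    def __init__(self, latent_dim):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(latent_dim, latent_dim))

    def forward(self, anchors, targets):
        # logits[i, j] = anchor_i^T W target_j; the diagonal holds positives.
        return anchors @ self.W @ targets.t()

def atc_loss(encoder, target_encoder, head, obs_t, obs_tk):
    """InfoNCE loss: each o_t is paired with o_{t+k} from the same
    trajectory; the other batch elements serve as negatives."""
    anchors = encoder(random_shift(obs_t))
    with torch.no_grad():
        targets = target_encoder(random_shift(obs_tk))
    logits = head(anchors, targets)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```

The loss treats the latent of the augmented future observation as the positive for each anchor; once trained this way, the encoder can be handed to the RL agent, optionally with its weights frozen as in the paper's benchmark experiments.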
Related papers
- An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving [0.0]
This research investigates the challenges that Deep Reinforcement Learning (DRL) faces in Partially Observable Markov Decision Processes (POMDPs).
Our research adopts an offline-trained encoder to leverage large video datasets through self-supervised learning to learn generalizable representations.
We show that the features learned by watching BDD100K driving videos can be directly transferred to achieve lane following and collision avoidance in the CARLA simulator.
arXiv Detail & Related papers (2024-09-02T14:16:23Z) - Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
arXiv Detail & Related papers (2024-02-13T23:29:09Z) - RLTF: Reinforcement Learning from Unit Test Feedback [17.35361167578498]
Reinforcement Learning from Unit Test Feedback is a novel online RL framework that uses multi-granularity unit test feedback to refine code LLMs.
Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code.
arXiv Detail & Related papers (2023-07-10T05:18:18Z) - Synthetic Experience Replay [48.601879260071655]
We propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience.
We show that SynthER is an effective method for training RL agents across offline and online settings.
We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data.
arXiv Detail & Related papers (2023-03-12T09:10:45Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, and we also verify previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z) - Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z) - RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning
Workloads [4.575381867242508]
We propose RL-Scope, a cross-stack profiler that scopes low-level CPU/GPU resource usage to high-level algorithmic operations.
We demonstrate RL-Scope's utility through in-depth case studies.
arXiv Detail & Related papers (2021-02-08T15:42:48Z) - Reinforcement Learning with Augmented Data [97.42819506719191]
We present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms.
We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods.
arXiv Detail & Related papers (2020-04-30T17:35:32Z)
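To make the RAD entry above concrete, the sketch below shows one of the listed augmentations, random crop, applied to a batch of image observations before they reach an otherwise unmodified RL algorithm. The function name, tensor shapes, and crop size are illustrative assumptions, not the authors' code.

```python
# Illustrative RAD-style random-crop augmentation (not the authors' code).
# Assumes observations are rendered slightly larger than the network input,
# e.g. (B, C, 100, 100) cropped down to 84 x 84.
import torch

def random_crop(obs, out_size=84):
    """Randomly crop each (C, H, W) observation to out_size x out_size."""
    b, c, h, w = obs.shape
    crops = []
    for img in obs:
        top = torch.randint(0, h - out_size + 1, (1,)).item()
        left = torch.randint(0, w - out_size + 1, (1,)).item()
        crops.append(img[:, top:top + out_size, left:left + out_size])
    return torch.stack(crops)

# Usage sketch: augment the sampled batch before the actor/critic updates.
# obs_aug = random_crop(batch_obs)
```

Because only the sampled observations are transformed, the underlying RL algorithm itself needs no changes, which is what makes this kind of augmentation module plug-and-play.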