Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
- URL: http://arxiv.org/abs/2209.12016v2
- Date: Thu, 25 May 2023 00:50:57 GMT
- Title: Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
- Authors: Sai Rajeswar, Pietro Mazzaglia, Tim Verbelen, Alexandre Piché, Bart Dhoedt, Aaron Courville, Alexandre Lacoste
- Abstract summary: Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method to solve the Unsupervised RL Benchmark (URLB), using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
- Score: 112.63440666617494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Controlling artificial agents from visual sensory data is an arduous task.
Reinforcement learning (RL) algorithms can succeed but require large amounts of
interactions between the agent and the environment. To alleviate the issue,
unsupervised RL proposes to employ self-supervised interaction and learning,
for adapting faster to future tasks. Yet, as shown in the Unsupervised RL
Benchmark (URLB; Laskin et al. 2021), whether current unsupervised strategies
can improve generalization capabilities is still unclear, especially in visual
control settings. In this work, we study the URLB and propose a new method to
solve it, using unsupervised model-based RL, for pre-training the agent, and a
task-aware fine-tuning strategy combined with a newly proposed hybrid planner,
Dyna-MPC, to adapt the agent for downstream tasks. On URLB, our method obtains
93.59% overall normalized performance, surpassing previous baselines by a
staggering margin. We validate our design choices and analyze our models
through a large-scale empirical study. We also show robust performance on the
Real-World RL benchmark, hinting at resiliency to environment perturbations
during adaptation. Project website:
https://masteringurlb.github.io/
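The hybrid planner named in the abstract, Dyna-MPC, combines model-predictive control over the learned world model with a Dyna-style learned policy and value function. Below is a minimal sketch of that general idea, not the authors' implementation; `model.step`, `policy`, and `value` are assumed interfaces standing in for a learned dynamics model, actor, and critic.
```python
# Minimal sketch of a Dyna-MPC-style hybrid planner (not the authors' code).
# Assumed interfaces: model.step(states, actions) -> (next_states, rewards)
# for batched rollouts in a learned world model; policy(states) -> actions;
# value(states) -> estimated returns beyond the planning horizon.
import numpy as np

def dyna_mpc_plan(model, policy, value, state, horizon=15,
                  candidates=512, iters=4, n_elites=64, init_std=0.3):
    """CEM-style planning over a learned model, seeded by a learned policy."""
    # Seed the action-sequence mean by rolling the policy through the model.
    s, mean = state[None], []                      # batch of one state
    for _ in range(horizon):
        a = policy(s)                              # (1, act_dim)
        mean.append(a[0])
        s, _ = model.step(s, a)
    mean = np.stack(mean)                          # (horizon, act_dim)
    std = np.full_like(mean, init_std)

    for _ in range(iters):
        # Sample candidate action sequences around the current plan.
        seqs = mean + std * np.random.randn(candidates, *mean.shape)
        s = np.repeat(state[None], candidates, axis=0)
        returns = np.zeros(candidates)
        for t in range(horizon):
            s, r = model.step(s, seqs[:, t])
            returns += r
        returns += value(s)                        # Dyna-style value bootstrap
        elites = seqs[np.argsort(returns)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
    return mean[0]                                 # apply first action, replan
```
As in standard MPC, only the first planned action is executed and the agent replans at the next step; the policy seeding and value bootstrap are what make the planner "hybrid".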
Related papers
- Accelerating Goal-Conditioned RL Algorithms and Research [17.155006770675904]
Self-supervised goal-conditioned reinforcement learning (GCRL) agents discover new behaviors by learning from the goals achieved during unstructured interaction with the environment.
However, these methods have not seen comparable success, owing to the scarcity of data from slow environment simulations and a lack of stable algorithms.
We release a benchmark (JaxGCRL) for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU.
arXiv Detail & Related papers (2024-08-20T17:58:40Z)
- World Models Increase Autonomy in Reinforcement Learning [6.151562278670799]
Reinforcement learning (RL) is an appealing paradigm for training intelligent agents.
The MoReFree agent adapts two key mechanisms, exploration and policy learning, to handle reset-free tasks.
It exhibits superior data-efficiency across various reset-free tasks without access to environmental reward or demonstrations.
arXiv Detail & Related papers (2024-08-19T08:56:00Z)
- Knowledge Graph Reasoning with Self-supervised Reinforcement Learning [30.359557545737747]
We propose a self-supervised pre-training method to warm up the policy network before the RL training stage.
In our supervised learning stage, the agent selects actions based on the policy network and learns from generated labels.
We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics.
arXiv Detail & Related papers (2024-05-22T13:39:33Z)
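The entry above describes warming up the policy network with supervised learning on self-generated labels before RL training. The following is a generic two-stage sketch of such a schedule in PyTorch; the network sizes, the label source, and the REINFORCE update are illustrative assumptions, not the paper's pipeline.
```python
# Hypothetical sketch: supervised warm-up of a policy network before RL.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def warmup(batches):
    """Stage 1: supervised warm-up on self-generated (state, action) labels."""
    ce = nn.CrossEntropyLoss()
    for states, actions in batches:     # labels produced by the agent itself
        loss = ce(policy(states), actions)
        opt.zero_grad(); loss.backward(); opt.step()

def reinforce_step(states, actions, returns):
    """Stage 2: RL fine-tuning (REINFORCE) from the warmed-up policy."""
    logp = torch.log_softmax(policy(states), dim=-1)
    chosen = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(chosen * returns).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```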
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
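The mixture world model in the entry above learns task-specific dynamics priors with a mixture of Gaussians. One plausible shape for such a dynamics head is sketched below: a network that outputs mixture weights, means, and scales for the next state, trained by negative log-likelihood. Dimensions and layout are assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn

class MixtureDynamics(nn.Module):
    """Next-state prediction as a K-component Gaussian mixture (illustrative)."""
    def __init__(self, state_dim=32, act_dim=4, k=5, hidden=256):
        super().__init__()
        self.k, self.state_dim = k, state_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, k * (1 + 2 * state_dim)))  # logits, mu, log_std

    def forward(self, s, a):
        out = self.net(torch.cat([s, a], dim=-1))
        logits, rest = out[:, :self.k], out[:, self.k:]
        mu, log_std = rest.view(-1, self.k, 2 * self.state_dim).chunk(2, dim=-1)
        return logits, mu, log_std

    def nll(self, s, a, s_next):
        # Negative log-likelihood of s_next under the predicted mixture.
        logits, mu, log_std = self(s, a)
        comp = torch.distributions.Normal(mu, log_std.exp())
        logp = comp.log_prob(s_next.unsqueeze(1)).sum(-1)   # (batch, K)
        mix = torch.log_softmax(logits, dim=-1)
        return -torch.logsumexp(mix + logp, dim=-1).mean()
```
Each mixture component can specialize to one task's dynamics, which is one way such a model can act as a task-specific prior against forgetting.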
- Light-weight probing of unsupervised representations for Reinforcement Learning [20.638410483549706]
We study whether linear probing can serve as a proxy evaluation task for the quality of unsupervised RL representations.
We show that the probing tasks are strongly rank-correlated with downstream RL performance on the Atari100k benchmark.
This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes.
arXiv Detail & Related papers (2022-08-25T21:08:01Z)
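The probing protocol in the entry above reduces to: freeze the pretrained encoder, fit a linear head on a lightweight supervised task, and check whether probe scores rank pretraining methods the same way downstream RL returns do. A self-contained sketch with synthetic stand-in data follows; a real study would use features from frozen encoders and measured Atari100k returns.
```python
# Illustrative linear-probe evaluation of frozen representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

def probe_score(features, labels):
    """Accuracy of a linear probe trained on frozen encoder features."""
    split = int(0.8 * len(features))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features[:split], labels[:split])
    return clf.score(features[split:], labels[split:])

# Synthetic stand-ins: one (features, labels) set per pretraining method,
# plus each method's downstream RL return (fake numbers for the demo).
probe_scores, rl_returns = [], []
for m in range(8):
    feats = rng.normal(size=(500, 32))
    labels = (feats[:, 0] + 0.5 * m * rng.normal(size=500) > 0).astype(int)
    probe_scores.append(probe_score(feats, labels))
    rl_returns.append(rng.normal(loc=-m, scale=0.5))

rho, _ = spearmanr(probe_scores, rl_returns)  # rank correlation of the proxy
print(f"Spearman rank correlation (probe vs. RL): {rho:.2f}")
```
A high rank correlation is what licenses using the cheap probe instead of full RL training when searching over pretraining recipes.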
- URLB: Unsupervised Reinforcement Learning Benchmark [82.36060735454647]
We introduce the Unsupervised Reinforcement Learning Benchmark (URLB).
URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards.
We provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods.
arXiv Detail & Related papers (2021-10-28T15:07:01Z)
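URLB's two-phase protocol from the entry above can be written schematically as a reward-free pre-training loop driven by an intrinsic signal, followed by fine-tuning on the task's extrinsic reward. The `env` and `agent` interfaces below are assumed (Gym-style), not URLB's actual API.
```python
# Schematic two-phase URLB loop (interfaces are assumptions, not URLB's API).
def pretrain(agent, env, steps, intrinsic_reward):
    """Phase 1: reward-free pre-training driven by an intrinsic signal."""
    obs = env.reset()
    for _ in range(steps):
        action = agent.act(obs)
        next_obs, _, done, _ = env.step(action)       # extrinsic reward ignored
        agent.update(obs, action,
                     intrinsic_reward(obs, action, next_obs), next_obs)
        obs = env.reset() if done else next_obs

def finetune(agent, env, steps):
    """Phase 2: downstream adaptation using the task's extrinsic reward."""
    obs = env.reset()
    for _ in range(steps):
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)  # extrinsic reward used
        agent.update(obs, action, reward, next_obs)
        obs = env.reset() if done else next_obs
```
The benchmark fixes the pre-training budget and the fine-tuning budget, so methods compete purely on how useful their reward-free experience turns out to be.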
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solve prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep Reinforcement Learning with Demonstration-like Sampled Exploration [7.930709072852582]
We propose a generic framework for Learning from Demonstration (LfD) based on actor-critic algorithms.
We conduct experiments on 4 standard benchmark environments in MuJoCo and 2 self-designed robotic environments.
arXiv Detail & Related papers (2021-09-27T12:42:05Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
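For the latent-space model in the entry above, the objective alluded to is a variational bound. A standard sequence ELBO for a latent-state (POMDP) model, written with assumed notation rather than the paper's, is:
```latex
% Standard sequence ELBO for a latent-state (POMDP) model -- notation assumed.
% q(z_t | o_{<=t}, a_{<t}) is the inference model, p(z_t | z_{t-1}, a_{t-1})
% the latent dynamics prior, and p(o_t | z_t) the observation decoder.
\log p(o_{1:T} \mid a_{1:T}) \;\ge\; \sum_{t=1}^{T} \Big(
  \mathbb{E}_{q}\!\left[ \log p(o_t \mid z_t) \right]
  - \mathbb{E}_{q}\!\left[ \mathrm{KL}\!\left(
      q(z_t \mid o_{\le t}, a_{<t}) \,\middle\|\, p(z_t \mid z_{t-1}, a_{t-1})
    \right) \right] \Big)
```
The reconstruction term makes the latents explain the observations, while the KL term keeps the inferred posterior close to the learned latent dynamics.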
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.