Meta-Reinforcement Learning Robust to Distributional Shift via Model
Identification and Experience Relabeling
- URL: http://arxiv.org/abs/2006.07178v2
- Date: Mon, 15 Jun 2020 18:34:23 GMT
- Title: Meta-Reinforcement Learning Robust to Distributional Shift via Model
Identification and Experience Relabeling
- Authors: Russell Mendonca, Xinyang Geng, Chelsea Finn, Sergey Levine
- Abstract summary: We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
- Score: 126.69933134648541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms can acquire policies for complex tasks
autonomously. However, the number of samples required to learn a diverse set of
skills can be prohibitively large. While meta-reinforcement learning methods
have enabled agents to leverage prior experience to adapt quickly to new tasks,
their performance depends crucially on how close the new task is to the
previously experienced tasks. Current approaches are either not able to
extrapolate well, or can do so at the expense of requiring extremely large
amounts of data for on-policy meta-training. In this work, we present model
identification and experience relabeling (MIER), a meta-reinforcement learning
algorithm that is both efficient and extrapolates well when faced with
out-of-distribution tasks at test time. Our method is based on a simple
insight: we recognize that dynamics models can be adapted efficiently and
consistently with off-policy data, more easily than policies and value
functions. These dynamics models can then be used to continue training policies
and value functions for out-of-distribution tasks without using
meta-reinforcement learning at all, by generating synthetic experience for the
new task.
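To make the two-phase procedure in the abstract concrete, here is a minimal, illustrative sketch: a small dynamics model is adapted with a few gradient steps on off-policy transitions from the new task (model identification), and the adapted model is then rolled forward to generate synthetic transitions that an ordinary off-policy RL learner could consume (experience relabeling). The class names, network sizes, and helper routines are assumptions for illustration, not the authors' released implementation; in particular, the policy below is a stand-in for whatever policy and value functions are actually trained on the synthetic data.

```python
# Minimal sketch of the two phases described in the abstract (PyTorch).
# All names, shapes, and hyperparameters are illustrative assumptions,
# not the authors' implementation.
import copy

import torch
import torch.nn as nn


class DynamicsModel(nn.Module):
    """Predicts (next_state, reward) from (state, action) via a small MLP."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # state delta + scalar reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        delta, reward = out[..., :-1], out[..., -1:]
        return state + delta, reward


def adapt_model(meta_model, batch, steps: int = 5, lr: float = 1e-3):
    """Model identification: adapt the meta-trained dynamics model with a few
    gradient steps on off-policy transitions from the new task."""
    model = copy.deepcopy(meta_model)  # leave the meta-trained weights intact
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    state, action, next_state, reward = batch
    for _ in range(steps):
        pred_next, pred_reward = model(state, action)
        loss = (nn.functional.mse_loss(pred_next, next_state)
                + nn.functional.mse_loss(pred_reward, reward))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


def generate_synthetic_experience(model, start_states, policy, horizon: int = 5):
    """Experience relabeling: roll the adapted model forward from previously
    collected states to produce synthetic transitions for the new task, which
    would then feed a standard off-policy policy/value update."""
    transitions = []
    state = start_states
    with torch.no_grad():
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = model(state, action)
            transitions.append((state, action, next_state, reward))
            state = next_state
    return transitions


if __name__ == "__main__":
    state_dim, action_dim = 4, 2
    meta_model = DynamicsModel(state_dim, action_dim)

    # Toy off-policy batch standing in for real replay-buffer transitions.
    s = torch.randn(64, state_dim)
    a = torch.randn(64, action_dim)
    s_next = s + 0.1 * torch.randn_like(s)
    r = torch.randn(64, 1)

    adapted = adapt_model(meta_model, (s, a, s_next, r))
    random_policy = lambda obs: torch.randn(obs.shape[0], action_dim)
    synthetic = generate_synthetic_experience(adapted, s[:8], random_policy)
    print(f"generated {len(synthetic)} batches of synthetic transitions")
```

The key property this sketch mirrors is that only the dynamics model is adapted to the new task; the policy and value functions are then trained on model-generated data with ordinary reinforcement learning rather than meta-reinforcement learning.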
Related papers
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning by jointly optimizing a reinforcement learning policy and an inverse dynamics prediction objective (a generic sketch of such an objective appears after this list).
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Learning Action Translator for Meta Reinforcement Learning on
Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z) - On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - Learning Adaptable Policy via Meta-Adversarial Inverse Reinforcement
Learning for Decision-making Tasks [2.1485350418225244]
We build an adaptable imitation learning model based on the integration of Meta-learning and Adversarial Inverse Reinforcement Learning.
We exploit the adversarial learning and inverse reinforcement learning mechanisms to learn policies and reward functions simultaneously from available training tasks.
arXiv Detail & Related papers (2021-03-23T17:16:38Z) - Double Meta-Learning for Data Efficient Policy Optimization in
Non-Stationary Environments [12.45281856559346]
We are interested in learning models of non-stationary environments, which can be framed as a multi-task learning problem.
Model-free reinforcement learning algorithms can achieve good performance in multi-task learning at the cost of extensive sampling.
While model-based approaches are among the most data efficient learning algorithms, they still struggle with complex tasks and model uncertainties.
arXiv Detail & Related papers (2020-11-21T03:19:35Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z) - Probabilistic Active Meta-Learning [15.432006404678981]
We introduce task selection based on prior experience into a meta-learning algorithm.
We provide empirical evidence that our approach improves data-efficiency when compared to strong baselines on simulated robotic experiments.
arXiv Detail & Related papers (2020-07-17T12:51:42Z)