Related papers: Offline Meta-Reinforcement Learning with Advantage Weighting

Offline Meta-Reinforcement Learning with Advantage Weighting

URL: http://arxiv.org/abs/2008.06043v3
Date: Wed, 21 Jul 2021 17:33:01 GMT
Title: Offline Meta-Reinforcement Learning with Advantage Weighting
Authors: Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn
Abstract summary: This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes an algorithm that performs well in this setting. offline meta-RL is analogous to the widely successful supervised learning strategy of pre-training a model on a large batch of fixed, pre-collected data. We propose Meta-Actor Critic with Advantage Weighting (MACAW), an optimization-based meta-learning algorithm that uses simple, supervised regression objectives for both the inner and outer loop of meta-training.
Score: 125.21298190780259
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces the offline meta-reinforcement learning (offline meta-RL) problem setting and proposes an algorithm that performs well in this setting. Offline meta-RL is analogous to the widely successful supervised learning strategy of pre-training a model on a large batch of fixed, pre-collected data (possibly from various tasks) and fine-tuning the model to a new task with relatively little data. That is, in offline meta-RL, we meta-train on fixed, pre-collected data from several tasks in order to adapt to a new task with a very small amount (less than 5 trajectories) of data from the new task. By nature of being offline, algorithms for offline meta-RL can utilize the largest possible pool of training data available and eliminate potentially unsafe or costly data collection during meta-training. This setting inherits the challenges of offline RL, but it differs significantly because offline RL does not generally consider a) transfer to new tasks or b) limited data from the test task, both of which we face in offline meta-RL. Targeting the offline meta-RL setting, we propose Meta-Actor Critic with Advantage Weighting (MACAW), an optimization-based meta-learning algorithm that uses simple, supervised regression objectives for both the inner and outer loop of meta-training. On offline variants of common meta-RL benchmarks, we empirically find that this approach enables fully offline meta-reinforcement learning and achieves notable gains over prior methods.

Related papers

Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data [8.583014846046886]
A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. We develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data. In four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-01-13T14:11:12Z)
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
A Tutorial on Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.<n>We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.<n>We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
Contextual Transformer for Offline Meta Reinforcement Learning [16.587320914107128]
We show how prompts can improve sequence modeling-based offline reinforcement learning ( offline RL) algorithms. We propose prompt tuning for offline RL, where a context vector sequence istextuald with the input to guide the conditional policy generation. We extend our framework to Meta-RL settings and propose Contextual Meta Transformer (CMT); CMT leverages the context among different tasks as the prompt to improve generalization on unseen tasks.
arXiv Detail & Related papers (2022-11-15T10:00:14Z)
Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance comparing to existing meta-RL algorithms. Our method does not only learn high-quality policies for multiple tasks simultaneously but also can quickly adapt to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
offline Meta-RL is emerging as a promising approach to address these challenges. MerPO learns a meta-model for efficient task structure inference and an informative meta-policy. We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL. We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy. Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data. We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making. We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks. This problem is still not fully understood, for which two major challenges need to be addressed. We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.