Robust Task Representations for Offline Meta-Reinforcement Learning via
Contrastive Learning
- URL: http://arxiv.org/abs/2206.10442v1
- Date: Tue, 21 Jun 2022 14:46:47 GMT
- Title: Robust Task Representations for Offline Meta-Reinforcement Learning via
Contrastive Learning
- Authors: Haoqi Yuan, Zongqing Lu
- Abstract summary: offline meta-reinforcement learning is a reinforcement learning paradigm that learns from offline data to adapt to new tasks.
We propose a contrastive learning framework for task representations that are robust to the distribution of behavior policies in training and test.
Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods.
- Score: 21.59254848913971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study offline meta-reinforcement learning, a practical reinforcement
learning paradigm that learns from offline data to adapt to new tasks. The
distribution of offline data is determined jointly by the behavior policy and
the task. Existing offline meta-reinforcement learning algorithms cannot
distinguish these factors, making task representations unstable to the change
of behavior policies. To address this problem, we propose a contrastive
learning framework for task representations that are robust to the distribution
mismatch of behavior policies in training and test. We design a bi-level
encoder structure, use mutual information maximization to formalize task
representation learning, derive a contrastive learning objective, and introduce
several approaches to approximate the true distribution of negative pairs.
Experiments on a variety of offline meta-reinforcement learning benchmarks
demonstrate the advantages of our method over prior methods, especially on the
generalization to out-of-distribution behavior policies. The code is available
at https://github.com/PKU-AI-Edge/CORRO.
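The abstract names the key ingredients: a bi-level encoder, a mutual-information formulation, and a contrastive (InfoNCE-style) objective with approximated negatives. Below is a minimal sketch of how those pieces could fit together; module names, shapes, and hyperparameters are illustrative assumptions, not the authors' released code (see the repository linked above for that).

```python
# Sketch: bi-level task encoder + InfoNCE contrastive objective.
# All names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLevelTaskEncoder(nn.Module):
    """Lower level encodes single transitions; upper level aggregates
    them into one task representation (a sketch, not CORRO itself)."""
    def __init__(self, obs_dim, act_dim, latent_dim=32, hidden=128):
        super().__init__()
        # Lower level: encode one transition (s, a, r, s').
        self.transition_enc = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        # Upper level: map pooled transition embeddings to a task vector.
        self.task_head = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, s, a, r, s_next):
        # Each tensor: [n_tasks, context_len, dim]; r has last dim 1.
        x = torch.cat([s, a, r, s_next], dim=-1)
        h = self.transition_enc(x)           # per-transition embeddings
        z = self.task_head(h.mean(dim=1))    # mean-pool over the context
        return F.normalize(z, dim=-1)        # unit-norm task representation

def infonce_loss(z_anchor, z_positive, temperature=0.1):
    """Matched rows are positive pairs (two contexts of the same task);
    every other row in the batch serves as a negative."""
    logits = z_anchor @ z_positive.t() / temperature  # [n_tasks, n_tasks]
    labels = torch.arange(z_anchor.size(0))
    return F.cross_entropy(logits, labels)
```

In such a setup, each batch would draw two disjoint transition contexts per task, possibly gathered by different behavior policies; maximizing agreement between their representations while pushing away other tasks' contexts encourages the latent to encode the task rather than the policy that collected the data.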
Related papers
- Rethinking Meta-Learning from a Learning Lens [17.00587250127854]
We focus on the more fundamental "learning to learn" strategy of meta-learning to explore what causes errors and how to eliminate them without changing the environment.
We propose incorporating task relations into the optimization process of meta-learning, and introduce a plug-and-play method called Task Relation Learner (TRLearner) to achieve this goal.
arXiv Detail & Related papers (2024-09-13T02:00:16Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation; a minimal sketch follows this entry.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
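The entry above builds on Contrastive Predictive Coding (CPC). A hedged sketch of the general CPC recipe applied between episodes follows; the encoder, GRU context, and shapes are assumptions for illustration, not the paper's model.

```python
# Sketch: CPC across episodes to pick up gradual, between-episode drift.
# Shapes and module names are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EpisodeCPC(nn.Module):
    def __init__(self, episode_feat_dim, latent_dim=32):
        super().__init__()
        self.encode = nn.Linear(episode_feat_dim, latent_dim)  # per-episode encoder
        self.context = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.predict = nn.Linear(latent_dim, latent_dim)       # next-latent predictor

    def loss(self, episode_feats):
        # episode_feats: [batch, n_episodes, feat], one row per trajectory
        # of consecutive episodes; n_episodes must be >= 2.
        z = self.encode(episode_feats)         # [batch, T, latent]
        c, _ = self.context(z[:, :-1])         # summarize episodes 1..T-1
        pred = self.predict(c[:, -1])          # predict the latent of episode T
        target = z[:, -1]                      # true next-episode latent
        logits = pred @ target.t()             # other batch rows act as negatives
        labels = torch.arange(pred.size(0))
        return F.cross_entropy(logits, labels)
```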
- Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation [29.49883684368039]
Offline meta-reinforcement learning (OMRL) allows an agent to tackle novel tasks while relying solely on a static dataset.
We introduce a novel algorithm to disentangle the impact of behavior policy from task representation learning.
arXiv Detail & Related papers (2024-03-12T02:38:36Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmark.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- On Context Distribution Shift in Task Representation Learning for Offline Meta RL [7.8317653074640186]
We focus on context-based OMRL, specifically the challenge of learning task representations.
To overcome the context distribution shift between offline training data and test-time contexts, we present a hard-sampling-based strategy to train a robust task context encoder; a sketch follows this entry.
arXiv Detail & Related papers (2023-04-01T16:21:55Z)
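One plausible reading of a "hard-sampling-based strategy" is to mine, among candidate negatives drawn from other tasks, the ones most similar to the anchor representation, so the encoder must learn to separate near-duplicates. The helper below sketches that idea; the mining rule, `k`, and temperature are assumptions, not the paper's procedure.

```python
# Sketch: hard-negative mining inside an InfoNCE-style context-encoder loss.
import torch
import torch.nn.functional as F

def hard_negative_infonce(z_anchor, z_pos, z_candidates, k=16, temperature=0.1):
    """z_anchor, z_pos: [d]; z_candidates: [n, d] from other tasks.
    Keep only the k negatives closest to the anchor: the 'hard' ones
    that most confuse the task encoder."""
    z_anchor = F.normalize(z_anchor, dim=-1)
    z_pos = F.normalize(z_pos, dim=-1)
    z_candidates = F.normalize(z_candidates, dim=-1)
    sims = z_candidates @ z_anchor                       # [n] cosine similarities
    hard = z_candidates[sims.topk(min(k, sims.numel())).indices]
    pos_logit = (z_anchor * z_pos).sum().unsqueeze(0)    # [1]
    neg_logits = hard @ z_anchor                         # [<=k]
    logits = torch.cat([pos_logit, neg_logits]) / temperature
    labels = torch.zeros(1, dtype=torch.long)            # positive sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), labels)
```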
- Algorithm Design for Online Meta-Learning with Task Boundary Detection [63.284263611646]
We propose a novel algorithm for task-agnostic online meta-learning in non-stationary environments.
We first propose two simple but effective mechanisms for detecting task switches and distribution shift; a toy detector sketch follows this entry.
We show that a sublinear task-averaged regret can be achieved for our algorithm under mild conditions.
arXiv Detail & Related papers (2023-02-02T04:02:49Z)
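For flavor, here is a toy task-switch detector in the spirit of "simple but effective detection mechanisms": flag a switch when the online loss jumps far above its recent running statistics. The windowing and threshold rule are assumptions, not the paper's tests.

```python
# Sketch: flag a task switch when the loss is anomalously high
# relative to a recent window. Thresholds are illustrative only.
from collections import deque
import statistics

class SwitchDetector:
    def __init__(self, window=50, n_sigma=3.0):
        self.losses = deque(maxlen=window)
        self.n_sigma = n_sigma

    def update(self, loss: float) -> bool:
        """Returns True if `loss` is an outlier vs. the recent window."""
        if len(self.losses) >= 10:
            mu = statistics.fmean(self.losses)
            sd = statistics.pstdev(self.losses) or 1e-8
            if loss > mu + self.n_sigma * sd:
                self.losses.clear()          # treat as a new task segment
                self.losses.append(loss)
                return True
        self.losses.append(loss)
        return False
```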
- Conditional Meta-Learning of Linear Representations [57.90025697492041]
Standard meta-learning for representation learning aims to find a common representation shared across multiple tasks, which can be limiting when no single representation captures all tasks well.
In this work we overcome this issue by inferring a conditioning function that maps a task's side information into a representation tailored to the task at hand.
We propose a meta-algorithm capable of leveraging this advantage in practice.
arXiv Detail & Related papers (2021-03-30T12:02:14Z)
- Lessons from Chasing Few-Shot Learning Benchmarks: Rethinking the Evaluation of Meta-Learning Methods [9.821362920940631]
We introduce a simple baseline for meta-learning, FIX-ML.
We explore two possible goals of meta-learning: to develop methods that generalize (i) to the same task distribution that generates the training set (in-distribution), or (ii) to new, unseen task distributions (out-of-distribution).
Our results highlight that in order to reason about progress in this space, it is necessary to provide a clearer description of the goals of meta-learning, and to develop more appropriate evaluation strategies.
arXiv Detail & Related papers (2021-02-23T05:34:30Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)