Offline Meta Reinforcement Learning with In-Distribution Online
Adaptation
- URL: http://arxiv.org/abs/2305.19529v2
- Date: Thu, 1 Jun 2023 17:31:25 GMT
- Title: Offline Meta Reinforcement Learning with In-Distribution Online
Adaptation
- Authors: Jianhao Wang, Jin Zhang, Haozhe Jiang, Junyu Zhang, Liwei Wang,
Chongjie Zhang
- Abstract summary: We first characterize a unique challenge in offline meta-RL: transition-reward distribution shift between offline datasets and online adaptation.
We propose a novel adaptation framework, called In-Distribution online Adaptation with uncertainty Quantification (IDAQ).
IDAQ generates in-distribution context using a given uncertainty quantification and performs effective task belief inference to address new tasks.
- Score: 38.35415999829767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent offline meta-reinforcement learning (meta-RL) methods typically
utilize task-dependent behavior policies (e.g., training RL agents on each
individual task) to collect a multi-task dataset. However, these methods always
require extra information for fast adaptation, such as offline context for
testing tasks. To address this problem, we first formally characterize a unique
challenge in offline meta-RL: transition-reward distribution shift between
offline datasets and online adaptation. Our theory shows that
out-of-distribution adaptation episodes may lead to unreliable policy
evaluation, whereas online adaptation with in-distribution episodes provides a
performance guarantee for adaptation. Based on these theoretical insights, we
propose a novel adaptation framework, called In-Distribution online Adaptation
with uncertainty Quantification (IDAQ), which generates in-distribution context
using a given uncertainty quantification and performs effective task belief
inference to address new tasks. We find that a return-based uncertainty
quantification performs effectively for IDAQ. Experiments show that IDAQ
achieves state-of-the-art performance on the Meta-World ML1 benchmark compared
to baselines with/without offline adaptation.
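To make the adaptation loop concrete, below is a minimal Python sketch of the in-distribution idea described in the abstract: online episodes contribute to the context used for task belief inference only when a return-based uncertainty score marks them as in-distribution. This is an illustration under stated assumptions, not the authors' implementation; the callables `rollout`, `encode_context`, and `return_uncertainty`, and the threshold `tau`, are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code) of in-distribution online adaptation
# with a return-based uncertainty score. All callables are assumed to be
# supplied by the user: `rollout(env, policy, belief)` returns the episode's
# transitions and its return, `encode_context` infers a task belief from the
# collected context, and `return_uncertainty` scores how far an episode return
# falls outside the returns seen in the offline data (lower = in-distribution).

def in_distribution_adaptation(env, policy, rollout, encode_context,
                               return_uncertainty, num_episodes=10, tau=0.5):
    context = []  # keep only transitions from in-distribution episodes
    for _ in range(num_episodes):
        belief = encode_context(context)               # current task belief
        transitions, episode_return = rollout(env, policy, belief)
        # Keep the episode only if the return-based uncertainty is low,
        # i.e. the episode looks consistent with the offline distribution.
        if return_uncertainty(episode_return) <= tau:
            context.extend(transitions)                # refine task belief
    return encode_context(context)                     # belief for the new task
```

In this sketch, any uncertainty quantification with the same interface could be substituted for the return-based score, matching the abstract's framing of IDAQ as taking "a given uncertainty quantification".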
Related papers
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning [8.251711947874238]
Offline RL provides a promising solution by producing an offline policy that can be refined through online interactions.
Existing approaches perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation.
Our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning.
arXiv Detail & Related papers (2024-05-12T08:52:52Z)
- Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations [22.23114883485924]
We propose a novel algorithm called GENTLE for learning generalizable task representations in the face of data limitations.
GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of tasks.
To alleviate the effect of limited behavior diversity, we construct pseudo-transitions to align the data distribution used to train TAE with the data distribution encountered during testing.
arXiv Detail & Related papers (2023-12-26T07:02:12Z)
- Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning [23.45043290237396]
MoSS is a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior methods in performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z)
- Learning to Adapt to Online Streams with Distribution Shifts [22.155844301575883]
Test-time adaptation (TTA) is a technique used to reduce distribution gaps between the training and testing sets by leveraging unlabeled test data during inference.
In this work, we expand TTA to a more practical scenario, where the test data comes in the form of online streams that experience distribution shifts over time.
We propose a meta-learning approach that teaches the network to adapt to distribution-shifting online streams during meta-training. As a result, the trained model can perform continual adaptation to distribution shifts in testing, regardless of the batch size restriction.
arXiv Detail & Related papers (2023-03-02T23:36:10Z)
- Algorithm Design for Online Meta-Learning with Task Boundary Detection [63.284263611646]
We propose a novel algorithm for task-agnostic online meta-learning in non-stationary environments.
We first propose two simple but effective detection mechanisms of task switches and distribution shift.
We show that a sublinear task-averaged regret can be achieved for our algorithm under mild conditions.
arXiv Detail & Related papers (2023-02-02T04:02:49Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift between training and test data by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline meta-RL is emerging as a promising approach to address these challenges.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)