Context Shift Reduction for Offline Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2311.03695v1
- Date: Tue, 7 Nov 2023 03:50:01 GMT
- Title: Context Shift Reduction for Offline Meta-Reinforcement Learning
- Authors: Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng,
Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, Yunji Chen
- Abstract summary: The context shift problem arises due to the distribution discrepancy between the contexts used for training and testing.
Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information.
We propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets.
- Score: 28.616141112916374
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline
datasets to enhance the agent's generalization ability on unseen tasks.
However, the context shift problem arises due to the distribution discrepancy
between the contexts used for training (from the behavior policy) and testing
(from the exploration policy). The context shift problem leads to incorrect
task inference and further deteriorates the generalization ability of the
meta-policy. Existing OMRL methods either overlook this problem or attempt to
mitigate it with additional information. In this paper, we propose a novel
approach called Context Shift Reduction for OMRL (CSRO) to address the context
shift problem using only offline datasets. The key insight of CSRO is to
minimize the influence of the policy on the context during both the meta-training and
meta-test phases. During meta-training, we design a max-min mutual information
representation learning mechanism to diminish the impact of the behavior policy
on task representation. In the meta-test phase, we introduce the non-prior
context collection strategy to reduce the effect of the exploration policy.
Experimental results demonstrate that CSRO significantly reduces the context
shift and improves the generalization ability, surpassing previous methods
across various challenging domains.
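The abstract only names the max-min mutual-information mechanism, so the following is a minimal PyTorch sketch of one way such an objective could look: a context encoder is trained to classify the task identity (maximizing MI between the representation and the task) while an adversarial action predictor tries to recover behavior-policy actions from the representation, and the encoder is penalized for that predictability (minimizing MI with the behavior policy). The module shapes, the adversarial-classifier formulation, and all hyperparameters are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, Z_DIM, N_TASKS = 8, 2, 16, 10

class ContextEncoder(nn.Module):
    """Encodes a batch of (s, a, r, s') transitions into one task vector z."""
    def __init__(self):
        super().__init__()
        in_dim = 2 * STATE_DIM + ACTION_DIM + 1
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, Z_DIM))

    def forward(self, s, a, r, s2):
        # Mean-pool per-transition embeddings into a single context vector.
        return self.net(torch.cat([s, a, r, s2], dim=-1)).mean(dim=0)

encoder = ContextEncoder()
task_head = nn.Linear(Z_DIM, N_TASKS)          # "max" term: I(z; task)
adversary = nn.Sequential(                     # "min" term: I(z; behavior policy)
    nn.Linear(Z_DIM + STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM))

enc_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(task_head.parameters()), lr=3e-4)
adv_opt = torch.optim.Adam(adversary.parameters(), lr=3e-4)

def train_step(s, a, r, s2, task_id, lam=0.1):
    z = encoder(s, a, r, s2)

    # Step 1: fit the adversary to predict behavior-policy actions from (z, s);
    # its accuracy is a proxy for how much policy information leaks into z.
    pred = adversary(torch.cat([z.detach().expand(s.size(0), -1), s], dim=-1))
    adv_loss = F.mse_loss(pred, a)
    adv_opt.zero_grad(); adv_loss.backward(); adv_opt.step()

    # Step 2: update the encoder to classify the task (maximize I(z; task))
    # while degrading the adversary (minimize I(z; behavior policy)).
    logits = task_head(z).unsqueeze(0)
    task_loss = F.cross_entropy(logits, torch.tensor([task_id]))
    pred = adversary(torch.cat([z.expand(s.size(0), -1), s], dim=-1))
    enc_loss = task_loss - lam * F.mse_loss(pred, a)
    enc_opt.zero_grad(); enc_loss.backward(); enc_opt.step()
    return enc_loss.item()

# Toy call: 32 transitions from task 3.
s, a = torch.randn(32, STATE_DIM), torch.randn(32, ACTION_DIM)
r, s2 = torch.randn(32, 1), torch.randn(32, STATE_DIM)
train_step(s, a, r, s2, task_id=3)
```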
Related papers
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- On Context Distribution Shift in Task Representation Learning for Offline Meta RL [7.8317653074640186]
We focus on context-based OMRL, specifically on the challenge of learning task representations for OMRL.
To overcome the context distribution shift problem, we present a hard-sampling-based strategy to train a robust task context encoder.
arXiv Detail & Related papers (2023-04-01T16:21:55Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
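As a rough illustration of the idea, the sketch below trains a translator h(s, a_src) -> a_tgt so that executing the translated action in the target task reproduces the transition the source action produces in the source task. The paired-transition setup, the learned target dynamics model, and all shapes are assumptions for illustration, not the paper's exact objective.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 8, 2

# h(s, a_src) -> a_tgt: translate a source-task action into a target-task action.
translator = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM))

# Stand-in for a learned, differentiable dynamics model of the target task.
target_dynamics = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, STATE_DIM))

opt = torch.optim.Adam(translator.parameters(), lr=3e-4)

def translator_loss(s, a_src, s_next_src):
    """Match the source transition: the translated action, executed in the
    target task, should reproduce the source next state, keeping the
    transferred policy's value close to the source policy's value."""
    a_tgt = translator(torch.cat([s, a_src], dim=-1))
    s_next_tgt = target_dynamics(torch.cat([s, a_tgt], dim=-1))
    return F.mse_loss(s_next_tgt, s_next_src)

# Toy update on random paired transitions.
s, a = torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM)
s_next = torch.randn(64, STATE_DIM)
loss = translator_loss(s, a, s_next)
opt.zero_grad(); loss.backward(); opt.step()
```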
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning [21.59254848913971]
Offline meta-reinforcement learning is a reinforcement learning paradigm that learns from offline data to adapt to new tasks.
We propose a contrastive learning framework for task representations that are robust to the distribution shift of behavior policies between training and test.
Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods.
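As a concrete instance of this idea, here is a hedged InfoNCE-style sketch in which positives are contexts from the same task collected under different behavior policies, so the encoder is rewarded for representing the task rather than the policy. The pairing scheme and shapes are illustrative assumptions, not the paper's exact loss.
```python
import torch
import torch.nn.functional as F

def task_contrastive_loss(z_anchor, z_positive, temperature=0.1):
    """z_anchor, z_positive: (n_tasks, z_dim). Row i of each encodes a
    context from task i, collected under two different behavior policies."""
    za = F.normalize(z_anchor, dim=-1)
    zp = F.normalize(z_positive, dim=-1)
    logits = za @ zp.t() / temperature       # pairwise cosine similarities
    labels = torch.arange(za.size(0))        # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# Toy call: 6 tasks, 16-dimensional task representations.
loss = task_contrastive_loss(torch.randn(6, 16), torch.randn(6, 16))
```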
arXiv Detail & Related papers (2022-06-21T14:46:47Z)
- Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline meta-RL is emerging as a promising approach to address these challenges.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience-picking strategy to imitate adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids merely learning mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- Transfer Meta-Learning: Information-Theoretic Bounds and Information Meta-Risk Minimization [47.7605527786164]
Meta-learning automatically infers an inductive bias by observing data from a number of related tasks.
We introduce the problem of transfer meta-learning, in which tasks are drawn during meta-testing from a target task environment that may differ from the source task environment observed during meta-training.
arXiv Detail & Related papers (2020-11-04T12:55:43Z)
- MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration [52.48362697163477]
We model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning.
We develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies.
Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on sparse-reward tasks.
arXiv Detail & Related papers (2020-06-15T06:56:18Z)