Meta-Reinforcement Learning Based on Self-Supervised Task Representation
Learning
- URL: http://arxiv.org/abs/2305.00286v1
- Date: Sat, 29 Apr 2023 15:46:19 GMT
- Title: Meta-Reinforcement Learning Based on Self-Supervised Task Representation
Learning
- Authors: Mingyang Wang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Hang Su,
Chenguang Yang, Kai Huang and Alois Knoll
- Abstract summary: MoSS is a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior works in terms of performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
- Score: 23.45043290237396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-reinforcement learning enables artificial agents to learn from related
training tasks and adapt to new tasks efficiently with minimal interaction
data. However, most existing research is still limited to narrow task
distributions that are parametric and stationary, and does not consider
out-of-distribution tasks during evaluation, thus restricting its
application. In this paper, we propose MoSS, a context-based Meta-reinforcement
learning algorithm based on Self-Supervised task representation learning to
address this challenge. We extend meta-RL to broad non-parametric task
distributions which have never been explored before, and also achieve
state-of-the-art results in non-stationary and out-of-distribution tasks.
Specifically, MoSS consists of a task inference module and a policy module. We
utilize the Gaussian mixture model for task representation to imitate the
parametric and non-parametric task variations. Additionally, our online
adaptation strategy enables the agent to react at the first sight of a task
change, thus being applicable in non-stationary tasks. MoSS also exhibits
strong generalization robustness in out-of-distribution tasks, which benefits
from the reliable and robust task representation. The policy is built on top of
an off-policy RL algorithm and the entire network is trained completely
off-policy to ensure high sample efficiency. On MuJoCo and Meta-World
benchmarks, MoSS outperforms prior works in terms of asymptotic performance,
sample efficiency (3-50x faster), adaptation efficiency, and generalization
robustness on broad and diverse task distributions.
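
The abstract states that MoSS represents tasks with a Gaussian mixture model. The snippet below is only a minimal sketch of what drawing a latent task vector from such a mixture could look like. It is not the authors' implementation: the function name `sample_task_latent`, the argument shapes, and the example numbers are all hypothetical, and the abstract does not say how the mixture parameters are produced or trained.

```python
import numpy as np


def sample_task_latent(logits, means, log_vars, rng):
    """Draw one latent task vector from a K-component Gaussian mixture.

    Hypothetical shapes: logits (K,), means (K, D), log_vars (K, D).
    In a context-based method these would come from a task encoder;
    here they are plain arrays supplied by the caller.
    """
    # Numerically stable softmax turns logits into mixture weights.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Pick a component (e.g. a qualitatively distinct task family).
    k = rng.choice(len(weights), p=weights)
    # Sample within the component (e.g. a parametric variation of it).
    eps = rng.standard_normal(means.shape[1])
    z = means[k] + np.exp(0.5 * log_vars[k]) * eps
    return z, k, weights


rng = np.random.default_rng(0)
logits = np.array([2.0, 0.0, -1.0])                        # assumed values
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])    # assumed values
log_vars = np.full((3, 2), -2.0)                           # assumed values
z, k, weights = sample_task_latent(logits, means, log_vars, rng)
```

One way to read this sketch: the discrete component index can stand in for non-parametric variation between task families, while the continuous Gaussian around each mean can stand in for parametric variation within a family. That mapping is an interpretation of the abstract, not a claim about the paper's architecture.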
Related papers
- Meta-RTL: Reinforcement-Based Meta-Transfer Learning for Low-Resource Commonsense Reasoning [61.8360232713375]
We propose a reinforcement-based multi-source meta-transfer learning framework (Meta-RTL) for low-resource commonsense reasoning.
We present a reinforcement-based approach to dynamically estimating source task weights that measure the contribution of the corresponding tasks to the target task in the meta-transfer learning.
Experimental results demonstrate that Meta-RTL substantially outperforms strong baselines and previous task selection strategies.
arXiv Detail & Related papers (2024-09-27T18:22:22Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Towards Task Sampler Learning for Meta-Learning [37.02030832662183]
Meta-learning aims to learn general knowledge with diverse training tasks conducted from limited data, and then transfer it to new tasks.
It is commonly believed that increasing task diversity will enhance the generalization ability of meta-learning models.
This paper challenges this view through empirical and theoretical analysis.
arXiv Detail & Related papers (2023-07-18T01:53:18Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Learning to generate imaginary tasks for improving generalization in meta-learning [12.635773307074022]
The success of meta-learning on existing benchmarks is predicated on the assumption that the distribution of meta-training tasks covers meta-testing tasks.
Recent solutions have pursued augmentation of meta-training tasks, while it is still an open question to generate both correct and sufficiently imaginary tasks.
In this paper, we seek an approach that up-samples meta-training tasks from the task representation via a task up-sampling network. Besides, the resulting approach named Adversarial Task Up-sampling (ATU) suffices to generate tasks that can maximally contribute to the latest meta-learner by maximizing an adversarial loss.
arXiv Detail & Related papers (2022-06-09T08:21:05Z)
- Meta-Reinforcement Learning in Broad and Non-Parametric Environments [8.091658684517103]
We introduce TIGR, a Task-Inference-based meta-RL algorithm for tasks in non-parametric environments.
We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective.
We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches.
arXiv Detail & Related papers (2021-08-08T19:32:44Z)
- Meta-Learning with Fewer Tasks through Task Interpolation [67.03769747726666]
Current meta-learning algorithms require a large number of meta-training tasks, which may not be accessible in real-world scenarios.
With meta-learning with task interpolation (MLTI), our approach effectively generates additional tasks by randomly sampling a pair of tasks and interpolating the corresponding features and labels.
Empirically, in our experiments on eight datasets from diverse domains, we find that the proposed general MLTI framework is compatible with representative meta-learning algorithms and consistently outperforms other state-of-the-art strategies.
arXiv Detail & Related papers (2021-06-04T20:15:34Z)
- Model-based Adversarial Meta-Reinforcement Learning [38.28304764312512]
We propose Model-based Adversarial Meta-Reinforcement Learning (AdMRL).
AdMRL aims to minimize the worst-case sub-optimality gap across all tasks in a family of tasks.
We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy in the worst-case performance over all tasks.
arXiv Detail & Related papers (2020-06-16T02:21:49Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.