Model-based Adversarial Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2006.08875v2
- Date: Sat, 27 Feb 2021 13:19:45 GMT
- Title: Model-based Adversarial Meta-Reinforcement Learning
- Authors: Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma
- Abstract summary: We propose Model-based Adversarial Meta-Reinforcement Learning (AdMRL).
AdMRL aims to minimize the worst-case sub-optimality gap across all tasks in a family of tasks.
We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy in the worst-case performance over all tasks.
- Score: 38.28304764312512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-reinforcement learning (meta-RL) aims to learn from multiple training
tasks the ability to adapt efficiently to unseen test tasks. Despite the
success, existing meta-RL algorithms are known to be sensitive to the task
distribution shift. When the test task distribution is different from the
training task distribution, the performance may degrade significantly. To
address this issue, this paper proposes Model-based Adversarial
Meta-Reinforcement Learning (AdMRL), where we aim to minimize the worst-case
sub-optimality gap -- the difference between the optimal return and the return
that the algorithm achieves after adaptation -- across all tasks in a family of
tasks, with a model-based approach. We propose a minimax objective and optimize
it by alternating between learning the dynamics model on a fixed task and
finding the adversarial task for the current model -- the task for which the
policy induced by the model is maximally suboptimal. Assuming the family of
tasks is parameterized, we derive a formula for the gradient of the
suboptimality with respect to the task parameters via the implicit function
theorem, and show how the gradient estimator can be efficiently implemented by
the conjugate gradient method and a novel use of the REINFORCE estimator. We
evaluate our approach on several continuous control benchmarks and demonstrate
its efficacy in the worst-case performance over all tasks, the generalization
power to out-of-distribution tasks, and in training and test time sample
efficiency, over existing state-of-the-art meta-RL algorithms.
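To make the alternating procedure concrete, the following is a minimal, runnable sketch of the minimax loop on a toy 2-D goal-reaching task. It illustrates the idea in the abstract rather than the authors' implementation: the environment, the linear model class, and the closed-form gap gradient are all simplifying assumptions, whereas AdMRL trains neural dynamics models and estimates the task gradient with the implicit function theorem, conjugate gradient, and REINFORCE.

```python
# Toy AdMRL loop: alternate between fitting a dynamics model on the current
# task and gradient-ascending the sub-optimality gap over task parameters.
import numpy as np

rng = np.random.default_rng(0)
A_true = np.eye(2)  # true dynamics from s = 0: next state = A_true @ a

def learn_model(task, n_samples=32):
    """Simulate model learning on the fixed task: return the true dynamics
    plus estimation noise and a systematic bias (the task argument only
    marks where data would be collected in a real implementation)."""
    noise = rng.normal(scale=0.2 / np.sqrt(n_samples), size=(2, 2))
    return A_true + noise + 0.1 * np.eye(2)

def suboptimality_gap(task, A_hat):
    """The optimal policy reaches the goal exactly (gap 0); the model-induced
    policy acts optimally under A_hat, so its real-world goal error is the
    sub-optimality gap for this task."""
    a = np.linalg.solve(A_hat, task)  # model-optimal action for goal `task`
    err = task - A_true @ a           # error actually incurred
    return float(err @ err)

def gap_gradient(task, A_hat):
    """Closed-form d(gap)/d(task) for the toy problem; the paper instead
    derives this gradient via the implicit function theorem."""
    M = np.eye(2) - A_true @ np.linalg.inv(A_hat)
    return 2.0 * M.T @ M @ task

task = np.array([0.1, 0.1])  # initial task parameters (goal position)
for it in range(50):
    A_hat = learn_model(task)                      # min step: fit the model
    task = task + 0.5 * gap_gradient(task, A_hat)  # max step: adversarial task
    task = np.clip(task, -1.0, 1.0)                # project into task family
print("worst-case task:", task, " gap:", round(suboptimality_gap(task, A_hat), 4))
```

The structure to notice is the alternation: the inner step fits the model with the task held fixed, and the outer step performs projected gradient ascent on the task parameters, moving toward the task on which the model-induced policy is most suboptimal.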
Related papers
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from the PRM to optimize the reasoning pathways explored by LLMs; a minimal sketch of the greedy loop appears below.
To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward datasets for coding tasks and observe similar performance improvements in code generation.
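A minimal sketch of such a greedy, step-level search, where `propose_steps` and `prm_score` are hypothetical stand-ins for an LLM sampler and a trained PRM:

```python
# Greedy step-level search guided by a process reward model. Only the greedy
# control flow is taken from the summary above; the callables are assumptions.
from typing import Callable, List

def greedy_prm_search(question: str,
                      propose_steps: Callable[[str, int], List[str]],
                      prm_score: Callable[[str, str], float],
                      max_steps: int = 8,
                      n_candidates: int = 4) -> List[str]:
    """Extend a reasoning chain one step at a time, always keeping the
    candidate step the PRM scores highest in the current context."""
    chain: List[str] = []
    context = question
    for _ in range(max_steps):
        candidates = propose_steps(context, n_candidates)  # sample next steps
        if not candidates:
            break
        best = max(candidates, key=lambda step: prm_score(context, step))
        chain.append(best)
        context += "\n" + best  # condition later proposals on the chosen step
        if best.strip().lower().startswith("answer:"):
            break               # stop once a final answer step is produced
    return chain
```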
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning [23.45043290237396]
MoSS is a context-based Meta-reinforcement learning algorithm based on Self-Supervised task representation learning.
On MuJoCo and Meta-World benchmarks, MoSS outperforms prior methods in terms of performance, sample efficiency (3-50x faster), adaptation efficiency, and generalization.
arXiv Detail & Related papers (2023-04-29T15:46:19Z)
- Task Weighting in Meta-learning with Trajectory Optimisation [37.32107678838193]
We introduce a new principled and fully-automated task-weighting algorithm for meta-learning methods.
By considering the weights of tasks within the same mini-batch as an action, we cast the task-weighting problem in meta-learning as a trajectory optimisation.
We empirically demonstrate that the proposed approach outperforms common hand-engineered weighting methods on two few-shot learning benchmarks.
arXiv Detail & Related papers (2023-01-04T01:36:09Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Meta-Reinforcement Learning in Broad and Non-Parametric Environments [8.091658684517103]
We introduce TIGR, a Task-Inference-based meta-RL algorithm for tasks in non-parametric environments.
We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on an unsupervised reconstruction objective; a sketch of this decoupling appears below.
We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared to state-of-the-art meta-RL approaches.
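A minimal sketch of that decoupling, with illustrative dimensions and layers rather than the paper's architecture: the encoder below is updated only by the reconstruction loss, and a policy would consume a detached task embedding so that policy gradients never reach the inference network.

```python
# Task inference trained on reconstruction alone, decoupled from the policy.
import torch
import torch.nn as nn

class TaskInference(nn.Module):
    def __init__(self, obs_dim=8, act_dim=2, z_dim=4):
        super().__init__()
        in_dim = obs_dim + act_dim + 1 + obs_dim  # one (s, a, r, s') tuple
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, z_dim))
        # The decoder reconstructs reward and next state from (s, a, z).
        self.decoder = nn.Sequential(nn.Linear(obs_dim + act_dim + z_dim, 64),
                                     nn.ReLU(), nn.Linear(64, 1 + obs_dim))

    def reconstruction_loss(self, s, a, r, s_next):
        # Aggregate the context batch into a single task embedding z.
        z = self.encoder(torch.cat([s, a, r, s_next], dim=-1)).mean(dim=0)
        z = z.expand(s.shape[0], -1)
        pred = self.decoder(torch.cat([s, a, z], dim=-1))
        target = torch.cat([r, s_next], dim=-1)
        return ((pred - target) ** 2).mean()  # unsupervised objective
```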
arXiv Detail & Related papers (2021-08-08T19:32:44Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- Submodular Meta-Learning [43.15332631500541]
We introduce a discrete variant of the meta-learning framework to improve performance on future tasks.
Our approach uses prior data, i.e., previously visited tasks, to train a proper initial solution set.
We show that our framework substantially reduces the computational cost of solving new tasks while incurring only a small performance loss; a minimal greedy sketch appears below.
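A minimal sketch of the initial-set idea, using the classic greedy scheme for monotone submodular maximization with an illustrative coverage objective (the paper's actual value function may differ):

```python
# Greedy selection of an initial solution set from previously visited tasks.
from typing import Callable, List, Set

def greedy_initial_set(ground: List[int],
                       value: Callable[[Set[int]], float],
                       budget: int) -> Set[int]:
    """Classic greedy maximization of a monotone submodular set function."""
    chosen: Set[int] = set()
    for _ in range(budget):
        gains = {e: value(chosen | {e}) - value(chosen)
                 for e in ground if e not in chosen}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # no element adds value; stop early
        chosen.add(best)
    return chosen

# Hypothetical "regions" covered by prior tasks; pick the 2 most complementary.
regions = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"e"}}
cover = lambda S: float(len(set().union(*(regions[e] for e in S)))) if S else 0.0
print(greedy_initial_set(list(regions), cover, budget=2))  # e.g. {1, 3}
```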
arXiv Detail & Related papers (2020-07-11T21:02:48Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: dynamics models can be adapted efficiently and consistently using off-policy data (a minimal sketch of such an adaptation step appears below).
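A minimal sketch of that insight, with an illustrative model class and hyperparameters: adapting the dynamics model is supervised regression on replayed transitions, so the data can come from any behavior policy.

```python
# Adapting a learned dynamics model with a few supervised gradient steps on
# off-policy transitions; the loss and step sizes are assumptions.
import torch
import torch.nn as nn

def adapt_dynamics(model: nn.Module, replay_batch, n_steps: int = 5,
                   lr: float = 1e-3) -> nn.Module:
    """Fine-tune a dynamics model on (s, a, s') tuples from a replay buffer;
    because this is supervised regression, any policy's data will do."""
    s, a, s_next = replay_batch
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_steps):
        pred = model(torch.cat([s, a], dim=-1))  # predicted next state
        loss = ((pred - s_next) ** 2).mean()     # model-consistency loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```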
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Curriculum in Gradient-Based Meta-Reinforcement Learning [10.447238563837173]
We show that gradient-based meta-learners are sensitive to task distributions.
With the wrong curriculum, agents suffer the effects of meta-overfitting, shallow adaptation, and adaptation instability.
arXiv Detail & Related papers (2020-02-19T01:40:45Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method accurately infers the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.