CoMPS: Continual Meta Policy Search
- URL: http://arxiv.org/abs/2112.04467v1
- Date: Wed, 8 Dec 2021 18:53:08 GMT
- Title: CoMPS: Continual Meta Policy Search
- Authors: Glen Berseth, Zhiwei Zhang, Grace Zhang, Chelsea Finn, Sergey Levine
- Abstract summary: We develop a new continual meta-learning method to address challenges in sequential multi-task learning.
We find that CoMPS outperforms prior continual learning and off-policy meta-reinforcement learning methods on several sequences of challenging continuous control tasks.
- Score: 113.33157585319906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a new continual meta-learning method to address challenges in
sequential multi-task learning. In this setting, the agent's goal is to achieve
high reward over any sequence of tasks quickly. Prior meta-reinforcement
learning algorithms have demonstrated promising results in accelerating the
acquisition of new tasks. However, they require access to all tasks during
training. Beyond simply transferring past experience to new tasks, our goal is
to devise continual reinforcement learning algorithms that learn to learn,
using their experience on previous tasks to learn new tasks more quickly. We
introduce a new method, continual meta-policy search (CoMPS), that removes this
limitation by meta-training in an incremental fashion, over each task in a
sequence, without revisiting prior tasks. CoMPS continuously repeats two
subroutines: learning a new task using RL and using the experience from RL to
perform completely offline meta-learning to prepare for subsequent task
learning. We find that CoMPS outperforms prior continual learning and
off-policy meta-reinforcement learning methods on several sequences of
challenging continuous control tasks.
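The abstract describes CoMPS as alternating two subroutines: RL on the current task, followed by fully offline meta-learning on all experience gathered so far. The following is a minimal, illustrative Python sketch of that outer loop, assuming hypothetical stand-ins `run_rl` and `offline_meta_update`; it is not the authors' implementation.

```python
# Minimal sketch of the CoMPS outer loop described in the abstract.
# `run_rl` and `offline_meta_update` are hypothetical stand-ins.

import random


def run_rl(policy_params, task, episodes=10):
    """Stand-in for RL on the current task: returns task-adapted parameters
    and the experience collected while learning the task."""
    experience = [
        {"task": task, "episode": ep, "return": random.random()}
        for ep in range(episodes)
    ]
    adapted_params = dict(policy_params, last_task=task)
    return adapted_params, experience


def offline_meta_update(meta_params, replay_buffer):
    """Stand-in for the fully offline meta-learning step that distills the
    accumulated per-task experience into a better policy initialization."""
    meta_params = dict(meta_params)
    meta_params["num_meta_updates"] = meta_params.get("num_meta_updates", 0) + 1
    meta_params["buffer_size"] = len(replay_buffer)
    return meta_params


def comps(task_sequence):
    meta_params = {}      # policy initialization, meta-trained incrementally
    replay_buffer = []    # experience from all tasks seen so far (no revisiting)
    for task in task_sequence:
        # Subroutine 1: learn the new task with RL, starting from the meta-learned init.
        _, experience = run_rl(meta_params, task)
        replay_buffer.extend(experience)
        # Subroutine 2: offline meta-learning on the accumulated experience,
        # preparing the initialization for the next task.
        meta_params = offline_meta_update(meta_params, replay_buffer)
    return meta_params


if __name__ == "__main__":
    print(comps(["reach", "push", "pick-place"]))
```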
Related papers
- Continual Task Allocation in Meta-Policy Network via Sparse Prompting [42.386912478509814]
We show how to train a generalizable meta-policy by continually learning a sequence of tasks.
We address this with Continual Task Allocation via Sparse Prompting (CoTASP).
In experiments, CoTASP achieves a promising plasticity-stability trade-off without storing or replaying any past tasks' experiences.
arXiv Detail & Related papers (2023-05-29T03:36:32Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration [52.48362697163477]
We model an exploration policy learning problem for meta-RL that is separated from exploitation policy learning.
We develop a new off-policy meta-RL framework that efficiently learns separate context-aware exploration and exploitation policies.
Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on sparse-reward tasks.
arXiv Detail & Related papers (2020-06-15T06:56:18Z)
- Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning [74.07455280246212]
Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones.
We show that current continual learning, meta-learning, meta-continual learning, and continual-meta learning techniques fail in this new scenario.
We propose Continual-MAML, an online extension of the popular MAML algorithm, as a strong baseline for this scenario; a minimal sketch of the underlying MAML-style update appears after this list.
arXiv Detail & Related papers (2020-03-12T15:47:16Z)
- Learning Context-aware Task Reasoning for Efficient Meta-reinforcement Learning [29.125234093368732]
We propose a novel meta-RL strategy to achieve human-level efficiency in learning novel tasks.
We decompose the meta-RL problem into three sub-tasks, task-exploration, task-inference and task-fulfillment.
Our algorithm effectively performs exploration for task inference, improves sample efficiency during both training and testing, and mitigates the meta-overfitting problem.
arXiv Detail & Related papers (2020-03-03T07:38:53Z)
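Several of the entries above, notably Continual-MAML, build on MAML-style gradient-based meta-learning: an inner loop adapts to each task with a few gradient steps, and an outer loop updates the shared initialization against the post-adaptation losses. Below is a minimal, self-contained numpy sketch of that update on a toy quadratic loss; it is purely illustrative and not any of the listed papers' implementations.

```python
# Toy MAML-style inner/outer update on a quadratic per-task loss
# (illustrative only; the targets and learning rates are made up).

import numpy as np


def task_grad(params, target):
    """Gradient of the toy per-task loss 0.5 * ||params - target||^2."""
    return params - target


def maml_step(meta_params, task_targets, inner_lr=0.1, outer_lr=0.01):
    """One meta-update: adapt to each task with a gradient step (inner loop),
    then update the initialization against the post-adaptation losses (outer loop)."""
    outer_grad = np.zeros_like(meta_params)
    for target in task_targets:
        adapted = meta_params - inner_lr * task_grad(meta_params, target)  # inner loop
        # For this quadratic loss, d(adapted)/d(meta_params) = (1 - inner_lr) * I,
        # so the gradient through the adaptation step is:
        outer_grad += (1.0 - inner_lr) * task_grad(adapted, target)
    return meta_params - outer_lr * outer_grad / len(task_targets)


meta_params = np.zeros(2)
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for _ in range(100):
    meta_params = maml_step(meta_params, tasks)
print(meta_params)  # an initialization that adapts quickly to either task
```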