On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning
- URL: http://arxiv.org/abs/2206.03271v1
- Date: Tue, 7 Jun 2022 13:24:00 GMT
- Title: On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning
- Authors: Zhao Mandi, Pieter Abbeel, Stephen James
- Abstract summary: We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
- Score: 71.55412580325743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Intelligent agents should have the ability to leverage knowledge from
previously learned tasks in order to learn new ones quickly and efficiently.
Meta-learning approaches have emerged as a popular solution to achieve this.
However, meta-reinforcement learning (meta-RL) algorithms have thus far been
restricted to simple environments with narrow task distributions. Moreover, the
paradigm of pretraining followed by fine-tuning to adapt to new tasks has
emerged as a simple yet effective solution in supervised and self-supervised
learning. This calls into question the benefits of meta-learning approaches in
reinforcement learning as well, given that they typically come at the cost of
high complexity. We hence investigate meta-RL approaches in a variety of
vision-based benchmarks, including Procgen, RLBench, and Atari, where
evaluations are made on completely novel tasks. Our findings show that when
meta-learning approaches are evaluated on different tasks (rather than
different variations of the same task), multi-task pretraining with fine-tuning
on new tasks performs as well as, or better than, meta-pretraining with
meta test-time adaptation. This is encouraging for future research, as
multi-task pretraining tends to be simpler and computationally cheaper than
meta-RL. From these findings, we advocate for evaluating future meta-RL methods
on more challenging tasks and including multi-task pretraining with fine-tuning
as a simple, yet strong baseline.
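To make the advocated baseline concrete, below is a minimal, illustrative sketch (not the paper's actual code) of multi-task pretraining followed by fine-tuning: a single policy is trained jointly on a set of training tasks, then its weights are copied and further updated on a held-out task. The `Policy` network, `collect_rollout` helper, and `policy_gradient_loss` objective are hypothetical placeholders; the paper's experiments use full vision-based RL pipelines on Procgen, RLBench, and Atari.

```python
# Illustrative sketch of multi-task pretraining followed by fine-tuning.
# All names (Policy, collect_rollout, policy_gradient_loss) are hypothetical
# placeholders, not the paper's implementation.
import copy
import torch
import torch.nn as nn


class Policy(nn.Module):
    """Small task-agnostic policy network (stand-in for a vision encoder + head)."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits


def policy_gradient_loss(policy: Policy, batch) -> torch.Tensor:
    """Placeholder RL objective (a PPO-style loss would go here in practice)."""
    obs, actions, returns = batch
    log_probs = torch.log_softmax(policy(obs), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen * returns).mean()


def pretrain_multitask(policy, train_tasks, collect_rollout, steps=10_000):
    """Phase 1: train a single policy jointly on all training tasks."""
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for step in range(steps):
        task = train_tasks[step % len(train_tasks)]  # round-robin over tasks
        loss = policy_gradient_loss(policy, collect_rollout(policy, task))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy


def finetune_on_new_task(pretrained, new_task, collect_rollout, steps=1_000):
    """Phase 2: copy the pretrained weights and adapt them on a held-out task."""
    policy = copy.deepcopy(pretrained)
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for step in range(steps):
        loss = policy_gradient_loss(policy, collect_rollout(policy, new_task))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```

The meta-RL alternatives compared in the paper would replace this two-phase recipe with a meta-objective during pretraining (e.g. MAML- or RL^2-style) and a corresponding test-time adaptation procedure; the paper's finding is that the simpler recipe above matches or exceeds them on completely novel tasks.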
Related papers
- Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation [17.165083095799712]
  We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
  We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
  arXiv Detail & Related papers (2022-11-20T03:55:09Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
  We propose a novel meta-RL approach that achieves performance competitive with existing meta-RL algorithms.
  Our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training.
  arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
  This work introduces a novel objective function for learning an action translator among training tasks.
  We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
  We propose combining the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
  arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
  We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
  Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
  arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration [52.48362697163477]
  We model an exploration policy learning problem for meta-RL that is separated from exploitation policy learning.
  We develop a new off-policy meta-RL framework that efficiently learns separate context-aware exploration and exploitation policies.
  Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on sparse-reward tasks.
  arXiv Detail & Related papers (2020-06-15T06:56:18Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
  We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
  Our method is based on a simple insight: dynamics models can be adapted efficiently and consistently with off-policy data.
  arXiv Detail & Related papers (2020-06-12T13:34:46Z)