Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot
Policy Imitation
- URL: http://arxiv.org/abs/2306.13554v1
- Date: Fri, 23 Jun 2023 15:29:15 GMT
- Title: Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot
Policy Imitation
- Authors: Massimiliano Patacchiola, Mingfei Sun, Katja Hofmann, Richard E.
Turner
- Abstract summary: State-of-the-art methods to tackle few-shot imitation rely on meta-learning.
Recent work has shown that fine-tuners outperform meta-learners in few-shot image classification tasks.
We release an open source dataset called iMuJoCo consisting of 154 variants of popular OpenAI-Gym MuJoCo environments.
- Score: 45.312333134810665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we explore few-shot imitation learning for control problems,
which involves learning to imitate a target policy by accessing a limited set
of offline rollouts. This setting has been relatively under-explored despite
its relevance to robotics and control applications. State-of-the-art methods
developed to tackle few-shot imitation rely on meta-learning, which is
expensive to train as it requires access to a distribution over tasks (rollouts
from many target policies and variations of the base environment). Given this
limitation we investigate an alternative approach, fine-tuning, a family of
methods that pretrain on a single dataset and then fine-tune on unseen
domain-specific data. Recent work has shown that fine-tuners outperform
meta-learners in few-shot image classification tasks, especially when the data
is out-of-domain. Here we evaluate to what extent this is true for control
problems, proposing a simple yet effective baseline which relies on two stages:
(i) training a base policy online via reinforcement learning (e.g. Soft
Actor-Critic) on a single base environment, (ii) fine-tuning the base policy
via behavioral cloning on a few offline rollouts of the target policy. Despite
its simplicity, this baseline is competitive with meta-learning methods under a
variety of conditions and is able to imitate target policies trained on unseen
variations of the original environment. Importantly, the proposed approach is
practical and easy to implement, as it does not need any complex meta-training
protocol. As a further contribution, we release an open source dataset called
iMuJoCo (iMitation MuJoCo) consisting of 154 variants of popular OpenAI-Gym
MuJoCo environments with associated pretrained target policies and rollouts,
which can be used by the community to study few-shot imitation learning and
offline reinforcement learning.
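As a rough sketch of the two-stage baseline described in the abstract (not the authors' released code), the recipe could look as follows in Python, assuming Stable-Baselines3 for the Soft Actor-Critic stage and NumPy arrays of observations and actions as a stand-in for the offline rollouts; the environment id, file names, and hyperparameters are illustrative assumptions.

    import numpy as np
    import torch
    from stable_baselines3 import SAC

    # Stage (i): train a base policy online with Soft Actor-Critic on a
    # single base environment (the environment id is an assumption).
    base = SAC("MlpPolicy", "HalfCheetah-v4", verbose=0)
    base.learn(total_timesteps=1_000_000)

    # Stage (ii): fine-tune the base actor with behavioral cloning on a few
    # offline rollouts of the target policy. The .npy files are hypothetical
    # stand-ins for iMuJoCo-style rollouts of (observation, action) pairs.
    device = base.device
    obs = torch.as_tensor(np.load("target_rollouts_obs.npy"), dtype=torch.float32, device=device)
    act = torch.as_tensor(np.load("target_rollouts_act.npy"), dtype=torch.float32, device=device)

    actor = base.policy.actor
    optimizer = torch.optim.Adam(actor.parameters(), lr=3e-4)

    for epoch in range(50):
        for start in range(0, len(obs), 256):
            batch_obs = obs[start:start + 256]
            batch_act = act[start:start + 256]
            pred_act = actor(batch_obs, deterministic=True)  # deterministic actor output
            loss = torch.nn.functional.mse_loss(pred_act, batch_act)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

The fine-tuned actor can then be rolled out directly (e.g. via base.predict) on the unseen environment variant; in practice the number of behavioral-cloning epochs and the learning rate would be tuned to the size of the few-shot rollout budget.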
Related papers
- Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a reinforcement learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z) - Statistically Efficient Variance Reduction with Double Policy Estimation
for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - Goal-Conditioned Imitation Learning using Score-based Diffusion Policies [3.49482137286472]
We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation Learning (GCIL).
We show how BESO can even be used to learn a goal-independent policy from play data using classifier-free guidance.
arXiv Detail & Related papers (2023-04-05T15:52:34Z) - Robust Task Representations for Offline Meta-Reinforcement Learning via
Contrastive Learning [21.59254848913971]
Offline meta-reinforcement learning is a reinforcement learning paradigm that learns from offline data to adapt to new tasks.
We propose a contrastive learning framework for task representations that are robust to the distribution mismatch of behavior policies between training and test.
Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods.
arXiv Detail & Related papers (2022-06-21T14:46:47Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in
Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Constructing a Good Behavior Basis for Transfer using Generalized Policy
Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance. A generic sketch of the generalized policy improvement step behind this idea is given after this list.
arXiv Detail & Related papers (2021-12-30T12:20:46Z) - Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy to imitate adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids merely learning a mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
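For the "Constructing a Good Behavior Basis" entry above, here is a generic sketch of the generalized policy improvement (GPI) step that line of work builds on; this is a textbook illustration, not that paper's implementation, and the discrete action set and toy Q-functions are simplifying assumptions.

    import numpy as np

    def gpi_action(state, q_functions):
        """Generalized policy improvement over a set of source policies.

        Each element of q_functions maps a state to a vector of Q-values
        over a discrete action set (discrete actions are an assumption
        made here for simplicity).
        """
        # Stack Q-values: one row per source policy, one column per action.
        q_values = np.stack([q(state) for q in q_functions])
        # Act greedily with respect to the per-action maximum over policies.
        return int(np.argmax(q_values.max(axis=0)))

    # Toy usage with two hypothetical source policies over 3 actions.
    q1 = lambda s: np.array([0.1, 0.5, 0.2])
    q2 = lambda s: np.array([0.4, 0.0, 0.3])
    print(gpi_action(None, [q1, q2]))  # -> 1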
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.