Planning for Sample Efficient Imitation Learning
- URL: http://arxiv.org/abs/2210.09598v1
- Date: Tue, 18 Oct 2022 05:19:26 GMT
- Title: Planning for Sample Efficient Imitation Learning
- Authors: Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao
- Abstract summary: Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
- Score: 52.44953015011569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning is a class of promising policy learning algorithms that is
free from many practical issues of reinforcement learning, such as reward
design and exploration hardness. However, current imitation algorithms
struggle to achieve high performance and high in-environment sample
efficiency simultaneously. Behavioral Cloning (BC) does not need
in-environment interactions, but it suffers from the covariate shift problem
which harms its performance. Adversarial Imitation Learning (AIL) turns
imitation learning into a distribution matching problem. It can achieve better
performance on some tasks but it requires a large number of in-environment
interactions. Inspired by the recent success of EfficientZero in RL, we propose
EfficientImitate (EI), a planning-based imitation learning method that can
achieve high in-environment sample efficiency and performance simultaneously.
Our algorithmic contribution in this paper is two-fold. First, we extend AIL
to MCTS-based RL. Second, we show that the two seemingly incompatible classes
of imitation algorithms (BC and AIL) can be naturally unified under our
framework, enjoying the benefits of both. We benchmark our method not only on
the state-based DeepMind Control Suite, but also on the image-based version,
which many previous works find highly challenging. Experimental results show
that EI achieves state-of-the-art results in both performance and sample
efficiency. EI shows an over 4x performance gain in the limited-sample setting
on state-based and image-based tasks and can solve challenging problems like
Humanoid, where previous methods fail with a small number of interactions. Our
code is available
at https://github.com/zhaohengyin/EfficientImitate.
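To make the two-fold contribution concrete, the sketch below shows in PyTorch the two ingredients the abstract describes: a GAIL-style discriminator whose output serves as the reward the MCTS planner maximizes, and a policy loss that unifies planner distillation with a BC term. This is a minimal illustration under our own assumptions, not the authors' implementation; names such as `Discriminator`, `ail_reward`, and `unified_policy_loss` are hypothetical, and the MSE distillation stands in for EfficientZero's visit-count cross-entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """GAIL-style discriminator D(s, a): a high logit means the pair looks expert-like."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def ail_reward(disc: Discriminator, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """AIL reward r(s, a) = -log(1 - D(s, a)). Handing this to the planner in
    place of an environment reward makes MCTS search for expert-like behavior."""
    logits = disc(state, action)
    return -F.logsigmoid(-logits)  # numerically stable -log(1 - sigmoid(logits))

def discriminator_loss(disc, expert_s, expert_a, policy_s, policy_a):
    """Binary classification: expert pairs labeled 1, policy rollouts labeled 0."""
    e = disc(expert_s, expert_a)
    p = disc(policy_s, policy_a)
    return (F.binary_cross_entropy_with_logits(e, torch.ones_like(e)) +
            F.binary_cross_entropy_with_logits(p, torch.zeros_like(p)))

def unified_policy_loss(policy, states, mcts_actions, expert_s, expert_a, bc_weight=0.5):
    """Unify the two imitation families in one objective: distill the planner's
    output (driven by the AIL reward) and regress expert actions (BC)."""
    rl_term = F.mse_loss(policy(states), mcts_actions)   # planner distillation
    bc_term = F.mse_loss(policy(expert_s), expert_a)     # behavioral cloning
    return rl_term + bc_weight * bc_term
```

In this view, the discriminator turns distribution matching into a dense reward the planner can optimize over a learned model, while the BC term anchors the policy to the demonstrations; the abstract's claim is that the two combine naturally once planning supplies the RL backbone.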
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks [3.1594865504808944]
We propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks.
We evaluate our algorithm in a variety of sparse-reward environments.
arXiv Detail & Related papers (2024-01-25T15:06:40Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Sample Efficient Imitation Learning via Reward Function Trained in Advance [2.66512000865131]
Imitation learning (IL) is a framework that learns to imitate expert behavior from demonstrations.
In this article, we improve sample efficiency by introducing a novel inverse reinforcement learning scheme.
arXiv Detail & Related papers (2021-11-23T08:06:09Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- DERAIL: Diagnostic Environments for Reward And Imitation Learning [9.099589602551573]
We develop a suite of diagnostic tasks that test individual facets of algorithm performance in isolation.
Results confirm that algorithm performance is highly sensitive to implementation details.
A case study shows how the suite can pinpoint design flaws and rapidly evaluate candidate solutions.
arXiv Detail & Related papers (2020-12-02T18:07:09Z)
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning performs effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
- Augmenting GAIL with BC for sample efficient imitation learning [5.199454801210509]
We present a simple and elegant method to combine behavior cloning and GAIL for stable and sample-efficient learning (a minimal sketch of such a combined loss follows this list).
Our algorithm is very simple to implement and integrates with different policy gradient algorithms.
We demonstrate the effectiveness of the algorithm in low dimensional control tasks, gridworlds and in high dimensional image-based tasks.
arXiv Detail & Related papers (2020-01-21T22:28:50Z)
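For the last entry above, the BC+GAIL combination can be sketched as a single weighted objective. This is a hypothetical illustration with made-up names and weighting, not the paper's exact scheme:

```python
import torch
import torch.nn.functional as F

def gail_bc_loss(log_probs: torch.Tensor,
                 advantages: torch.Tensor,
                 pred_expert_actions: torch.Tensor,
                 expert_actions: torch.Tensor,
                 bc_coef: float = 0.1) -> torch.Tensor:
    """Policy-gradient surrogate on discriminator-derived advantages (GAIL),
    plus a supervised regression term on expert state-action pairs (BC)."""
    pg_term = -(log_probs * advantages.detach()).mean()        # GAIL part
    bc_term = F.mse_loss(pred_expert_actions, expert_actions)  # BC part
    return pg_term + bc_coef * bc_term
```

The BC term supplies dense supervision early in training, while the adversarial term corrects the covariate shift that BC alone would accumulate.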
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.