Augmenting GAIL with BC for sample efficient imitation learning
- URL: http://arxiv.org/abs/2001.07798v4
- Date: Mon, 9 Nov 2020 20:04:36 GMT
- Title: Augmenting GAIL with BC for sample efficient imitation learning
- Authors: Rohit Jena, Changliu Liu, Katia Sycara
- Abstract summary: We present a simple and elegant method to combine behavior cloning and GAIL to enable stable and sample efficient learning.
Our algorithm is very simple to implement and integrates with different policy gradient algorithms.
We demonstrate the effectiveness of the algorithm in low dimensional control tasks, gridworlds and in high dimensional image-based tasks.
- Score: 5.199454801210509
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning is the problem of recovering an expert policy without
access to a reward signal. Behavior cloning and GAIL are two widely used
methods for performing imitation learning. Behavior cloning converges in a few
iterations but does not achieve peak performance due to its inherent i.i.d.
assumption about the state-action distribution. GAIL addresses the issue by
accounting for the temporal dependencies when performing a state distribution
matching between the agent and the expert. Although GAIL is sample efficient in
the number of expert trajectories required, it is still not very sample
efficient in terms of the environment interactions needed for convergence of
the policy. Given the complementary benefits of both methods, we present a
simple and elegant method to combine both methods to enable stable and sample
efficient learning. Our algorithm is very simple to implement and integrates
with different policy gradient algorithms. We demonstrate the effectiveness of
the algorithm in low dimensional control tasks, gridworlds and in high
dimensional image-based tasks.
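To make the combination concrete, here is a minimal sketch (PyTorch, discrete actions) of one way to mix a behavior-cloning loss with a GAIL-style discriminator reward in a single policy update. The network sizes, the REINFORCE-style surrogate, and the mixing weight `lam` are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch: a combined BC + GAIL-style update (illustrative, not the paper's method).
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 4, 2

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
disc = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)

def d_input(obs, act):
    # Discriminator sees state-action pairs (action one-hot encoded).
    return torch.cat([obs, F.one_hot(act, act_dim).float()], dim=-1)

def combined_update(expert_obs, expert_act, agent_obs, agent_act, lam=0.5):
    """One step; `lam` trades off the BC term against the adversarial term."""
    # 1) Discriminator: expert pairs get label 1, agent pairs label 0.
    exp_logits = disc(d_input(expert_obs, expert_act))
    agt_logits = disc(d_input(agent_obs, agent_act))
    d_loss = (F.binary_cross_entropy_with_logits(exp_logits, torch.ones_like(exp_logits))
              + F.binary_cross_entropy_with_logits(agt_logits, torch.zeros_like(agt_logits)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) GAIL-style surrogate reward: high when agent pairs look expert-like (~ -log(1 - D)).
    with torch.no_grad():
        reward = -F.logsigmoid(-disc(d_input(agent_obs, agent_act)))

    # 3) Policy loss = weighted BC loss + REINFORCE-style adversarial surrogate.
    logp = torch.distributions.Categorical(logits=policy(agent_obs)).log_prob(agent_act)
    pg_loss = -(logp * reward.squeeze(-1)).mean()
    bc_loss = F.cross_entropy(policy(expert_obs), expert_act)
    pi_loss = lam * bc_loss + (1.0 - lam) * pg_loss
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    return d_loss.item(), pi_loss.item()

# Toy usage with random tensors standing in for expert demos and agent rollouts.
expert_obs, agent_obs = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
expert_act, agent_act = torch.randint(act_dim, (32,)), torch.randint(act_dim, (32,))
print(combined_update(expert_obs, expert_act, agent_obs, agent_act))
```

In the same spirit as the abstract's claim of integrating with different policy gradient algorithms, the BC term here could be attached to other on-policy updates (e.g. PPO or TRPO surrogates) in place of the simple REINFORCE term.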
Related papers
- Sample Efficient Reinforcement Learning by Automatically Learning to
Compose Subtasks [3.1594865504808944]
We propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks.
We evaluate our algorithm in a variety of sparse-reward environments.
arXiv Detail & Related papers (2024-01-25T15:06:40Z) - Coherent Soft Imitation Learning [17.345411907902932]
Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward.
This work derives an imitation method that captures the strengths of both BC and IRL.
arXiv Detail & Related papers (2023-05-25T21:54:22Z) - Okapi: Generalising Better by Making Statistical Matches Match [7.392460712829188]
Okapi is a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching.
Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss.
We show that it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation.
arXiv Detail & Related papers (2022-11-07T12:41:17Z) - Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient reinforcement learning algorithms for decision processes, with complexity independent of the number of states.
We give an efficient algorithm capable of boosting the accuracy of weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because it is inherently a one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Zeroth-Order Supervised Policy Improvement [94.0748002906652]
Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL).
We propose Zeroth-Order Supervised Policy Improvement (ZOSPI).
ZOSPI exploits the estimated value function $Q$ globally while preserving the local exploitation of the PG methods; a rough sketch of this idea follows the list below.
arXiv Detail & Related papers (2020-06-11T16:49:23Z)
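As a rough illustration of the ZOSPI entry above, the sketch below reads "exploiting $Q$ globally while preserving local exploitation" as: sample both uniform (global) and locally perturbed candidate actions, keep the one the learned Q-function prefers, and regress the policy onto it with a supervised loss. The candidate counts, noise scale, and the untrained `q_net` are assumptions for illustration only; training Q (e.g. with TD learning) is omitted, and this is not necessarily the paper's exact procedure.

```python
# Rough sketch: supervised policy improvement toward the best Q-rated candidate action.
import torch
import torch.nn as nn

obs_dim, act_dim, n_cand = 3, 1, 16

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim), nn.Tanh())
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def policy_step(obs):
    batch = obs.shape[0]
    with torch.no_grad():
        # Global candidates: uniform over the action box [-1, 1]^act_dim.
        global_a = torch.rand(batch, n_cand, act_dim) * 2 - 1
        # Local candidates: small perturbations of the current policy's action.
        local_a = (policy(obs).unsqueeze(1)
                   + 0.1 * torch.randn(batch, n_cand, act_dim)).clamp(-1, 1)
        cand = torch.cat([global_a, local_a], dim=1)                     # (B, 2K, A)
        obs_rep = obs.unsqueeze(1).expand(-1, cand.shape[1], -1)
        q_vals = q_net(torch.cat([obs_rep, cand], dim=-1)).squeeze(-1)   # (B, 2K)
        best = cand[torch.arange(batch), q_vals.argmax(dim=1)]           # greedy w.r.t. Q

    # Supervised improvement: regress the policy onto the Q-preferred action.
    loss = ((policy(obs) - best) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(policy_step(torch.randn(8, obs_dim)))
```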