$f$-GAIL: Learning $f$-Divergence for Generative Adversarial Imitation Learning
- URL: http://arxiv.org/abs/2010.01207v2
- Date: Thu, 19 Nov 2020 05:29:20 GMT
- Title: $f$-GAIL: Learning $f$-Divergence for Generative Adversarial Imitation Learning
- Authors: Xin Zhang, Yanhua Li, Ziming Zhang, Zhi-Li Zhang
- Abstract summary: Imitation learning aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors.
Given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency?
We propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure and a policy capable of producing expert-like behaviors.
- Score: 29.459037918810143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning (IL) aims to learn a policy from expert demonstrations
that minimizes the discrepancy between the learner and expert behaviors.
Various imitation learning algorithms have been proposed with different
pre-determined divergences to quantify the discrepancy. This naturally gives
rise to the following question: Given a set of expert demonstrations, which
divergence can recover the expert policy more accurately with higher data
efficiency? In this work, we propose $f$-GAIL, a new generative adversarial
imitation learning (GAIL) model that automatically learns a discrepancy
measure from the $f$-divergence family as well as a policy capable of producing
expert-like behaviors. Compared with IL baselines with various predefined
divergence measures, $f$-GAIL learns better policies with higher data
efficiency in six physics-based control tasks.
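A minimal sketch of the idea, assuming a PyTorch setup and the variational lower bound from the f-GAN literature, $D_f(P \| Q) \ge \mathbb{E}_P[g_f(V(x))] - \mathbb{E}_Q[f^*(g_f(V(x)))]$: the discriminator maximizes this bound while the policy minimizes it, and the divergence itself is made learnable here via a softmax over a small candidate family. The class name, candidate list, and softmax relaxation are illustrative assumptions, not the authors' parameterization (the paper learns the divergence within the full $f$-divergence family).
```python
# Hedged sketch, not the authors' implementation: a GAIL-style discriminator
# whose f-divergence is itself learnable, using the f-GAN variational bound
#   D_f(P || Q) >= E_P[ g_f(V(x)) ] - E_Q[ f*( g_f(V(x)) ) ].
# The candidate family and the softmax relaxation are illustrative assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# (output activation g_f, convex conjugate f*) pairs from the f-GAN formulation
CANDIDATES = {
    "kl":         (lambda v: v,                       lambda t: torch.exp(t - 1.0)),
    "reverse_kl": (lambda v: -torch.exp(-v),          lambda t: -1.0 - torch.log(-t)),
    "js":         (lambda v: math.log(2.0) - F.softplus(-v),
                   lambda t: -torch.log(2.0 - torch.exp(t))),
}

class FGAILDiscriminator(nn.Module):
    def __init__(self, obs_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_act_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        # learnable preference over candidate divergences
        self.f_logits = nn.Parameter(torch.zeros(len(CANDIDATES)))

    def lower_bound(self, expert_sa, policy_sa):
        """Estimate of D_f between expert and policy state-action occupancies."""
        w = torch.softmax(self.f_logits, dim=0)
        v_e, v_p = self.net(expert_sa), self.net(policy_sa)
        bound = 0.0
        for wi, (g, f_star) in zip(w, CANDIDATES.values()):
            bound = bound + wi * (g(v_e).mean() - f_star(g(v_p)).mean())
        # the discriminator ascends this bound; the policy descends it
        return bound
```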
Related papers
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z) - Robust Semi-supervised Learning via $f$-Divergence and $α$-Rényi Divergence [2.9965913883475137]
This paper investigates a range of empirical risk functions and regularization methods suitable for self-training methods in semi-supervised learning.
Inspired by theoretical foundations rooted in divergences, i.e., $f$-divergences and the $\alpha$-Rényi divergence, we also provide insights that deepen the understanding of these empirical risk functions and regularization techniques.
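As a point of reference for the divergences mentioned above, here is a hedged sketch of the discrete $\alpha$-Rényi divergence, $D_\alpha(P \| Q) = \frac{1}{\alpha-1}\log\sum_i p_i^\alpha q_i^{1-\alpha}$, one candidate building block for divergence-based risks; the function name and smoothing constant are illustrative.
```python
# Hedged sketch: discrete alpha-Rényi divergence between two distributions.
# The smoothing constant eps is an illustrative choice to avoid log(0).
import numpy as np

def renyi_divergence(p, q, alpha=0.5, eps=1e-12):
    """D_alpha(p || q) = 1/(alpha - 1) * log sum_i p_i^alpha * q_i^(1 - alpha)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)
```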
arXiv Detail & Related papers (2024-05-01T11:16:02Z) - MEGA-DAgger: Imitation Learning with Multiple Imperfect Experts [7.4506213369860195]
MEGA-DAgger is a new DAgger variant that is suitable for interactive learning with multiple imperfect experts.
We demonstrate that policy learned using MEGA-DAgger can outperform both experts and policies learned using the state-of-the-art interactive imitation learning algorithms.
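One plausible reading of such an interactive loop, sketched under assumptions (the `rollout_states`, `fit_policy`, and `label_ok` interfaces are hypothetical, and the filter stands in for whatever safety or conflict-resolution rule the method actually uses):
```python
# Hedged sketch of a DAgger-style aggregation loop with several imperfect experts:
# query every expert on states the learner visits, keep only labels that pass a
# user-supplied filter, and retrain. All interfaces here are hypothetical stand-ins.
def dagger_with_imperfect_experts(rollout_states, experts, fit_policy, label_ok,
                                  init_policy, n_iters=10):
    dataset, policy = [], init_policy
    for _ in range(n_iters):
        for s in rollout_states(policy):                  # states visited by the learner
            labels = [expert(s) for expert in experts]    # query every expert
            kept = [a for a in labels if label_ok(s, a, labels)]  # drop unsafe/conflicting labels
            dataset.extend((s, a) for a in kept)
        policy = fit_policy(dataset)                      # retrain on aggregated data
    return policy
```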
arXiv Detail & Related papers (2023-03-01T16:40:54Z) - CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning [26.05184273238923]
This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL).
We devise a principled algorithm (namely CLARE) that solves offline IRL efficiently via integrating "conservatism" into a learned reward function.
Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy.
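A hedged sketch of what "integrating conservatism into a learned reward function" could look like; the margin-plus-penalty objective below is an illustrative stand-in, not the paper's exact loss:
```python
# Hedged sketch of "conservatism" in reward learning (an illustrative stand-in,
# not the paper's objective): push the learned reward to rank expert data above
# rollouts while penalizing confident rewards on out-of-distribution samples.
import torch

def conservative_reward_loss(reward_net, expert_sa, rollout_sa, ood_sa, beta=1.0):
    margin = reward_net(rollout_sa).mean() - reward_net(expert_sa).mean()
    ood_penalty = reward_net(ood_sa).pow(2).mean()   # discourage large OOD rewards
    return margin + beta * ood_penalty
```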
arXiv Detail & Related papers (2023-02-09T17:16:29Z) - Deconfounding Imitation Learning with Variational Inference [19.99248795957195]
Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent.
This is because partial observability gives rise to hidden confounders in the causal graph.
We propose to train a variational inference model to infer the expert's latent information and use it to train a latent-conditional policy.
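A minimal sketch of that recipe, assuming a PyTorch setup: an inference network encodes a demonstration into a latent variable, and the policy conditions on (observation, latent). The module names, Gaussian posterior, and behavioral-cloning ELBO are illustrative assumptions rather than the authors' architecture.
```python
# Hedged sketch (illustrative names, not the authors' architecture): encode an
# expert trajectory into a latent z with a variational posterior, then train a
# policy that conditions on (observation, z) with a behavioral-cloning ELBO.
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """q(z | trajectory): mean-pool per-step features into a Gaussian over z."""
    def __init__(self, step_dim, z_dim=8, hidden=64):    # step_dim = obs_dim + act_dim
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(step_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, traj):                              # traj: (T, step_dim)
        h = self.phi(traj).mean(dim=0)
        return self.mu(h), self.logvar(h)

class LatentConditionalPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, z_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))

    def forward(self, obs, z):                            # obs: (T, obs_dim), z: (T, z_dim)
        return self.net(torch.cat([obs, z], dim=-1))

def bc_elbo_step(encoder, policy, traj_obs, traj_act):
    """Reconstruction of expert actions plus KL(q(z|traj) || N(0, I))."""
    mu, logvar = encoder(torch.cat([traj_obs, traj_act], dim=-1))
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    pred = policy(traj_obs, z.expand(traj_obs.shape[0], -1))
    recon = ((pred - traj_act) ** 2).mean()
    kl = -0.5 * (1.0 + logvar - mu.pow(2) - logvar.exp()).sum()
    return recon + 1e-3 * kl
```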
arXiv Detail & Related papers (2022-11-04T18:00:02Z) - Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
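One way such a joint model could be approximated, as a hedged numpy sketch: alternate between fitting a weighted behavioral-cloning policy and re-estimating per-demonstrator expertise from how well the current policy explains each demonstrator. The `fit_policy` interface and the softmax re-weighting are illustrative assumptions.
```python
# Hedged numpy sketch (not the paper's model): alternate between weighted
# behavioral cloning and re-estimating per-demonstrator expertise.
import numpy as np

def expertise_weighted_bc(demos, fit_policy, n_rounds=5):
    """demos: list of (obs, act) arrays, one pair per demonstrator.
    fit_policy: hypothetical callable (obs, act, sample_weight) -> predict_fn."""
    weights = np.ones(len(demos))
    for _ in range(n_rounds):
        obs = np.concatenate([o for o, _ in demos])
        act = np.concatenate([a for _, a in demos])
        sw = np.concatenate([np.full(len(o), w) for (o, _), w in zip(demos, weights)])
        predict = fit_policy(obs, act, sw)
        # demonstrators the current policy predicts well get higher expertise
        errs = np.array([np.mean((predict(o) - a) ** 2) for o, a in demos])
        weights = len(demos) * np.exp(-errs) / np.exp(-errs).sum()
    return predict, weights
```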
arXiv Detail & Related papers (2022-02-02T21:23:19Z) - Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
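A hedged sketch of the optimistic-exploration piece only: rank candidate actions by an approximate upper confidence bound over an ensemble of critics (mean plus a multiple of the ensemble standard deviation). The ensemble interface, candidate proposals, and bonus coefficient are illustrative; the DICE correction is omitted.
```python
# Hedged sketch: pick the action whose ensemble upper confidence bound is largest.
import torch

def ucb_action(critics, obs, candidate_actions, k=1.0):
    """critics: list of nets mapping concat(obs, action) -> Q value.
    obs: (1, obs_dim); candidate_actions: (N, act_dim) exploration proposals."""
    obs_rep = obs.expand(candidate_actions.shape[0], -1)
    qs = torch.stack([
        c(torch.cat([obs_rep, candidate_actions], dim=-1)).squeeze(-1)
        for c in critics])                              # (num_critics, N)
    ucb = qs.mean(dim=0) + k * qs.std(dim=0)            # optimistic value estimate
    return candidate_actions[ucb.argmax()]
```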
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - SS-MAIL: Self-Supervised Multi-Agent Imitation Learning [18.283839252425803]
The problem has traditionally been tackled by two families of algorithms: Behavioral Cloning (BC) and Adversarial Imitation Learning (AIL).
BC approaches suffer from compounding errors, as they ignore the sequential decision-making nature of the trajectory generation problem.
AIL methods are plagued with instability in their training dynamics.
We introduce a novel self-supervised loss that encourages the discriminator to approximate a richer reward function.
arXiv Detail & Related papers (2021-10-18T01:17:50Z) - Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
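As a rough illustration of one of the two no-regret components, here is a mirror-descent (multiplicative-weights) update over a finite set of candidate cost functions, shifting weight toward costs on which the learner currently underperforms the expert; the finite candidate set and step size are simplifying assumptions.
```python
# Hedged sketch of one no-regret component: a multiplicative-weights (mirror
# descent with entropic regularizer) update over candidate cost functions.
import numpy as np

def mw_cost_update(weights, learner_costs, expert_costs, eta=0.1):
    """weights: distribution over K candidate costs; *_costs: length-K arrays of
    expected cost of the learner / expert under each candidate."""
    regret = learner_costs - expert_costs        # where the learner lags the expert
    new_w = weights * np.exp(eta * regret)       # exponentiated-gradient step
    return new_w / new_w.sum()
```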
arXiv Detail & Related papers (2021-02-13T12:57:51Z) - On Computation and Generalization of Generative Adversarial Imitation Learning [134.17122587138897]
Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies.
This paper investigates the theoretical properties of GAIL.
arXiv Detail & Related papers (2020-01-09T00:40:19Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
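A minimal sketch of the reward-conditioned supervised-learning idea, assuming PyTorch: relabel logged data with the return actually achieved and regress actions on (state, target return). Shapes, the scalar return input, and the squared-error loss are illustrative assumptions.
```python
# Hedged PyTorch sketch: train a policy by supervised learning on
# (state, target return) -> action, using relabeled logged data.
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))

    def forward(self, obs, target_return):       # obs: (B, obs_dim), target_return: (B, 1)
        return self.net(torch.cat([obs, target_return], dim=-1))

def supervised_step(policy, opt, obs, act, returns):
    pred = policy(obs, returns)                  # imitate whatever achieved this return
    loss = ((pred - act) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```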
arXiv Detail & Related papers (2019-12-31T18:07:43Z)