On Computation and Generalization of Generative Adversarial Imitation
Learning
- URL: http://arxiv.org/abs/2001.02792v2
- Date: Sun, 12 Jan 2020 03:31:31 GMT
- Title: On Computation and Generalization of Generative Adversarial Imitation
Learning
- Authors: Minshuo Chen, Yizhou Wang, Tianyi Liu, Zhuoran Yang, Xingguo Li,
Zhaoran Wang, Tuo Zhao
- Abstract summary: Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies.
This paper investigates the theoretical properties of GAIL.
- Score: 134.17122587138897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Imitation Learning (GAIL) is a powerful and practical
approach for learning sequential decision-making policies. Different from
Reinforcement Learning (RL), GAIL takes advantage of demonstration data by
experts (e.g., humans), and learns both the policy and the reward function of the
unknown environment. Despite significant empirical progress, the theory
behind GAIL is still largely unknown. The major difficulty comes from the
underlying temporal dependency of the demonstration data and the minimax
computational formulation of GAIL without convex-concave structure. To bridge
such a gap between theory and practice, this paper investigates the theoretical
properties of GAIL. Specifically, we show: (1) For GAIL with general reward
parameterization, the generalization can be guaranteed as long as the class of
the reward functions is properly controlled; (2) When the reward is parameterized
as a reproducing kernel function, GAIL can be efficiently solved by stochastic
first-order optimization algorithms, which attain sublinear
convergence to a stationary solution. To the best of our knowledge, these are
the first results on statistical and computational guarantees of imitation
learning with reward/policy function approximation. Numerical experiments are
provided to support our analysis.
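To make the setting concrete, here is a minimal sketch of the kind of alternating stochastic first-order scheme the abstract refers to: a reward represented in a random-Fourier-feature approximation of an RKHS is updated by stochastic gradient ascent, and a softmax policy by a REINFORCE-style step on the learned reward. The toy data, feature dimensions, and step sizes are illustrative assumptions, not the authors' algorithm.
```python
# Sketch of a GAIL-style alternating stochastic first-order update with a reward
# parameterized in a (random-feature approximation of an) RKHS. Illustrative only;
# dimensions, step sizes, and the toy data below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, N_ACTIONS = 4, 1, 3
N_FEATURES = 64          # random Fourier features approximating a Gaussian kernel
W = rng.normal(size=(N_FEATURES, STATE_DIM + ACTION_DIM))
b = rng.uniform(0, 2 * np.pi, size=N_FEATURES)

def phi(s, a):
    """Random Fourier feature map for the state-action kernel."""
    x = np.concatenate([s, [a]])
    return np.sqrt(2.0 / N_FEATURES) * np.cos(W @ x + b)

def reward(theta, s, a):
    return theta @ phi(s, a)           # r_theta(s, a) = <theta, phi(s, a)>

def policy_probs(psi, s):
    logits = psi @ s                   # linear softmax policy, psi: (N_ACTIONS, STATE_DIM)
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Toy "expert" demonstrations (placeholders; in practice these come from an expert).
expert_states = rng.normal(size=(100, STATE_DIM))
expert_actions = rng.integers(0, N_ACTIONS, size=100)

theta = np.zeros(N_FEATURES)
psi = np.zeros((N_ACTIONS, STATE_DIM))
eta_r, eta_pi, lam = 0.05, 0.05, 0.1   # step sizes and RKHS-norm regularizer

for it in range(200):
    # Mini-batch of expert data and of on-policy data from toy states.
    idx = rng.integers(0, len(expert_states), size=16)
    s_pi = rng.normal(size=(16, STATE_DIM))
    a_pi = np.array([rng.choice(N_ACTIONS, p=policy_probs(psi, s)) for s in s_pi])

    # Reward (discriminator) ascent step: expert reward up, policy reward down.
    grad_theta = (sum(phi(expert_states[i], expert_actions[i]) for i in idx) / len(idx)
                  - sum(phi(s, a) for s, a in zip(s_pi, a_pi)) / len(s_pi)
                  - lam * theta)
    theta += eta_r * grad_theta

    # Policy step: REINFORCE-style gradient on the current learned reward.
    grad_psi = np.zeros_like(psi)
    for s, a in zip(s_pi, a_pi):
        p = policy_probs(psi, s)
        glogpi = -np.outer(p, s)       # d/d psi of log pi(a|s) for a softmax policy
        glogpi[a] += s
        grad_psi += reward(theta, s, a) * glogpi
    psi += eta_pi * grad_psi / len(s_pi)
```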
Related papers
- Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation [13.228240527941619]
We introduce a new method called optimization-based AIL (OPT-AIL).
OPT-AIL is the first provably efficient AIL method with general function approximation.
Empirical studies demonstrate that OPT-AIL outperforms previous state-of-the-art deep AIL methods in several challenging tasks.
arXiv Detail & Related papers (2024-11-01T14:17:38Z) - Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation [8.378137704007038]
We present a regret analysis for distributional reinforcement learning with general value function approximation.
Our theoretical results show that approximating the infinite-dimensional return distribution with a finite number of moment functionals is the only way to learn its statistical information without bias.
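As a toy illustration of what "a finite number of moment functionals" means here (an assumed setup, not the paper's estimator), each functional maps sampled returns to an unbiased estimate of one raw moment of the return distribution:
```python
# Illustrative only: summarize a return distribution by finitely many moment functionals.
import numpy as np

rng = np.random.default_rng(3)
returns = rng.gamma(shape=2.0, scale=1.5, size=10_000)   # sampled returns (assumed)

def moment_functionals(samples, k):
    """Unbiased estimates of the first k raw moments E[Z^j], j = 1..k."""
    return np.array([np.mean(samples ** j) for j in range(1, k + 1)])

print(moment_functionals(returns, k=4))
```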
arXiv Detail & Related papers (2024-07-31T00:43:51Z) - Is Inverse Reinforcement Learning Harder than Standard Reinforcement
Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.
This paper provides the first line of results on efficient IRL in vanilla offline and online settings using polynomial numbers of samples and runtime.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the generalization benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
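The interplay between learning rate and early stopping can already be seen in a finite-dimensional stand-in for the separable Hilbert space setting: for a quadratic objective, gradient descent from zero fits each eigendirection at a rate set by the step size times the corresponding eigenvalue, so the step size decides which spectral components are recovered within a fixed early-stopping budget. The sketch below illustrates this under assumed eigenvalues; it is not the paper's analysis.
```python
# Illustrative sketch (assumed finite-dimensional stand-in, not the paper's setting):
# gradient descent on a quadratic 0.5 * w^T H w - c^T w. After t steps from w_0 = 0,
# the component along eigenvector v_i is (1 - (1 - eta * l_i)^t) * c_i / l_i, so with
# early stopping the learning rate eta decides which spectral components are fit.
import numpy as np

eigvals = np.array([1.0, 0.1, 0.01])        # spectrum of the Hessian H (assumed)
c = np.array([1.0, 1.0, 1.0])               # target components in the eigenbasis

def gd_solution(eta, steps):
    """Components of the iterate in the eigenbasis after `steps` gradient steps."""
    return (1.0 - (1.0 - eta * eigvals) ** steps) * c / eigvals

for eta in (0.1, 1.0):
    w = gd_solution(eta, steps=20)
    print(f"eta={eta:4}: components along eigendirections = {np.round(w, 3)}")
# A larger learning rate lets low-curvature directions be fit within the same
# early-stopping budget, changing the spectral make-up of the returned solution.
```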
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Efficient Performance Bounds for Primal-Dual Reinforcement Learning from
Demonstrations [1.0609815608017066]
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.
Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive.
We introduce a novel bilinear saddle-point framework using Lagrangian duality to bridge the gap between theory and practice.
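A generic picture of such a bilinear saddle-point scheme (with simplex constraints and a random matrix standing in for the occupancy-measure formulation, both assumptions for illustration) is projected gradient descent-ascent with iterate averaging:
```python
# Generic sketch of projected gradient descent-ascent on a bilinear saddle point
#     min_{x in simplex} max_{y in simplex}  x^T A y.
# The matrix A and the simplex constraints are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 4))

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    tau = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + tau, 0.0)

x = np.ones(5) / 5          # primal variable (e.g., occupancy-measure weights)
y = np.ones(4) / 4          # dual variable (e.g., Lagrange multipliers)
x_avg, y_avg, eta = np.zeros(5), np.zeros(4), 0.1

for t in range(500):
    gx, gy = A @ y, A.T @ x              # gradients of the bilinear objective
    x = project_simplex(x - eta * gx)    # descent on the primal variable
    y = project_simplex(y + eta * gy)    # ascent on the dual variable
    x_avg, y_avg = x_avg + x, y_avg + y  # averaged iterates converge for saddle points

print("duality gap estimate:",
      float(np.max(A.T @ (x_avg / 500)) - np.min(A @ (y_avg / 500))))
```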
arXiv Detail & Related papers (2021-12-28T05:47:24Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient reinforcement learning algorithms for decision processes, with complexity independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, strong PAC guarantees can be obtained for learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - When Will Generative Adversarial Imitation Learning Algorithms Attain
Global Convergence [56.40794592158596]
We study generative adversarial imitation learning (GAIL) for general MDPs and nonlinear reward function classes.
This is the first systematic theoretical study of GAIL for global convergence.
arXiv Detail & Related papers (2020-06-24T06:24:37Z) - Augmenting GAIL with BC for sample efficient imitation learning [5.199454801210509]
We present a simple and elegant method that combines behavior cloning and GAIL to enable stable and sample-efficient learning.
Our algorithm is very simple to implement and integrates with different policy gradient algorithms.
We demonstrate the effectiveness of the algorithm on low-dimensional control tasks, gridworlds, and high-dimensional image-based tasks.
arXiv Detail & Related papers (2020-01-21T22:28:50Z)