When Will Generative Adversarial Imitation Learning Algorithms Attain
Global Convergence
- URL: http://arxiv.org/abs/2006.13506v2
- Date: Thu, 25 Jun 2020 03:26:15 GMT
- Title: When Will Generative Adversarial Imitation Learning Algorithms Attain
Global Convergence
- Authors: Ziwei Guan, Tengyu Xu and Yingbin Liang
- Abstract summary: We study generative adversarial imitation learning (GAIL) under general MDP and for nonlinear reward function classes.
This is the first systematic theoretical study of GAIL for global convergence.
- Score: 56.40794592158596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative adversarial imitation learning (GAIL) is a popular inverse
reinforcement learning approach for jointly optimizing policy and reward from
expert trajectories. A primary question about GAIL is whether applying a
certain policy gradient algorithm to GAIL attains a global minimizer (i.e.,
yields the expert policy), for which existing understanding is very limited.
Such global convergence has been shown only for the linear (or linear-type) MDP
and linear (or linearizable) reward. In this paper, we study GAIL under general
MDP and for nonlinear reward function classes (as long as the objective
function is strongly concave with respect to the reward parameter). We
characterize the global convergence with a sublinear rate for a broad range of
commonly used policy gradient algorithms, all of which are implemented in an
alternating manner with stochastic gradient ascent for reward update, including
projected policy gradient (PPG)-GAIL, Frank-Wolfe policy gradient (FWPG)-GAIL,
trust region policy optimization (TRPO)-GAIL and natural policy gradient
(NPG)-GAIL. This is the first systematic theoretical study of GAIL for global
convergence.
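The alternating scheme described in the abstract (a policy-gradient step on the policy, then a stochastic gradient-ascent step on the reward) can be summarized in a short sketch. The snippet below is only an illustrative toy in the style of PPG-GAIL under the paper's setting, not the authors' implementation; every name (gail_alternating_updates, policy_gradient_estimate, reward_gradient_estimate, project, the step sizes) is a hypothetical placeholder.

```python
# Minimal, illustrative sketch of the alternating GAIL-style updates:
# a projected policy-gradient step on the policy parameters followed by a
# stochastic gradient-ascent step on the reward parameters.
# All names and step sizes are hypothetical, not the authors' code.

def gail_alternating_updates(theta, w, expert_trajs, sample_policy,
                             policy_gradient_estimate, reward_gradient_estimate,
                             project, eta_policy=0.1, eta_reward=0.05,
                             num_iters=100):
    """theta: policy parameters; w: reward parameters (the GAIL objective is
    assumed strongly concave in w, as in the paper's setting)."""
    for _ in range(num_iters):
        # Draw fresh trajectories from the current policy pi_theta.
        policy_trajs = sample_policy(theta)

        # Policy step (PPG-style): stochastic gradient ascent on the expected
        # cumulative reward under r_w, then projection onto the feasible set.
        g_theta = policy_gradient_estimate(theta, w, policy_trajs)
        theta = project(theta + eta_policy * g_theta)

        # Reward step: stochastic gradient ascent on the GAIL objective,
        # which contrasts expert trajectories with the policy's own samples.
        g_w = reward_gradient_estimate(w, expert_trajs, policy_trajs)
        w = w + eta_reward * g_w

    return theta, w
```

In this sketch, FWPG-GAIL, TRPO-GAIL and NPG-GAIL would differ only in how the policy step is taken; the stochastic gradient-ascent reward step is the part shared by all four algorithms analyzed in the paper.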
Related papers
- Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control [38.10940311690513]
Policy gradient methods are increasingly popular for sequential decision making in reinforcement learning, games, and control.
Guaranteeing the global optimality of policy gradient methods is highly nontrivial due to nonconcavity of the value functions.
arXiv Detail & Related papers (2023-10-08T16:54:25Z)
- Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z)
- Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards.
Many supervised and unsupervised RL problems are not covered in the Linear RL framework.
We derive the policy gradient theorem for RL with general utilities.
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
- Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning [17.916366827429034]
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions.
We propose an Anchor-changing Regularized Natural Policy Gradient framework, which can incorporate ideas from well-performing first-order methods.
arXiv Detail & Related papers (2022-06-10T21:09:44Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution (a generic sketch of this family of multiplicative policy updates appears after this list).
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Variational Policy Gradient Method for Reinforcement Learning with General Utilities [38.54243339632217]
In recent years, reinforcement learning systems with general goals beyond a cumulative sum of rewards have gained traction.
In this paper, we consider policy optimization in Markov decision problems, where the objective is a general concave utility function.
We derive a new Variational Policy Gradient Theorem for RL with general utilities.
arXiv Detail & Related papers (2020-07-04T17:51:53Z)
- On Computation and Generalization of Generative Adversarial Imitation Learning [134.17122587138897]
Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies.
This paper investigates the theoretical properties of GAIL.
arXiv Detail & Related papers (2020-01-09T00:40:19Z)
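Several of the entries above (NPG-GAIL and TRPO-GAIL in the main paper, the anchor-changing regularized natural policy gradient, and policy mirror descent) build on the same multiplicative softmax policy update. The toy sketch below shows the basic unregularized form of that update for a tabular policy; the function name, step size, and example numbers are illustrative assumptions, and regularized variants such as GPMD modify the exponents rather than this overall structure.

```python
# Toy tabular natural-policy-gradient / mirror-ascent style update:
#   pi_new(a|s) proportional to pi(a|s) * exp(eta * Q(s, a)).
# This multiplicative form is the common core behind NPG, TRPO and policy
# mirror descent; it is purely illustrative, not any one paper's algorithm.
import numpy as np

def npg_softmax_update(pi, Q, eta=0.5):
    """One NPG/mirror-ascent step on a tabular softmax policy.

    pi : (num_states, num_actions) current policy, rows sum to 1
    Q  : (num_states, num_actions) action values under pi
    eta: step size (hypothetical choice)
    """
    logits = np.log(pi + 1e-12) + eta * Q          # log pi + eta * Q
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum(axis=1, keepdims=True)  # renormalize rows

# Example: two states, three actions, uniform initial policy.
pi0 = np.full((2, 3), 1.0 / 3.0)
Q = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.0]])
print(npg_softmax_update(pi0, Q))
```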