Generative Adversarial Imitation Learning with Neural Networks: Global
Optimality and Convergence Rate
- URL: http://arxiv.org/abs/2003.03709v2
- Date: Thu, 25 Jun 2020 03:33:18 GMT
- Title: Generative Adversarial Imitation Learning with Neural Networks: Global
Optimality and Convergence Rate
- Authors: Yufeng Zhang, Qi Cai, Zhuoran Yang, Zhaoran Wang
- Abstract summary: Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks.
Despite its empirical success, it remains unclear whether GAIL with neural networks converges to the globally optimal solution.
- Score: 122.73276299136568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative adversarial imitation learning (GAIL) demonstrates tremendous
success in practice, especially when combined with neural networks. Different
from reinforcement learning, GAIL learns both policy and reward function from
expert (human) demonstration. Despite its empirical success, it remains unclear
whether GAIL with neural networks converges to the globally optimal solution.
The major difficulty comes from the nonconvex-nonconcave minimax optimization
structure. To bridge the gap between practice and theory, we analyze a
gradient-based algorithm with alternating updates and establish its sublinear
convergence to the globally optimal solution. To the best of our knowledge, our
analysis establishes the global optimality and convergence rate of GAIL with
neural networks for the first time.
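The abstract refers to a gradient-based algorithm with alternating updates on a minimax objective. As a rough illustration of that update pattern only, the sketch below runs alternating gradient descent-ascent on a toy convex-concave function. The function, the step size, the iteration count, and the name `alternating_gda` are assumptions chosen for illustration; this is not the paper's algorithm, which uses neural-network policies and reward functions and a nonconvex-nonconcave objective.

```python
def alternating_gda(x0, y0, step=0.1, iters=200):
    """Alternating gradient descent-ascent on the toy objective
    f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, whose saddle point is (0, 0).

    x plays the minimizing role and y the maximizing role. The toy
    objective is an assumption for illustration, not the GAIL objective.
    """
    x, y = x0, y0
    for _ in range(iters):
        # Descent step for the min player: df/dx = x + y.
        x = x - step * (x + y)
        # Ascent step for the max player, using the freshly updated x:
        # df/dy = x - y. Reusing the new x is what makes the updates
        # alternating rather than simultaneous.
        y = y + step * (x - y)
    return x, y


x_final, y_final = alternating_gda(1.0, -1.0)
print(x_final, y_final)  # both values end up very close to 0
```

On this toy problem the iterates contract toward the saddle point at a linear rate. The paper's setting is harder, and the abstract states only a sublinear rate there.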
Related papers
- Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks [2.0072624123275533]
The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps.
This work studies a GGN method for optimizing a two-layer neural network with explicit regularization.
arXiv Detail & Related papers (2024-04-23T10:02:22Z)
- No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths [12.068608358926317]
First-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks.
We focus on the fundamental geometric properties of sampled quantities of optimization on two key paths.
Our findings suggest that not only do optimization trajectories never encounter significant obstacles, but they also maintain stable dynamics during the majority of training.
arXiv Detail & Related papers (2023-06-20T22:10:40Z)
- Acceleration techniques for optimization over trained neural network ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
arXiv Detail & Related papers (2021-12-13T20:50:54Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking limit of infinite width of the network to show its global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)