On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient
Flow
- URL: http://arxiv.org/abs/2011.02402v1
- Date: Wed, 4 Nov 2020 16:55:00 GMT
- Title: On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient
Flow
- Authors: Youssef Mroueh, Truyen Nguyen
- Abstract summary: We show that a parametric kernelized gradient flow mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN.
We then derive an explicit condition which ensures that gradient descent on the parameter space of the generator in gradient regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution.
- Score: 26.725412498545385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the maximum mean discrepancy ($\mathrm{MMD}$) GAN problem and
propose a parametric kernelized gradient flow that mimics the min-max game in
gradient regularized $\mathrm{MMD}$ GAN. We show that this flow provides a
descent direction minimizing the $\mathrm{MMD}$ on a statistical manifold of
probability distributions. We then derive an explicit condition which ensures
that gradient descent on the parameter space of the generator in gradient
regularized $\mathrm{MMD}$ GAN is globally convergent to the target
distribution. Under this condition, we give non-asymptotic convergence results
of gradient descent in $\mathrm{MMD}$ GAN. Another contribution of this paper is the
introduction of a dynamic formulation of a regularization of $\mathrm{MMD}$ and
demonstrating that the parametric kernelized descent for $\mathrm{MMD}$ is the
gradient flow of this functional with respect to the new Riemannian structure.
Our theoretical result allows one to treat gradient flows for quite
general functionals and thus has potential applications to other types of
variational inference on a statistical manifold beyond GANs. Finally,
numerical experiments suggest that our parametric kernelized gradient flow
stabilizes GAN training and guarantees convergence.
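As a point of reference for the objective being analyzed, the sketch below (ours, not the authors' code) estimates $\mathrm{MMD}^2$ with a fixed Gaussian kernel and takes a single gradient-descent step on the parameters of a toy affine generator. It covers only the plain $\mathrm{MMD}$ loss; the paper's parametric kernelized flow and gradient regularization are not implemented here, and all names (gaussian_kernel, generator, theta) are illustrative.

```python
# Minimal sketch: Gaussian-kernel MMD^2 estimator and one gradient-descent
# step on generator parameters. Not the paper's gradient-regularized,
# parametric kernelized flow -- just the base MMD GAN generator update.
import jax
import jax.numpy as jnp

def gaussian_kernel(x, y, bandwidth=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)), computed pairwise.
    sq_dists = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimator of MMD^2:
    # E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')].
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    return kxx.mean() - 2.0 * kxy.mean() + kyy.mean()

def generator(theta, z):
    # Hypothetical affine generator pushing latent noise z forward.
    w, b = theta
    return z @ w + b

def loss(theta, z, data):
    return mmd2(generator(theta, z), data)

# One plain gradient-descent step on the generator parameters theta.
z = jax.random.normal(jax.random.PRNGKey(0), (256, 4))           # latent samples
data = 3.0 + jax.random.normal(jax.random.PRNGKey(1), (256, 2))  # target samples
theta = (jnp.zeros((4, 2)), jnp.zeros(2))
grads = jax.grad(loss)(theta, z, data)
theta = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, theta, grads)
```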
Related papers
- Gradient Flows and Riemannian Structure in the Gromov-Wasserstein Geometry [29.650065650233223]
We study gradient flows in the Gromov-Wasserstein (GW) geometry.
We focus on the inner product GW (IGW) distance between distributions on $\mathbb{R}^d$.
We identify the intrinsic IGW geometry and use it to establish a Benamou-Brenier-like formula for IGW.
arXiv Detail & Related papers (2024-07-16T14:53:23Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning [1.4050802766699084]
We consider the scenario of supervised learning in Deep Learning (DL) networks.
We choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network.
arXiv Detail & Related papers (2023-11-27T02:12:02Z) - Bridging the Gap Between Variational Inference and Wasserstein Gradient
Flows [6.452626686361619]
We bridge the gap between variational inference and Wasserstein gradient flows.
Under certain conditions, the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow.
We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow.
arXiv Detail & Related papers (2023-10-31T00:10:19Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Rigorous dynamical mean field theory for stochastic gradient descent
methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of stochastic variance reduced gradient (SVRG).
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
VCSG requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.