Local Convergence of Gradient Descent-Ascent for Training Generative
Adversarial Networks
- URL: http://arxiv.org/abs/2305.08277v2
- Date: Mon, 29 May 2023 16:40:43 GMT
- Title: Local Convergence of Gradient Descent-Ascent for Training Generative
Adversarial Networks
- Authors: Evan Becker, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher
- Abstract summary: We study the local dynamics of gradient descent-ascent (GDA) for training a GAN with a kernel-based discriminator.
We show phase transitions that indicate when the system converges, oscillates, or diverges.
- Score: 20.362912591032636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Adversarial Networks (GANs) are a popular formulation to train
generative models for complex high dimensional data. The standard method for
training GANs involves a gradient descent-ascent (GDA) procedure on a minimax
optimization problem. This procedure is hard to analyze in general due to the
nonlinear nature of the dynamics. We study the local dynamics of GDA for
training a GAN with a kernel-based discriminator. This convergence analysis is
based on a linearization of a non-linear dynamical system that describes the
GDA iterations, under an \textit{isolated points model} assumption from [Becker
et al. 2022]. Our analysis brings out the effect of the learning rates,
regularization, and the bandwidth of the kernel discriminator, on the local
convergence rate of GDA. Importantly, we show phase transitions that indicate
when the system converges, oscillates, or diverges. We also provide numerical
simulations that verify our claims.
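As a rough illustration of the dynamics described in the abstract, the sketch below runs simultaneous gradient descent-ascent on a toy regularized bilinear game and checks the spectral radius of the linearized update map, which determines whether the iterates locally converge, oscillate, or diverge. This is a minimal sketch, not the authors' method: the game, the learning rates `eta_g`, `eta_d`, and the regularization `lam` are illustrative placeholders, and the kernel-based discriminator from the paper is replaced by a simple linear payoff.

```python
# Minimal sketch (not the paper's code): simultaneous gradient descent-ascent
# (GDA) on a regularized bilinear game f(x, y) = x^T A y - (lam / 2) * ||y||^2,
# plus a local linear-stability check of the GDA update map.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
A /= np.linalg.norm(A, 2)              # normalize the coupling for readability
eta_g, eta_d, lam = 0.1, 0.1, 0.5      # illustrative learning rates / regularization

def gda_step(x, y):
    """One simultaneous GDA step: descend on x (generator), ascend on y (discriminator)."""
    grad_x = A @ y                      # df/dx
    grad_y = A.T @ x - lam * y          # df/dy
    return x - eta_g * grad_x, y + eta_d * grad_y

# Linearized update around the equilibrium (0, 0): [x_{t+1}; y_{t+1}] = M [x_t; y_t].
M = np.block([
    [np.eye(d),    -eta_g * A],
    [eta_d * A.T,  (1.0 - eta_d * lam) * np.eye(d)],
])
rho = np.max(np.abs(np.linalg.eigvals(M)))
print(f"spectral radius of linearized GDA map: {rho:.4f}")
# rho < 1: local convergence; rho ~ 1: sustained oscillation; rho > 1: divergence.

x, y = rng.standard_normal(d), rng.standard_normal(d)
for _ in range(500):
    x, y = gda_step(x, y)
print(f"distance to equilibrium after 500 steps: {np.linalg.norm(np.concatenate([x, y])):.3e}")
```

Sweeping `eta_g`, `eta_d`, or `lam` moves the spectral radius across 1, which is the toy analogue of the phase transitions between convergence, oscillation, and divergence studied in the paper.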
Related papers
- Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z) - What's the score? Automated Denoising Score Matching for Nonlinear Diffusions [25.062104976775448]
Reversing a diffusion process by learning its score forms the heart of diffusion-based generative modeling.
We introduce a family of tractable denoising score matching objectives, called local-DSM.
We show how local-DSM melded with Taylor expansions enables automated training and score estimation with nonlinear diffusion processes.
arXiv Detail & Related papers (2024-07-10T19:02:19Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{\,1 - \frac{1}{\alpha}}\big)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - Fast Convergence in Learning Two-Layer Neural Networks with Separable
Data [37.908159361149835]
We study normalized gradient descent on two-layer neural nets.
We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum.
arXiv Detail & Related papers (2023-05-22T20:30:10Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Dissecting adaptive methods in GANs [46.90376306847234]
We study how adaptive methods help train generative adversarial networks (GANs).
By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training.
We prove that in that setting, GANs trained with nSGDA (normalized SGDA) recover all the modes of the true distribution, whereas the same networks trained with SGDA (under any learning rate configuration) suffer from mode collapse.
arXiv Detail & Related papers (2022-10-09T19:00:07Z) - Linearization and Identification of Multiple-Attractors Dynamical System
through Laplacian Eigenmaps [8.161497377142584]
We propose a Graph-based spectral clustering method that takes advantage of a velocity-augmented kernel to connect data-points belonging to the same dynamics.
We prove that there always exists a set of 2-dimensional embedding spaces in which the sub-dynamics are linear, and an n-dimensional embedding space in which they are quasi-linear.
We learn a diffeomorphism from the Laplacian embedding space to the original space and show that the Laplacian embedding leads to good reconstruction accuracy and a faster training time.
arXiv Detail & Related papers (2022-02-18T12:43:25Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) requires solving non-convex concave min-max optimization problems.
Prior theory has shown the importance of overparameterization for the convergence of gradient descent (GD) to globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)