Local Convergence of Gradient Descent-Ascent for Training Generative
Adversarial Networks
- URL: http://arxiv.org/abs/2305.08277v2
- Date: Mon, 29 May 2023 16:40:43 GMT
- Title: Local Convergence of Gradient Descent-Ascent for Training Generative
Adversarial Networks
- Authors: Evan Becker, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher
- Abstract summary: We study the local dynamics of gradient descent-ascent (GDA) for training a GAN with a kernel-based discriminator.
We show phase transitions that indicate when the system converges, oscillates, or diverges.
- Score: 20.362912591032636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Adversarial Networks (GANs) are a popular formulation to train
generative models for complex high dimensional data. The standard method for
training GANs involves a gradient descent-ascent (GDA) procedure on a minimax
optimization problem. This procedure is hard to analyze in general due to the
nonlinear nature of the dynamics. We study the local dynamics of GDA for
training a GAN with a kernel-based discriminator. This convergence analysis is
based on a linearization of a non-linear dynamical system that describes the
GDA iterations, under an \textit{isolated points model} assumption from [Becker
et al. 2022]. Our analysis brings out the effect of the learning rates,
regularization, and the bandwidth of the kernel discriminator, on the local
convergence rate of GDA. Importantly, we show phase transitions that indicate
when the system converges, oscillates, or diverges. We also provide numerical
simulations that verify our claims.
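For intuition about the kind of linearized analysis described above, the snippet below is a minimal, self-contained sketch (not the paper's construction): simultaneous GDA on a toy one-dimensional GAN with one real point, one generated point, and a discriminator expanded over two fixed Gaussian-kernel centers. The toy objective, the fixed centers, and every numeric value (bandwidth sigma, regularization lam, step sizes eta_g and eta_d) are illustrative assumptions; the only purpose is to show how the spectral radius of the Jacobian of the GDA update at a fixed point separates local convergence, oscillation, and divergence as those quantities vary.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact setup):
# simultaneous GDA on a toy GAN with a single generated point x, a single real
# point y, and a discriminator f_a(z) = sum_i a_i * K(z, c_i) built on fixed
# Gaussian-kernel centers c_i.
import numpy as np

y       = 1.0                    # real data point (single isolated point)
centers = np.array([0.5, 1.5])   # fixed kernel centers for the discriminator
sigma   = 0.5                    # kernel bandwidth (assumed value)
lam     = 0.1                    # discriminator ridge regularization (assumed)
eta_g   = 0.02                   # generator (descent) step size
eta_d   = 0.05                   # discriminator (ascent) step size

def K(u, v):
    """Gaussian kernel with bandwidth sigma."""
    return np.exp(-(u - v) ** 2 / (2 * sigma ** 2))

def dK_du(u, v):
    """Derivative of K(u, v) with respect to its first argument."""
    return -(u - v) / sigma ** 2 * K(u, v)

def gda_step(state):
    """One simultaneous GDA update on (x, a) for the toy objective
    L(x, a) = f_a(y) - f_a(x) - (lam/2)*||a||^2: the generator descends in x,
    the discriminator ascends in a."""
    x, a = state[0], state[1:]
    grad_x = -np.dot(a, dK_du(x, centers))             # dL/dx
    grad_a = K(y, centers) - K(x, centers) - lam * a   # dL/da
    return np.concatenate([[x - eta_g * grad_x], a + eta_d * grad_a])

# Linearize the GDA map around its fixed point (x = y, a = 0) and inspect the
# spectral radius of the Jacobian: rho < 1 means local convergence, rho = 1
# sustained oscillation, rho > 1 local divergence -- a toy version of the
# phase transition in eta_g, eta_d, lam, and sigma that the abstract describes.
fixed_point = np.array([y, 0.0, 0.0])
eps = 1e-5
J = np.column_stack([
    (gda_step(fixed_point + eps * e) - gda_step(fixed_point - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
rho = np.abs(np.linalg.eigvals(J)).max()
print(f"spectral radius = {rho:.5f}",
      "(locally convergent)" if rho < 1 else "(oscillating or diverging)")
```

In this toy, the printed spectral radius comes out just below 1, so the linearized iterates spiral in slowly; setting lam = 0 or raising eta_g to 0.05 pushes it above 1, which is the oscillation/divergence side of the phase transition.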
Related papers
- The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks [2.1178416840822027]
We consider the setting of classification with homogeneous neural networks.
We show that normalized SGD iterates converge to the set of critical points of the normalized margin at late-stage training.
arXiv Detail & Related papers (2025-02-08T19:09:16Z) - Distributed Gradient Descent with Many Local Steps in Overparameterized Models [20.560882414631784]
In distributed training of machine learning models, gradient descent with local iterative steps is a popular method that has been shown empirically to work well.
We explain this good performance from the viewpoint of the implicit bias of Local Gradient Descent (Local-GD) with a large number of local steps.
arXiv Detail & Related papers (2024-12-10T23:19:40Z) - Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z) - What's the score? Automated Denoising Score Matching for Nonlinear Diffusions [25.062104976775448]
Reversing a diffusion process by learning its score forms the heart of diffusion-based generative modeling.
We introduce a family of tractable denoising score matching objectives, called local-DSM.
We show how local-DSM melded with Taylor expansions enables automated training and score estimation with nonlinear diffusion processes.
arXiv Detail & Related papers (2024-07-10T19:02:19Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching (a generic SGLD sketch appears after this list).
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
However, PINNs can be trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Linearization and Identification of Multiple-Attractors Dynamical System
through Laplacian Eigenmaps [8.161497377142584]
We propose a Graph-based spectral clustering method that takes advantage of a velocity-augmented kernel to connect data-points belonging to the same dynamics.
We prove that there always exists a set of 2-dimensional embedding spaces in which the sub-dynamics are linear, and an n-dimensional embedding space in which they are quasi-linear.
We learn a diffeomorphism from the Laplacian embedding space to the original space and show that the Laplacian embedding leads to good reconstruction accuracy and a faster training time.
arXiv Detail & Related papers (2022-02-18T12:43:25Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) involves solving non-convex concave mini-max optimization problems.
Prior theory has shown the importance of model overparameterization in the convergence of gradient descent (GD) to globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
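As a companion to the SGLD entry above, the sketch below shows generic stochastic gradient Langevin dynamics with without-replacement minibatching on a toy least-squares problem. The loss, data, and all hyperparameters are illustrative assumptions; the specific variant proposed in that paper may differ in detail.

```python
# Generic sketch of stochastic gradient Langevin dynamics (SGLD) with
# without-replacement minibatching: shuffle once per epoch, then sweep
# disjoint minibatches. Toy least-squares problem; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

eta, beta, batch = 0.05, 1e3, 32   # step size, inverse temperature, minibatch size
w = np.zeros(d)

for epoch in range(100):
    perm = rng.permutation(n)                  # without replacement: one shuffle per epoch
    for start in range(0, n, batch):
        idx = perm[start:start + batch]        # disjoint minibatches within the epoch
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        noise = np.sqrt(2 * eta / beta) * rng.normal(size=d)
        w = w - eta * grad + noise             # Langevin update: gradient step plus Gaussian noise

print("parameter error:", np.linalg.norm(w - w_true))
```

The only difference from standard with-replacement SGLD is that each epoch sweeps a single permutation of the data, so every example is visited exactly once per epoch.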