Convergence and Sample Complexity of SGD in GANs
- URL: http://arxiv.org/abs/2012.00732v1
- Date: Tue, 1 Dec 2020 18:50:38 GMT
- Title: Convergence and Sample Complexity of SGD in GANs
- Authors: Vasilis Kontonis, Sihan Liu, Christos Tzamos
- Abstract summary: We provide convergence guarantees on training Generative Adversarial Networks (GANs) via SGD.
We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function.
Our results apply to a broad class of non-linear activation functions $\phi$, including ReLUs, and are enabled by a connection with truncated statistics.
- Score: 15.25030172685628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide theoretical convergence guarantees on training Generative
Adversarial Networks (GANs) via SGD. We consider learning a target distribution
modeled by a 1-layer Generator network with a non-linear activation function
$\phi(\cdot)$ parametrized by a $d \times d$ weight matrix $\mathbf W_*$, i.e.,
$f_*(\mathbf x) = \phi(\mathbf W_* \mathbf x)$.
Our main result is that training the Generator together with a
Discriminator according to the Stochastic Gradient Descent-Ascent iteration
proposed by Goodfellow et al. yields a Generator distribution that approaches
the target distribution of $f_*$. Specifically, we can learn the target
distribution within total-variation distance $\epsilon$ using $\tilde O(d^2/\epsilon^2)$
samples, which is (near-)information-theoretically optimal.
Our results apply to a broad class of non-linear activation functions $\phi$,
including ReLUs, and are enabled by a connection with truncated statistics and an
appropriate design of the Discriminator network. Our approach relies on a
bilevel optimization framework to show that vanilla SGDA works.
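To make the training dynamics concrete, the following is a minimal PyTorch sketch of stochastic gradient descent-ascent for a 1-layer ReLU generator $f_{\mathbf W}(\mathbf x) = \phi(\mathbf W \mathbf x)$ trained against a simple logistic discriminator. The linear discriminator, step sizes, and batch size are illustrative assumptions only; the paper's guarantees rely on an appropriately designed Discriminator and the bilevel analysis mentioned above, which this toy loop does not reproduce.

```python
# Minimal sketch (not the paper's exact construction): stochastic gradient
# descent-ascent for a 1-layer generator phi(W x) versus a toy logistic
# discriminator. Everything here is illustrative.
import torch

d = 8                                       # data / latent dimension
W_star = torch.randn(d, d)                  # unknown target weights (simulated)
phi = torch.relu                            # non-linear activation, e.g. ReLU

W = torch.randn(d, d, requires_grad=True)   # generator parameters
v = torch.zeros(d, requires_grad=True)      # toy linear discriminator weights

opt_G = torch.optim.SGD([W], lr=1e-3)
opt_D = torch.optim.SGD([v], lr=1e-3)

def D(y):
    return torch.sigmoid(y @ v)             # discriminator output in (0, 1)

for step in range(10_000):
    x = torch.randn(64, d)                  # fresh Gaussian inputs each round
    real = phi(x @ W_star.T)                # samples from the target f_*(x)
    fake = phi(x @ W.T)                     # samples from the current generator

    # Ascent step on the discriminator (maximize the GAN value function).
    loss_D = -(torch.log(D(real) + 1e-8).mean()
               + torch.log(1 - D(fake.detach()) + 1e-8).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Descent step on the generator.
    loss_G = torch.log(1 - D(phi(x @ W.T)) + 1e-8).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```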
Related papers
- Concentration Inequalities for $(f,Γ)$-GANs [5.022028859839544]
Generative adversarial networks (GANs) are unsupervised learning methods for training a generator distribution to produce samples that approximate those drawn from a target distribution.
Recent works have proven the statistical consistency of GANs based on integral probability metrics (IPMs), e.g., WGAN, which is based on the 1-Wasserstein metric.
A much larger class of GANs, which allow for the use of nonlinear objective functionals, can be constructed using $(f,\Gamma)$-divergences.
arXiv Detail & Related papers (2024-06-24T17:42:03Z) - Convergence Analysis of Probability Flow ODE for Score-based Generative Models [5.939858158928473]
We study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives.
We prove that the total variation between the target and the generated data distributions can be bounded above by $\mathcal{O}(d^{3/4}\delta^{1/2})$ at the continuous-time level (a minimal sketch of such a deterministic sampler appears after this list).
arXiv Detail & Related papers (2024-04-15T12:29:28Z) - Idempotent Generative Network [61.78905138698094]
We propose a new approach for generative modeling based on training a neural network to be idempotent.
An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application.
We find that by processing inputs from both target and source distributions, the model adeptly projects corrupted or modified data back to the target manifold (a toy sketch of the idempotence constraint appears after this list).
arXiv Detail & Related papers (2023-11-02T17:59:55Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - $(\alpha_D,\alpha_G)$-GANs: Addressing GAN Training Instabilities via Dual Objectives [7.493779672689531]
We introduce a class of dual-objective GANs with different value functions (objectives) for the generator (G) and discriminator (D).
We show that the resulting non-zero-sum game simplifies to minimizing an $f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$.
We highlight the value of tuning $(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring and the Stacked MNIST datasets.
arXiv Detail & Related papers (2023-02-28T05:22:54Z) - Stochastic Approximation Approaches to Group Distributionally Robust Optimization [96.26317627118912]
Group distributionally robust optimization (GDRO)
Online learning techniques reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
A novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{OPT} + \epsilon$ with high probability (a minimal gradient-descent sketch for this setup appears after this list).
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets? [33.51250867983687]
We exhibit a natural task on which a provable sample complexity gap can be shown for standard training algorithms.
We demonstrate a single target function, learning which on all possible distributions leads to an $O(1)$ vs $\Omega(d^2/\varepsilon)$ gap.
Similar results are achieved for $\ell$ regression and adaptive training algorithms, e.g. Adam and AdaGrad.
arXiv Detail & Related papers (2020-10-16T17:15:39Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
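For the probability flow ODE entry above: such deterministic samplers integrate $d\mathbf x/dt = f(\mathbf x, t) - \frac{1}{2} g(t)^2 \nabla_{\mathbf x} \log p_t(\mathbf x)$ backwards from the prior to the data distribution. Below is a minimal Euler-discretized sketch; the `score_model` callable (an estimate of $\nabla_{\mathbf x} \log p_t$) and the VP-SDE drift/noise schedule are assumptions for illustration, not the specific sampler analyzed in the paper.

```python
# Minimal sketch of a deterministic probability flow ODE sampler using plain
# Euler steps. `score_model(x, t)` and the VP-SDE choices are assumptions.
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    return beta_min + t * (beta_max - beta_min)   # linear noise schedule

def sample(score_model, d, n_steps=1000, T=1.0, eps=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)                    # start from the N(0, I) prior
    ts = np.linspace(T, eps, n_steps)
    dt = ts[1] - ts[0]                            # negative: integrate backwards
    for t in ts:
        drift = -0.5 * beta(t) * x                # f(x, t) for the VP-SDE
        rhs = drift - 0.5 * beta(t) * score_model(x, t)
        x = x + rhs * dt                          # Euler update
    return x
```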
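For the Idempotent Generative Network entry above: the defining property is $f(f(\mathbf z)) = f(\mathbf z)$. The toy PyTorch loop below penalizes only violations of that property on a small MLP; it is not the paper's full objective, which also uses reconstruction and tightness terms to rule out degenerate (e.g., constant) maps.

```python
# Toy sketch of the idempotence constraint f(f(z)) ~= f(z) on a small MLP.
# NOT the paper's full training objective; it only illustrates the property.
import torch

f = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, 16))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(1_000):
    z = torch.randn(128, 16)          # inputs drawn from a source distribution
    fz = f(z)
    # Push the second application toward the (detached) first one, so that
    # re-applying f changes nothing beyond the initial application.
    loss = ((f(fz) - fz.detach()) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```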
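For the single-neuron entry above: a minimal NumPy sketch of mini-batch gradient descent on the empirical square loss for a ReLU neuron $\mathbf x \mapsto \sigma(\mathbf w \cdot \mathbf x)$. The Gaussian data model, step size, and batch size are illustrative; the paper's guarantees concern adversarial label noise and specific parameter choices that this toy loop does not reproduce.

```python
# Minimal sketch: mini-batch gradient descent on the square loss for a single
# ReLU neuron x -> sigma(w . x). Data model and hyperparameters are toy choices.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 5000
sigma = lambda z: np.maximum(z, 0.0)                  # ReLU activation
sigma_prime = lambda z: (z > 0).astype(float)

w_star = rng.standard_normal(d) / np.sqrt(d)          # planted neuron
X = rng.standard_normal((n, d))
y = sigma(X @ w_star) + 0.1 * rng.standard_normal(n)  # noisy labels

w = rng.standard_normal(d) / np.sqrt(d)               # random init (w = 0 would be a dead ReLU start)
lr = 0.1
for step in range(2_000):
    idx = rng.integers(0, n, size=64)                 # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    pred = sigma(Xb @ w)
    # Gradient of the (halved) batch squared loss with respect to w.
    grad = Xb.T @ ((pred - yb) * sigma_prime(Xb @ w)) / len(idx)
    w -= lr * grad
```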