$\alpha$-GAN: Convergence and Estimation Guarantees
- URL: http://arxiv.org/abs/2205.06393v1
- Date: Thu, 12 May 2022 23:26:51 GMT
- Title: $\alpha$-GAN: Convergence and Estimation Guarantees
- Authors: Gowtham R. Kurri, Monica Welfert, Tyler Sypherd, Lalitha Sankar
- Abstract summary: We prove a correspondence between the min-max optimization of general CPE loss function GANs and the minimization of associated $f$-divergences.
We then focus on $\alpha$-GAN, defined via the $\alpha$-loss, which interpolates several GANs and corresponds to the minimization of the Arimoto divergence.
- Score: 7.493779672689531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove a two-way correspondence between the min-max optimization of general
CPE loss function GANs and the minimization of associated $f$-divergences. We
then focus on $\alpha$-GAN, defined via the $\alpha$-loss, which interpolates
several GANs (Hellinger, vanilla, Total Variation) and corresponds to the
minimization of the Arimoto divergence. We show that the Arimoto divergences
induced by $\alpha$-GAN equivalently converge, for all $\alpha\in
\mathbb{R}_{>0}\cup\{\infty\}$. However, under restricted learning models and
finite samples, we provide estimation bounds which indicate diverse GAN
behavior as a function of $\alpha$. Finally, we present empirical results on a
toy dataset that highlight the practical utility of tuning the $\alpha$
hyperparameter.
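For concreteness, here is a brief sketch of the tunable loss behind $\alpha$-GAN, written in notation assumed here rather than quoted from the paper ($\hat{P}(y)$ for the probability a model assigns to the true label $y$, $D_\omega$ for the discriminator, $G_\theta$ for the generator, $P_r$ for the data distribution); the normalization follows the authors' earlier work "Realizing GANs via a Tunable Loss Function" listed below. For $\alpha\in(0,\infty]$, the $\alpha$-loss is
$$\ell_\alpha\big(y,\hat{P}\big)=\frac{\alpha}{\alpha-1}\Big(1-\hat{P}(y)^{\frac{\alpha-1}{\alpha}}\Big),\qquad \lim_{\alpha\to 1}\ell_\alpha\big(y,\hat{P}\big)=-\log\hat{P}(y),\qquad \ell_\infty\big(y,\hat{P}\big)=1-\hat{P}(y),$$
so that $\alpha\to 1$ recovers the log-loss (and hence the vanilla GAN) while $\alpha=\infty$ gives a soft 0-1 loss. The induced CPE-loss value function, which the discriminator maximizes and the generator minimizes, takes the form
$$V_\alpha(\theta,\omega)=\frac{\alpha}{\alpha-1}\Big(\mathbb{E}_{X\sim P_r}\big[D_\omega(X)^{\frac{\alpha-1}{\alpha}}\big]+\mathbb{E}_{X\sim P_{G_\theta}}\big[\big(1-D_\omega(X)\big)^{\frac{\alpha-1}{\alpha}}\big]-2\Big),$$
and, per the two-way correspondence above, minimizing over the generator at the optimal discriminator amounts to minimizing the Arimoto divergence between $P_r$ and $P_{G_\theta}$. A short numerical sketch of the same loss appears after the related-papers list.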
Related papers
- Statistical Error Bounds for GANs with Nonlinear Objective Functionals [5.022028859839544]
We derive statistical error bounds for $(f,\Gamma)$-GANs for general classes of $f$ and $\Gamma$ in the form of finite-sample concentration inequalities.
The results prove the statistical consistency of $(f,\Gamma)$-GANs and reduce to the known results for IPM-GANs in the appropriate limit.
arXiv Detail & Related papers (2024-06-24T17:42:03Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle)$.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- Addressing GAN Training Instabilities via Tunable Classification Losses [8.151943266391493]
Generative adversarial networks (GANs) allow generating synthetic data with formal guarantees.
We show that all symmetric $f$-divergences are equivalent in convergence.
We also highlight the value of tuning $(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring.
arXiv Detail & Related papers (2023-10-27T17:29:07Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- $(\alpha_D,\alpha_G)$-GANs: Addressing GAN Training Instabilities via Dual Objectives [7.493779672689531]
We introduce a class of dual-objective GANs with different value functions (objectives) for the generator (G) and discriminator (D).
We show that the resulting non-zero-sum game simplifies to minimizing an $f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$.
We highlight the value of tuning $(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring and the Stacked MNIST datasets.
arXiv Detail & Related papers (2023-02-28T05:22:54Z)
- A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound of $\Omega(\sqrt{n/p})$ for the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
- Realizing GANs via a Tunable Loss Function [7.455546102930911]
We introduce a tunable GAN, called $\alpha$-GAN, parameterized by $\alpha \in (0,\infty]$.
We show that $\alpha$-GAN is intimately related to the Arimoto divergence (a numerical sketch of the underlying $\alpha$-loss follows this list).
arXiv Detail & Related papers (2021-06-09T17:18:21Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Least $k$th-Order and R\'{e}nyi Generative Adversarial Networks [12.13405065406781]
Experimental results indicate that the proposed loss functions, applied to the MNIST and CelebA datasets, confer performance benefits by virtue of the extra degrees of freedom provided by the parameters $k$ and $\alpha$, respectively.
While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, e.g., the issues of fairness or privacy in artificial intelligence.
arXiv Detail & Related papers (2020-06-03T18:44:05Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
- A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence covering both the Adam and Adagrad algorithms.
Adam achieves the same $O(d\ln(N)/\sqrt{N})$ rate of convergence as Adagrad for an appropriate choice of its parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
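As a minimal runnable illustration of the tunable $\alpha$-loss shared by this paper and the "Realizing GANs via a Tunable Loss Function" entry above, the sketch below evaluates the loss for a few values of $\alpha$ and checks the $\alpha\to 1$ (log-loss) and $\alpha=\infty$ limits numerically; the function name and sample probabilities are illustrative, not taken from either paper.

```python
import numpy as np

def alpha_loss(p_true, alpha):
    """Tunable alpha-loss of the probability p_true assigned to the true label.

    alpha in (0, inf]: alpha -> 1 recovers the log-loss and alpha = inf gives
    the soft 0-1 loss 1 - p_true. Sketch only; normalization as described
    after the abstract above.
    """
    p = np.asarray(p_true, dtype=float)
    if np.isinf(alpha):
        return 1.0 - p
    if np.isclose(alpha, 1.0):
        return -np.log(p)  # limiting case alpha -> 1
    return (alpha / (alpha - 1.0)) * (1.0 - p ** ((alpha - 1.0) / alpha))

# Quick check: near alpha = 1 the loss matches -log(p), at alpha = inf it is 1 - p.
p = np.linspace(0.05, 0.95, 5)
print(alpha_loss(p, 1.0001))  # approximately -log(p)
print(-np.log(p))
print(alpha_loss(p, 0.5))     # a more sensitive member of the family (alpha < 1)
print(alpha_loss(p, np.inf))  # 1 - p
```

Interpolating between these regimes by tuning $\alpha$ is the single-knob behavior that the estimation bounds and the toy-dataset experiments in the main paper investigate.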
This list is automatically generated from the titles and abstracts of the papers on this site.