Asymptotic Statistical Analysis of $f$-divergence GAN
- URL: http://arxiv.org/abs/2209.06853v1
- Date: Wed, 14 Sep 2022 18:08:37 GMT
- Title: Asymptotic Statistical Analysis of $f$-divergence GAN
- Authors: Xinwei Shen, Kani Chen, and Tong Zhang
- Abstract summary: Generative Adversarial Networks (GANs) have achieved great success in data generation.
We consider the statistical behavior of the general $f$-divergence formulation of GAN.
The resulting estimation method is referred to as Adversarial Gradient Estimation (AGE).
- Score: 13.587087960403199
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks (GANs) have achieved great success in data
generation. However, their statistical properties are not fully understood. In
this paper, we consider the statistical behavior of the general $f$-divergence
formulation of GAN, which includes the Kullback--Leibler divergence that is
closely related to the maximum likelihood principle. We show that for
parametric generative models that are correctly specified, all $f$-divergence
GANs with the same discriminator classes are asymptotically equivalent under
suitable regularity conditions. Moreover, with an appropriately chosen local
discriminator, they become equivalent to the maximum likelihood estimate
asymptotically. For generative models that are misspecified, GANs with
different $f$-divergences converge to different estimators, and thus cannot
be directly compared. However, it is shown that for some commonly used
$f$-divergences, the original $f$-GAN is not optimal in that one can achieve a
smaller asymptotic variance when the discriminator training in the original
$f$-GAN formulation is replaced by logistic regression. The resulting
estimation method is referred to as Adversarial Gradient Estimation (AGE).
Empirical studies are provided to support the theory and to demonstrate the
advantage of AGE over the original $f$-GANs under model misspecification.
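The following is a minimal sketch of the AGE idea described in the abstract: the discriminator is fit by ordinary logistic regression between real and generated samples, its logit is converted into a density-ratio estimate, and that estimate drives an $f$-divergence gradient step for the generator. It is written for the KL divergence ($f(t)=t\log t$); the architectures, step sizes, and names below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the paper's code: AGE-style training in which the
# discriminator is a plain logistic-regression classifier and the generator
# follows an estimated f-divergence gradient (here, KL(p_data || p_model)).
import torch
import torch.nn as nn

def kl_generator_weight(ratio):
    # For f(t) = t*log(t), the generator-side integrand f(r) - r*f'(r) = -r,
    # so (with the ratio estimate held fixed) the pathwise gradient of
    # E_fake[-ratio] matches the gradient of KL(p_data || p_model).
    return -ratio

gen = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # toy generator
disc = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # logistic head
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    z = torch.randn(real.size(0), 8)

    # 1) Discriminator update: ordinary logistic regression (real = 1, fake = 0).
    fake = gen(z).detach()
    logits = torch.cat([disc(real), disc(fake)])
    labels = torch.cat([torch.ones(real.size(0), 1), torch.zeros(fake.size(0), 1)])
    d_loss = bce(logits, labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator update: with equal real/fake batch sizes, the optimal logistic
    #    classifier satisfies sigmoid(disc(x)) = p_data(x) / (p_data(x) + p_model(x)),
    #    so exp(disc(x)) estimates the density ratio p_data(x) / p_model(x).
    fake = gen(z)
    ratio = torch.exp(disc(fake))
    g_loss = kl_generator_weight(ratio).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

In the original $f$-GAN formulation the discriminator instead maximizes the variational objective $\mathbb{E}_P[T(X)] - \mathbb{E}_Q[f^*(T(X))]$; the abstract's claim is that, for some commonly used $f$-divergences, swapping that step for the logistic-regression fit above yields a smaller asymptotic variance under model misspecification.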
Related papers
- Concentration Inequalities for $(f,\Gamma)$-GANs [5.022028859839544]
Generative adversarial networks (GANs) are unsupervised learning methods for training a generator distribution to produce samples that approximate those drawn from a target distribution.
Recent works have proven the statistical consistency of GANs based on integral probability metrics (IPMs), e.g., WGAN which is based on the 1-Wasserstein metric.
A much larger class of GANs, which allow for the use of nonlinear objective functionals, can be constructed using $(f,\Gamma)$-divergences.
arXiv Detail & Related papers (2024-06-24T17:42:03Z) - The Curse of Memory in Stochastic Approximation: Extended Version [1.534667887016089]
Theory and application of stochastic approximation (SA) has grown within the control systems community since the earliest days of adaptive control.
Recent results establish remarkable performance of SA with (sufficiently small) constant step-size $\alpha>0$.
arXiv Detail & Related papers (2023-09-06T12:22:32Z) - Distribution learning via neural differential equations: a nonparametric
statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations.
We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$.
We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Large Deviations for Classification Performance Analysis of Machine
Learning Systems [16.74271332025289]
We show that under appropriate conditions the error probabilities vanish exponentially, as $\sim \exp\left(-n\,I + o(n)\right)$, where $I$ is the error rate and $n$ is the number of observations available for testing.
The theoretical findings are finally tested using the popular MNIST dataset.
arXiv Detail & Related papers (2023-01-16T10:48:12Z) - On the Identifiability and Estimation of Causal Location-Scale Noise
Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling
by Exploring Energy of the Discriminator [85.68825725223873]
Generative Adversarial Networks (GANs) have shown great promise in modeling high dimensional data.
We introduce the Discriminator Contrastive Divergence, which is well motivated by the property of WGAN's discriminator.
We demonstrate the benefits of significantly improved generation on both synthetic data and several real-world image generation benchmarks.
arXiv Detail & Related papers (2020-04-05T01:50:16Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)