Statistical Error Bounds for GANs with Nonlinear Objective Functionals
- URL: http://arxiv.org/abs/2406.16834v2
- Date: Tue, 07 Jan 2025 17:05:11 GMT
- Title: Statistical Error Bounds for GANs with Nonlinear Objective Functionals
- Authors: Jeremiah Birrell
- Abstract summary: We derive statistical error bounds for $(f,\Gamma)$-GANs for general classes of $f$ and $\Gamma$ in the form of finite-sample concentration inequalities.
These results prove the statistical consistency of $(f,\Gamma)$-GANs and reduce to the known results for IPM-GANs in the appropriate limit.
- Score: 5.022028859839544
- Abstract: Generative adversarial networks (GANs) are unsupervised learning methods for training a generator distribution to produce samples that approximate those drawn from a target distribution. Many such methods can be formulated as minimization of a metric or divergence between probability distributions. Recent works have derived statistical error bounds for GANs that are based on integral probability metrics (IPMs), e.g., WGAN which is based on the 1-Wasserstein metric. In general, IPMs are defined by optimizing a linear functional (difference of expectations) over a space of discriminators. A much larger class of GANs, which we here call $(f,\Gamma)$-GANs, can be constructed using $f$-divergences (e.g., Jensen-Shannon, KL, or $\alpha$-divergences) together with a regularizing discriminator space $\Gamma$ (e.g., $1$-Lipschitz functions). These GANs have nonlinear objective functions, depending on the choice of $f$, and have been shown to exhibit improved performance in a number of applications. In this work we derive statistical error bounds for $(f,\Gamma)$-GANs for general classes of $f$ and $\Gamma$ in the form of finite-sample concentration inequalities. These results prove the statistical consistency of $(f,\Gamma)$-GANs and reduce to the known results for IPM-GANs in the appropriate limit. Finally, our results also give new insight into the performance of GANs for distributions with unbounded support.
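For orientation, training such a GAN amounts to a nonlinear min-max problem over the generator and the discriminator class $\Gamma$. The following is a minimal sketch of the variational form used in the $(f,\Gamma)$-divergence literature; the notation (which distribution sits in which slot, the shift variable $\nu$) is a paraphrase rather than a quotation from the paper:

$$ \inf_{\theta}\; D^{\Gamma}_{f}(P \,\|\, Q_{\theta}), \qquad D^{\Gamma}_{f}(P \,\|\, Q) \;=\; \sup_{\gamma \in \Gamma} \Big\{ \mathbb{E}_{P}[\gamma] \;-\; \inf_{\nu \in \mathbb{R}} \big\{ \nu + \mathbb{E}_{Q}\!\left[f^{*}(\gamma - \nu)\right] \big\} \Big\}, $$

where $P$ is the target distribution, $Q_{\theta}$ the generator distribution, $\gamma$ the discriminator, and $f^{*}$ the convex conjugate of $f$. The objective is nonlinear in the expectations; in the limiting case where $f^{*}$ acts as the identity, the inner infimum collapses and one recovers the linear difference-of-expectations form of an IPM, which is the sense in which the error bounds here extend the known IPM-GAN results.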
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, within both the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Idempotent Generative Network [61.78905138698094]
We propose a new approach for generative modeling based on training a neural network to be idempotent.
An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application.
We find that by processing inputs from both target and source distributions, the model adeptly projects corrupted or modified data back to the target manifold.
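A minimal sketch of the idempotence idea in code (a toy setup; the paper's full objective also involves additional terms, e.g., reconstructing target samples, which are omitted here):

```python
import torch
import torch.nn as nn

# Hypothetical model f mapping inputs toward the target manifold.
f = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))

def idempotence_loss(f, z):
    """Encourage f(f(z)) == f(z): one application should already land on the
    target manifold, so a second application changes (almost) nothing."""
    fz = f(z)
    ffz = f(fz)
    return ((ffz - fz) ** 2).mean()

z = torch.randn(32, 64)          # inputs drawn from a source (noise) distribution
loss = idempotence_loss(f, z)    # minimized alongside the other training terms
loss.backward()
```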
arXiv Detail & Related papers (2023-11-02T17:59:55Z) - Addressing GAN Training Instabilities via Tunable Classification Losses [8.151943266391493]
Generative adversarial networks (GANs) allow generating synthetic data with formal guarantees.
We show that all symmetric $f$-divergences are equivalent in convergence.
We also highlight the value of tuning $(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring.
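For reference, the tunable classification loss used in this line of work is the $\alpha$-loss. A sketch of its standard form (notation here is an assumption, not quoted from the abstract):

$$ \ell_{\alpha}(p) \;=\; \frac{\alpha}{\alpha - 1}\Big(1 - p^{\frac{\alpha - 1}{\alpha}}\Big), $$

where $p$ is the probability the classifier assigns to the true label; as $\alpha \to 1$ this recovers cross-entropy ($-\log p$), and as $\alpha \to \infty$ it tends to $1 - p$.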
arXiv Detail & Related papers (2023-10-27T17:29:07Z) - $(\alpha_D,\alpha_G)$-GANs: Addressing GAN Training Instabilities via Dual Objectives [7.493779672689531]
We introduce a class of dual-objective GANs with different value functions (objectives) for the generator (G) and discriminator (D).
We show that the resulting non-zero sum game reduces to minimizing an $f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$.
We highlight the value of tuning $(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring and the Stacked MNIST datasets.
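A minimal sketch of what a dual-objective update step could look like, using the $\alpha$-loss above as the tunable classification loss for each player; the network architectures, optimizers, and the specific values of $(\alpha_D, \alpha_G)$ below are illustrative placeholders, not the paper's settings:

```python
import torch
import torch.nn as nn

def alpha_loss(p, alpha):
    """alpha-loss of the probability p assigned to the true label.
    alpha -> 1 recovers cross-entropy; alpha -> infinity gives 1 - p."""
    if alpha == 1.0:
        return -torch.log(p)
    return (alpha / (alpha - 1.0)) * (1.0 - p ** ((alpha - 1.0) / alpha))

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))               # generator: noise -> 2D point
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())  # discriminator: point -> prob(real)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
alpha_D, alpha_G = 1.5, 1.2  # tunable; different objectives for D and G

def train_step(real):
    z = torch.randn(real.shape[0], 16)
    fake = G(z)

    # Discriminator minimizes its alpha_D-loss: real samples labeled 1, fake labeled 0.
    d_loss = alpha_loss(D(real), alpha_D).mean() + alpha_loss(1 - D(fake.detach()), alpha_D).mean()
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator minimizes its own alpha_G-loss for fake samples being classified as real.
    g_loss = alpha_loss(D(fake), alpha_G).mean()
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()

train_step(torch.randn(32, 2))  # a batch from, e.g., a 2D Gaussian mixture ring would go here
```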
arXiv Detail & Related papers (2023-02-28T05:22:54Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - On counterfactual inference with unobserved confounding [36.18241676876348]
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit.
We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors.
We derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality.
arXiv Detail & Related papers (2022-11-14T04:14:37Z) - On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs).
We show the causal direction is identifiable up to some pathological cases.
We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
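To make the model class concrete: an LSNM posits $Y = f(X) + g(X)\,N$ with noise $N$ independent of $X$. Below is a minimal sketch of the neural-network style of estimator, fit by Gaussian negative log-likelihood; the architecture, data, and training loop are placeholders rather than the paper's setup:

```python
import torch
import torch.nn as nn

class LSNM(nn.Module):
    """Heteroscedastic model Y = f(X) + g(X) * N, with neural mean and scale."""
    def __init__(self, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))  # mean f(x)
        self.g = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))  # log of scale g(x)

    def nll(self, x, y):
        mean, log_scale = self.f(x), self.g(x)
        # Gaussian negative log-likelihood with input-dependent scale exp(log_scale).
        return (log_scale + 0.5 * ((y - mean) / log_scale.exp()) ** 2).mean()

x = torch.rand(256, 1) * 4 - 2
y = torch.sin(3 * x) + 0.1 * (1 + x.abs()) * torch.randn_like(x)  # synthetic heteroscedastic data

model = LSNM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    model.nll(x, y).backward()
    opt.step()
```

Fitting such a model in both directions ($X \to Y$ and $Y \to X$) and comparing the resulting likelihoods is one standard way this class of estimator is used to decide the causal direction.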
arXiv Detail & Related papers (2022-10-13T17:18:59Z) - Asymptotic Statistical Analysis of $f$-divergence GAN [13.587087960403199]
Generative Adversarial Networks (GANs) have achieved great success in data generation.
We consider the statistical behavior of the general $f$-divergence formulation of GAN.
The resulting estimation method is referred to as Adversarial Gradient Estimation (AGE).
arXiv Detail & Related papers (2022-09-14T18:08:37Z) - Non-Gaussian Component Analysis via Lattice Basis Reduction [56.98280399449707]
Non-Gaussian Component Analysis (NGCA) is a distribution learning problem.
We provide an efficient algorithm for NGCA in the regime where $A$ is discrete or nearly discrete.
arXiv Detail & Related papers (2021-12-16T18:38:02Z) - Convergence and Sample Complexity of SGD in GANs [15.25030172685628]
We provide convergence guarantees on training Generative Adversarial Networks (GANs) via SGD.
We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function.
Our results apply to a broad class of non-linear activation functions $\phi$, including ReLUs, and are enabled by a connection with truncated statistics.
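For intuition, the setting is roughly the following toy form (a sketch under stated assumptions; the moment-matching loss is only a stand-in for the adversarial objective analyzed in the paper):

```python
import torch

# One-layer generator x = phi(W z + b) with a non-linear activation phi (here, ReLU).
d_z, d_x = 8, 8
W = torch.randn(d_x, d_z, requires_grad=True)
b = torch.zeros(d_x, requires_grad=True)
W_true, b_true = torch.randn(d_x, d_z), torch.randn(d_x)   # ground-truth generator defining the target
phi = torch.relu

opt = torch.optim.SGD([W, b], lr=1e-2)
for _ in range(500):
    fake = phi(torch.randn(256, d_z) @ W.T + b)
    real = phi(torch.randn(256, d_z) @ W_true.T + b_true)
    # Stand-in objective: match the first and second moments of real vs. generated samples.
    loss = (real.mean(0) - fake.mean(0)).pow(2).sum() + \
           (torch.cov(real.T) - torch.cov(fake.T)).pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```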
arXiv Detail & Related papers (2020-12-01T18:50:38Z) - Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling by Exploring Energy of the Discriminator [85.68825725223873]
Generative Adversarial Networks (GANs) have shown great promise in modeling high dimensional data.
We introduce the Discriminator Contrastive Divergence, which is well motivated by the property of WGAN's discriminator.
We demonstrate the benefits of significantly improved generation on both synthetic data and several real-world image generation benchmarks.
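A rough illustration of the underlying idea of treating the (approximately optimal) discriminator's output as an energy and using it to refine generator samples; this is a sketch under stated assumptions, and the paper's semi-amortized sampling procedure differs in its details:

```python
import torch

def discriminator_langevin_refine(x, D, steps=20, step_size=0.01, noise_scale=0.01):
    """Refine samples x with Langevin-style updates that ascend the discriminator
    score D(x), i.e., treat -D(x) as an unnormalized energy."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        energy = -D(x).sum()
        grad, = torch.autograd.grad(energy, x)
        x = (x - 0.5 * step_size * grad + noise_scale * torch.randn_like(x)).detach().requires_grad_(True)
    return x.detach()

# Usage with some trained generator G and discriminator D (not defined here):
#   z = torch.randn(64, latent_dim)
#   refined = discriminator_langevin_refine(G(z), D)
```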
arXiv Detail & Related papers (2020-04-05T01:50:16Z)