Non-saturating GAN training as divergence minimization
- URL: http://arxiv.org/abs/2010.08029v1
- Date: Thu, 15 Oct 2020 21:34:56 GMT
- Title: Non-saturating GAN training as divergence minimization
- Authors: Matt Shannon, Ben Poole, Soroosh Mariooryad, Tom Bagby, Eric
Battenberg, David Kao, Daisy Stanton, RJ Skerry-Ryan
- Abstract summary: We show that non-saturating generative adversarial network (GAN) training does in fact approximately minimize a particular f-divergence.
These results help to explain the high sample quality but poor diversity often observed empirically when using this scheme.
- Score: 22.459183156517728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-saturating generative adversarial network (GAN) training is widely used
and has continued to obtain groundbreaking results. However, so far this
approach has lacked strong theoretical justification, in contrast to
alternatives such as f-GANs and Wasserstein GANs which are motivated in terms
of approximate divergence minimization. In this paper we show that
non-saturating GAN training does in fact approximately minimize a particular
f-divergence. We develop general theoretical tools to compare and classify
f-divergences and use these to show that the new f-divergence is qualitatively
similar to reverse KL. These results help to explain the high sample quality
but poor diversity often observed empirically when using this scheme.
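The distinction the abstract draws can be made concrete. In the original minimax formulation, the generator minimizes log(1 - D(G(z))), which saturates (has near-zero gradient) when the discriminator confidently rejects a sample; the non-saturating variant instead minimizes -log(D(G(z))), which keeps a strong gradient in exactly that regime. The sketch below is an illustrative comparison of the two losses, not code from the paper:

```python
import math

def saturating_gen_loss(d_fake):
    # Original minimax generator loss: minimize log(1 - D(G(z))).
    # Nearly flat when D confidently rejects the sample (d_fake near 0).
    return math.log(1.0 - d_fake)

def non_saturating_gen_loss(d_fake):
    # Non-saturating alternative: minimize -log(D(G(z))).
    # Steep when D confidently rejects the sample, so training keeps moving.
    return -math.log(d_fake)

# Compare finite-difference slopes at a confidently-rejected sample, D(G(z)) = 0.01.
d = 0.01
eps = 1e-6
sat_grad = (saturating_gen_loss(d + eps) - saturating_gen_loss(d)) / eps
ns_grad = (non_saturating_gen_loss(d + eps) - non_saturating_gen_loss(d)) / eps

print(abs(sat_grad) < abs(ns_grad))  # prints True
```

Here the saturating loss has slope roughly -1/(1 - d) ≈ -1, while the non-saturating loss has slope roughly -1/d ≈ -100, which is why the heuristic was adopted in practice; the paper's contribution is showing that this heuristic still approximately minimizes a specific (reverse-KL-like) f-divergence.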
Related papers
- Theory-Informed Improvements to Classifier-Free Guidance for Discrete Diffusion Models [24.186262549509102]
This paper theoretically analyzes CFG in the context of masked discrete diffusion. High guidance early in sampling (when inputs are heavily masked) harms generation quality, while late-stage guidance has a larger effect. Our method smoothens the transport between the data distribution and the initial (masked/uniform) distribution, which results in improved sample quality.
arXiv Detail & Related papers (2025-07-11T18:48:29Z)
- Gradient Extrapolation for Debiased Representation Learning [7.183424522250937]
Gradient Extrapolation for Debiased Representation Learning (GERNE) is designed to learn debiased representations in both known and unknown attribute training cases.
GERNE can serve as a general framework for debiasing with methods, such as ERM, reweighting, and resampling, being shown as special cases.
The proposed approach is validated on five vision and one NLP benchmarks, demonstrating competitive and often superior performance compared to state-of-the-art baseline methods.
arXiv Detail & Related papers (2025-03-17T14:48:57Z)
- Nested Annealed Training Scheme for Generative Adversarial Networks [54.70743279423088]
This paper focuses on a rigorous mathematical theoretical framework: the composite-functional-gradient GAN (CFG).
We reveal the theoretical connection between the CFG model and score-based models.
We find that the training objective of the CFG discriminator is equivalent to finding an optimal D(x).
arXiv Detail & Related papers (2025-01-20T07:44:09Z)
- A New Formulation of Lipschitz Constrained With Functional Gradient Learning for GANs [52.55025869932486]
This paper introduces a promising alternative method for training Generative Adversarial Networks (GANs) on large-scale datasets with clear theoretical guarantees.
We propose a novel Lipschitz-constrained Functional Gradient GANs learning (Li-CFG) method to stabilize the training of GAN.
We demonstrate that the neighborhood size of the latent vector can be reduced by increasing the norm of the discriminator gradient.
arXiv Detail & Related papers (2025-01-20T02:48:07Z)
- Contrastive CFG: Improving CFG in Diffusion Models by Contrasting Positive and Negative Concepts [55.298031232672734]
Classifier-Free Guidance (CFG) has proven effective in conditional diffusion model sampling for improved condition alignment.
We present a novel method to enhance negative CFG guidance using contrastive loss.
arXiv Detail & Related papers (2024-11-26T03:29:27Z)
- Supervised Contrastive Learning with Heterogeneous Similarity for Distribution Shifts [3.7819322027528113]
We propose a new regularization using the supervised contrastive learning to prevent such overfitting and to train models that do not degrade their performance under the distribution shifts.
Experiments on benchmark datasets that emulate distribution shifts, including subpopulation shift and domain generalization, demonstrate the advantage of the proposed method.
arXiv Detail & Related papers (2023-04-07T01:45:09Z)
- MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows [34.795115757545915]
We introduce a unified generative modeling framework - MonoFlow.
Under our framework, adversarial training can be viewed as a procedure that first obtains MonoFlow's vector field.
We also reveal the fundamental difference between variational divergence minimization and adversarial training.
arXiv Detail & Related papers (2023-02-02T13:05:27Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- UQGAN: A Unified Model for Uncertainty Quantification of Deep Classifiers trained via Conditional GANs [9.496524884855559]
We present an approach to quantifying uncertainty for deep neural networks in image classification, based on generative adversarial networks (GANs).
Instead of shielding the entire in-distribution data with GAN generated OoD examples, we shield each class separately with out-of-class examples generated by a conditional GAN.
In particular, we improve over the OoD detection and FP detection performance of state-of-the-art GAN-training based classifiers.
arXiv Detail & Related papers (2022-01-31T14:42:35Z)
- Understanding Why Generalized Reweighting Does Not Improve Over ERM [36.69039005731499]
Empirical risk minimization (ERM) is known in practice to be non-robust to distributional shift where the training and the test distributions are different.
A suite of approaches, such as importance weighting, and variants of distributionally robust optimization (DRO) have been proposed to solve this problem.
But a line of recent work has empirically shown that these approaches do not significantly improve over ERM in real applications with distribution shift.
arXiv Detail & Related papers (2022-01-28T17:58:38Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias with reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is to be "a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.