Cumulant GAN
- URL: http://arxiv.org/abs/2006.06625v3
- Date: Tue, 24 Aug 2021 15:44:34 GMT
- Title: Cumulant GAN
- Authors: Yannis Pantazis, Dipjyoti Paul, Michail Fasoulakis, Yannis Stylianou
and Markos Katsoulakis
- Abstract summary: We propose a novel loss function for training Generative Adversarial Networks (GANs).
We show that the corresponding optimization problem is equivalent to Rényi divergence minimization.
We experimentally demonstrate that image generation is more robust relative to Wasserstein GAN.
- Score: 17.4556035872983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel loss function for training Generative
Adversarial Networks (GANs) aiming towards deeper theoretical understanding as
well as improved stability and performance for the underlying optimization
problem. The new loss function is based on cumulant generating functions giving
rise to Cumulant GAN. Relying on a recently derived variational formula,
we show that the corresponding optimization problem is equivalent to Rényi
divergence minimization, thus offering a (partially) unified perspective of GAN
losses: the Rényi family encompasses Kullback-Leibler divergence (KLD),
reverse KLD, Hellinger distance and $\chi^2$-divergence. Wasserstein GAN is
also a member of cumulant GAN. In terms of stability, we rigorously prove the
linear convergence of cumulant GAN to the Nash equilibrium for a linear
discriminator, Gaussian distributions and the standard gradient descent ascent
algorithm. Finally, we experimentally demonstrate that image generation is more
robust relative to Wasserstein GAN and substantially improved in terms of
both inception score and Fréchet inception distance when both weaker and
stronger discriminators are considered.
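A minimal sketch of the cumulant idea, assuming PyTorch and illustrative parameter names (beta, gamma): the snippet below replaces the plain expectations of a WGAN-style critic objective with cumulant generating functions (1/t) log E[exp(t D)], computed via logsumexp, so that the limit beta, gamma -> 0 recovers the familiar difference of means. The exact sign conventions and the mapping from (beta, gamma) to specific divergences are given precisely in the paper; here they are assumptions for illustration only.

```python
# Minimal sketch (PyTorch) of a cumulant-based critic objective; not the
# paper's exact formulation.  It replaces the plain expectations E[D] of a
# WGAN-style critic with cumulant generating functions (1/t) * log E[exp(t*D)].
# Sign conventions and the (beta, gamma) parameter names are illustrative.
import torch


def cumulant(scores: torch.Tensor, t: float) -> torch.Tensor:
    """(1/t) * log E[exp(t * scores)]; the t -> 0 limit is E[scores]."""
    scores = scores.flatten()
    if abs(t) < 1e-8:
        return scores.mean()  # limiting case: plain expectation (WGAN-style)
    log_n = torch.log(torch.tensor(float(scores.numel())))
    return (torch.logsumexp(t * scores, dim=0) - log_n) / t


def critic_objective(d_real: torch.Tensor, d_fake: torch.Tensor,
                     beta: float, gamma: float) -> torch.Tensor:
    """Cumulant-based critic surrogate (to be maximized by the discriminator).

    With beta = gamma = 0 this reduces to mean(d_real) - mean(d_fake), i.e. the
    Wasserstein GAN critic objective, consistent with the abstract's statement
    that WGAN is a member of the cumulant family.
    """
    return cumulant(d_real, -beta) - cumulant(d_fake, gamma)


if __name__ == "__main__":
    d_real = torch.randn(256) + 1.0  # stand-in discriminator scores on data
    d_fake = torch.randn(256) - 1.0  # stand-in discriminator scores on samples
    print(critic_objective(d_real, d_fake, beta=0.0, gamma=0.0))  # WGAN-like limit
    print(critic_objective(d_real, d_fake, beta=0.5, gamma=0.5))  # a nonzero setting
```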
Related papers
- Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence [54.580605276017096]
Rectified Flow (RF) aims to learn straight flow trajectories from noise to data using a sequence of convex optimization problems.
RF theoretically straightens the trajectory through successive rectifications, reducing the number of function evaluations (NFEs) while sampling.
We provide the first theoretical analysis of the Wasserstein distance between the sampling distribution of RF and the target distribution.
arXiv Detail & Related papers (2024-10-19T02:36:11Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Taming Nonconvex Stochastic Mirror Descent with General Bregman
Divergence [25.717501580080846]
This paper revisits the convergence of stochastic mirror descent (SMD) in the contemporary nonconvex optimization setting.
We also develop provably convergent algorithms for the training of linear networks.
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators (a generic sketch of iterate interpolation follows this list).
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Function-space regularized Rényi divergences [6.221019624345409]
We propose a new family of regularized Rényi divergences parametrized by a variational function space.
We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs.
We show that the proposed regularized Rényi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous.
arXiv Detail & Related papers (2022-10-10T19:18:04Z) - GANs as Gradient Flows that Converge [3.8707695363745223]
We show that along the gradient flow induced by a distribution-dependent ordinary differential equation, the unknown data distribution emerges as the long-time limit.
The simulation of this ODE is shown to be equivalent to the training of generative adversarial networks (GANs).
This equivalence provides a new "cooperative" view of GANs and, more importantly, sheds new light on the divergence of GANs.
arXiv Detail & Related papers (2022-05-05T20:29:13Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - Least $k$th-Order and Rényi Generative Adversarial Networks [12.13405065406781]
Experimental results indicate that the proposed loss functions, applied to the MNIST and CelebA datasets, confer performance benefits by virtue of the extra degrees of freedom provided by the parameters $k$ and $\alpha$, respectively.
While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, e.g., the issues of fairness or privacy in artificial intelligence.
arXiv Detail & Related papers (2020-06-03T18:44:05Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best-fitting ReLU with respect to square loss, given draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z) - Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling
by Exploring Energy of the Discriminator [85.68825725223873]
Generative Adversarial Networks (GANs) have shown great promise in modeling high dimensional data.
We introduce the Discriminator Contrastive Divergence, which is well motivated by the property of WGAN's discriminator.
We demonstrate the benefits of significantly improved generation on both synthetic data and several real-world image generation benchmarks.
arXiv Detail & Related papers (2020-04-05T01:50:16Z)
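Complementing the "Stable Nonconvex-Nonconcave Training via Linear Interpolation" entry above, the sketch below illustrates the generic idea of interpolating iterates: after each base optimizer update, the parameters are moved only part of the way toward the new iterate, which tends to damp the oscillations seen in adversarial training. This is a hedged illustration of the general technique, not that paper's algorithm; the function name and the weight `lam` are assumptions.

```python
# Generic sketch of linear interpolation of iterates (lookahead-style averaging)
# for stabilizing adversarial training; an illustration of the general idea,
# not the referenced paper's algorithm.  `lam` is an assumed interpolation weight.
import torch


def interpolated_step(model: torch.nn.Module,
                      optimizer: torch.optim.Optimizer,
                      loss: torch.Tensor,
                      lam: float = 0.5) -> None:
    """One base optimizer step, then theta <- theta_old + lam * (theta_new - theta_old)."""
    old_params = [p.detach().clone() for p in model.parameters()]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for p, p_old in zip(model.parameters(), old_params):
            p.copy_(p_old + lam * (p - p_old))  # pull only part-way toward the new iterate


# Usage sketch:
# net = torch.nn.Linear(2, 1)
# opt = torch.optim.SGD(net.parameters(), lr=0.1)
# x = torch.randn(8, 2)
# interpolated_step(net, opt, loss=net(x).pow(2).mean(), lam=0.5)
```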
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.