Understanding Entropic Regularization in GANs
- URL: http://arxiv.org/abs/2111.01387v1
- Date: Tue, 2 Nov 2021 06:08:16 GMT
- Title: Understanding Entropic Regularization in GANs
- Authors: Daria Reshetova, Yikun Bai, Xiugang Wu, Ayfer Ozgur
- Abstract summary: We study the influence of regularization on the learned solution of Wasserstein distance.
We show that entropy regularization promotes sparsification of the solution, while replacing the Wasserstein distance with the Sinkhorn divergence recovers the unregularized solution.
We conclude that these regularization techniques can improve the quality of the generator learned from empirical data for a large class of distributions.
- Score: 5.448283690603358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks are a popular method for learning
distributions from data by modeling the target distribution as a function of a
known distribution. The function, often referred to as the generator, is
optimized to minimize a chosen distance measure between the generated and
target distributions. One commonly used measure for this purpose is the
Wasserstein distance. However, the Wasserstein distance is hard to compute and
optimize, and in practice entropic regularization techniques are used to
improve numerical convergence. The influence of regularization on the learned
solution, however, remains poorly understood. In this paper, we study how
several popular entropic regularizations of Wasserstein distance impact the
solution in a simple benchmark setting where the generator is linear and the
target distribution is a high-dimensional Gaussian. We show that entropy
regularization promotes sparsification of the solution, while replacing the
Wasserstein distance with the Sinkhorn divergence recovers the unregularized
solution. Both regularization techniques remove the curse of dimensionality
suffered by the Wasserstein distance. We show that the optimal generator can be
learned to accuracy $\epsilon$ with $O(1/\epsilon^2)$ samples from the target
distribution. We thus conclude that these regularization techniques can improve
the quality of the generator learned from empirical data for a large class of
distributions.
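To make the compared objectives concrete, below is a minimal numpy sketch (not the authors' code) of the two regularized quantities discussed in the abstract: the entropy-regularized Wasserstein cost computed with Sinkhorn iterations, and the debiased Sinkhorn divergence, evaluated on samples from a Gaussian target and from a linear generator. The function names, sample sizes, and regularization strength `eps` are illustrative assumptions.

```python
import numpy as np

def entropic_ot(x, y, eps, n_iter=300):
    """Entropy-regularized OT cost between two empirical point clouds (Sinkhorn iterations)."""
    a = np.full(len(x), 1.0 / len(x))                     # uniform weights on generated points
    b = np.full(len(y), 1.0 / len(y))                     # uniform weights on target points
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)    # squared Euclidean cost matrix
    K = np.exp(-C / eps)                                  # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):                               # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                       # entropic transport plan
    return (P * C).sum()

def sinkhorn_divergence(x, y, eps):
    """Debiased Sinkhorn divergence: subtracts the entropic self-transport bias terms."""
    return entropic_ot(x, y, eps) - 0.5 * entropic_ot(x, x, eps) - 0.5 * entropic_ot(y, y, eps)

rng = np.random.default_rng(0)
d, n = 10, 200
target = rng.standard_normal((n, d)) @ np.diag(np.linspace(1.0, 0.1, d))  # Gaussian target samples
G = 0.3 * rng.standard_normal((d, d))                                     # linear generator matrix
generated = rng.standard_normal((n, d)) @ G.T                             # generator output g(z) = G z
eps = 5.0   # kept large relative to typical costs so exp(-C / eps) stays well-conditioned
print(entropic_ot(generated, target, eps), sinkhorn_divergence(generated, target, eps))
```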
Related papers
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Adversarial Likelihood Estimation With One-Way Flows [44.684952377918904]
Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples.
We show that our method converges faster, produces comparable sample quality to GANs with similar architecture, successfully avoids over-fitting to commonly used datasets and produces smooth low-dimensional latent representations of the training data.
arXiv Detail & Related papers (2023-07-19T10:26:29Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
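For context, the sketch below shows plain Annealed Importance Sampling with a fixed linear temperature schedule and Gaussian random-walk Metropolis moves; it illustrates how AIS produces weighted samples from an unnormalized target, not the paper's Constant Rate variant or its $\alpha$-divergence implementation. The one-dimensional densities, step size, and function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * ((x - 3.0) ** 2) / 0.25   # unnormalized log-density of N(3, 0.5^2)
log_prior = lambda x: -0.5 * x ** 2                      # proposal distribution: N(0, 1)

def ais(n_samples=1000, n_steps=50, step=0.5):
    betas = np.linspace(0.0, 1.0, n_steps + 1)           # linear annealing schedule
    x = rng.standard_normal(n_samples)                   # exact draws from the proposal
    log_w = np.zeros(n_samples)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # accumulate importance weights for the new intermediate density
        log_w += (b - b_prev) * (log_target(x) - log_prior(x))
        # one Metropolis step leaving the current intermediate density invariant
        prop = x + step * rng.standard_normal(n_samples)
        log_acc = (1 - b) * (log_prior(prop) - log_prior(x)) + b * (log_target(prop) - log_target(x))
        x = np.where(np.log(rng.random(n_samples)) < log_acc, prop, x)
    return x, log_w

samples, log_w = ais()
w = np.exp(log_w - log_w.max())
print((w * samples).sum() / w.sum())   # weighted estimate of the target mean (close to 3)
```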
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We give an algorithm that finds the RUM that best approximates the given distribution on average.
Our theoretical result can also be made practical: we obtain an algorithm that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z)
- Nonlinear Sufficient Dimension Reduction for Distribution-on-Distribution Regression [9.086237593805173]
We introduce a new approach to nonlinear sufficient dimension reduction in cases where both the predictor and the response are distributional data.
Our key step is to build universal kernels (cc-universal) on the metric spaces.
arXiv Detail & Related papers (2022-07-11T04:11:36Z)
- Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
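For reference, the classical two-sample Kolmogorov-Smirnov distance that this statement smooths and generalizes can be computed as below; this is a standard baseline sketch, not the paper's GAN loss, and the sample sizes are illustrative.

```python
import numpy as np

def ks_distance(x, y):
    """Classical two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    grid = np.sort(np.concatenate([x, y]))                       # evaluate both CDFs on pooled points
    F_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    F_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.abs(F_x - F_y).max()                               # sup-norm gap between empirical CDFs

rng = np.random.default_rng(0)
print(ks_distance(rng.standard_normal(1000), rng.standard_normal(1000) + 0.3))
```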
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
- Projected Sliced Wasserstein Autoencoder-based Hyperspectral Images Anomaly Detection [42.585075865267946]
We propose the Projected Sliced Wasserstein (PSW) autoencoder-based anomaly detection method.
In particular, the computation-friendly eigen-decomposition method is leveraged to find the principal component for slicing the high-dimensional data.
Comprehensive experiments conducted on various real-world hyperspectral anomaly detection benchmarks demonstrate the superior performance of the proposed method.
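As a rough illustration of the slicing idea only (not the authors' PSW autoencoder), the sketch below takes principal directions from an eigen-decomposition of the code covariance and compares each one-dimensional slice to a reference sample via the closed-form sorted-sample Wasserstein distance. All names and sizes are assumptions.

```python
import numpy as np

def projected_sliced_w2(codes, reference):
    """Average squared 1-D Wasserstein distance over principal-component slices."""
    cov = np.cov(codes, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)             # principal directions used as slicing axes
    total = 0.0
    for v in eigvecs.T:
        a = np.sort(codes @ v)                   # sorted 1-D projections of the codes
        b = np.sort(reference @ v)               # sorted 1-D projections of the reference
        total += np.mean((a - b) ** 2)           # closed-form 1-D W2^2 for equal sample sizes
    return total / eigvecs.shape[1]

rng = np.random.default_rng(0)
codes = 1.5 * rng.standard_normal((512, 16))     # hypothetical latent codes
reference = rng.standard_normal((512, 16))       # reference sample, e.g. a Gaussian prior
print(projected_sliced_w2(codes, reference))
```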
arXiv Detail & Related papers (2021-12-20T09:21:02Z)
- Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections [19.987683989865708]
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications.
We propose a new perspective to approximate SW by making use of the concentration of measure phenomenon.
Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation.
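For comparison, here is the usual Monte Carlo approximation of the Sliced-Wasserstein distance referred to above, averaging closed-form one-dimensional Wasserstein distances over randomly sampled projection directions; it is the baseline the paper avoids, not the proposed concentration-of-measure estimator. Sizes and names are illustrative.

```python
import numpy as np

def monte_carlo_sw2(x, y, n_projections=100, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of the squared Sliced-Wasserstein distance."""
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)          # random direction on the unit sphere
        a, b = np.sort(x @ theta), np.sort(y @ theta)
        total += np.mean((a - b) ** 2)          # closed-form 1-D W2^2 for equal sample sizes
    return total / n_projections

x = np.random.default_rng(1).standard_normal((256, 32))
y = np.random.default_rng(2).standard_normal((256, 32)) + 0.5
print(monte_carlo_sw2(x, y))
```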
arXiv Detail & Related papers (2021-06-29T13:56:19Z)
- Non-asymptotic convergence bounds for Wasserstein approximation using point clouds [0.0]
We show how to generate discrete data as if sampled from a model probability distribution.
We provide explicit upper bounds for the convergence of this type of algorithm.
arXiv Detail & Related papers (2021-06-15T06:53:08Z)
- Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization [106.70006655990176]
Distributional optimization problems arise widely in machine learning and statistics.
We propose a novel particle-based algorithm, dubbed as variational transport, which approximately performs Wasserstein gradient descent.
We prove that when the objective function satisfies a functional version of the Polyak-Lojasiewicz (PL) condition (Polyak, 1963) together with a smoothness condition, variational transport converges linearly.
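As a toy illustration of the particle viewpoint only (not the paper's variational-transport gradient estimator), the sketch below runs Wasserstein gradient descent for the simplest objective, a potential energy F(mu) = E_mu[V(x)], whose gradient flow just moves each particle downhill on V. The potential, step size, and particle count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
grad_V = lambda x: x - 2.0                        # gradient of V(x) = 0.5 * ||x - 2||^2

particles = rng.standard_normal((500, 2))         # initial distribution: particles from N(0, I)
for _ in range(200):                              # discrete Wasserstein gradient descent steps
    particles -= 0.05 * grad_V(particles)         # push every particle along -grad V
print(particles.mean(axis=0))                     # particles concentrate near the minimizer (2, 2)
```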
arXiv Detail & Related papers (2020-12-21T18:33:13Z)
- Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization [101.5159744660701]
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods.
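The sketch below shows the standard strategy described above, averaging local second-order (ridge/Newton) estimates computed on disjoint batches of the data; this is the biased baseline, not the paper's surrogate-sketching debiasing technique. Problem sizes, the regularization parameter, and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, workers, lam = 4000, 20, 8, 0.1
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

local_estimates = []
for Xb, yb in zip(np.array_split(X, workers), np.array_split(y, workers)):
    H = Xb.T @ Xb / len(Xb) + lam * np.eye(d)      # local regularized Hessian from one batch
    g = Xb.T @ yb / len(Xb)                        # local gradient term
    local_estimates.append(np.linalg.solve(H, g))  # local second-order (ridge) estimate
w_avg = np.mean(local_estimates, axis=0)           # averaged distributed estimate
print(np.linalg.norm(w_avg - w_true))              # error of the naive averaging baseline
```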
arXiv Detail & Related papers (2020-07-02T18:08:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.