$(f,\Gamma)$-Divergences: Interpolating between $f$-Divergences and
Integral Probability Metrics
- URL: http://arxiv.org/abs/2011.05953v3
- Date: Wed, 15 Sep 2021 14:25:24 GMT
- Title: $(f,\Gamma)$-Divergences: Interpolating between $f$-Divergences and
Integral Probability Metrics
- Authors: Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis, Yannis Pantazis,
Luc Rey-Bellet
- Abstract summary: We develop a framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs).
We show that they can be expressed as a two-stage mass-redistribution/mass-transport process.
Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions.
- Score: 6.221019624345409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a rigorous and general framework for constructing
information-theoretic divergences that subsume both $f$-divergences and
integral probability metrics (IPMs), such as the $1$-Wasserstein distance. We
prove under which assumptions these divergences, hereafter referred to as
$(f,\Gamma)$-divergences, provide a notion of `distance' between probability
measures and show that they can be expressed as a two-stage
mass-redistribution/mass-transport process. The $(f,\Gamma)$-divergences
inherit features from IPMs, such as the ability to compare distributions which
are not absolutely continuous, as well as from $f$-divergences, namely the
strict concavity of their variational representations and the ability to
control heavy-tailed distributions for particular choices of $f$. When
combined, these features establish a divergence with improved properties for
estimation, statistical learning, and uncertainty quantification applications.
Using statistical learning as an example, we demonstrate their advantage in
training generative adversarial networks (GANs) for heavy-tailed,
not-absolutely continuous sample distributions. We also show improved
performance and stability over gradient-penalized Wasserstein GAN in image
generation.
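For orientation, the two-stage structure described in the abstract can be sketched as follows. The notation below is ours ($\Gamma$ denotes the IPM test-function class, $f$ the convex function defining the $f$-divergence, and $f^*$ its Legendre transform), and the precise hypotheses under which these identities hold are stated in the paper:
$$
D_f^\Gamma(Q\|P) \;=\; \inf_{\eta}\Big\{ D_f(\eta\|P) + W^\Gamma(Q,\eta) \Big\},
\qquad
W^\Gamma(Q,\eta) \;=\; \sup_{g\in\Gamma}\big\{ \mathbb{E}_Q[g]-\mathbb{E}_\eta[g] \big\},
$$
i.e., mass is first redistributed from $P$ to an intermediate measure $\eta$ at $f$-divergence cost, then transported from $\eta$ to $Q$ at IPM cost. The variational form used for estimation from samples is
$$
D_f^\Gamma(Q\|P) \;=\; \sup_{g\in\Gamma}\Big\{ \mathbb{E}_Q[g] \;-\; \inf_{\nu\in\mathbb{R}}\big\{ \nu + \mathbb{E}_P[f^*(g-\nu)] \big\} \Big\}.
$$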
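For the KL choice $f(x)=x\log x$, the inner infimum over $\nu$ evaluates to $\log\mathbb{E}_P[e^{g}]$, so the quantity maximized by the critic becomes $\mathbb{E}_Q[g]-\log\mathbb{E}_P[e^{g}]$ over $g\in\Gamma$. The snippet below is a minimal NumPy sketch of estimating this objective from samples; the function name and toy distributions are illustrative only, and the constraint $g\in\Gamma$ (e.g., a Lipschitz bound) would have to be enforced separately, for instance via spectral normalization or a gradient penalty on the critic network.

```python
import numpy as np

def kl_gamma_objective(g_q, g_p):
    """Sample estimate of E_Q[g] - log E_P[exp(g)], the KL-based
    (f, Gamma)-variational objective for f(x) = x log x.

    g_q : critic values g(y_i) on samples y_i ~ Q (e.g., real data)
    g_p : critic values g(x_j) on samples x_j ~ P (e.g., generator output)

    Membership of g in Gamma (e.g., a Lipschitz bound) is assumed to be
    enforced elsewhere, e.g. by spectral normalization of the critic.
    """
    term_q = np.mean(g_q)                              # E_Q[g]
    m = np.max(g_p)                                    # stabilize log-sum-exp
    term_p = m + np.log(np.mean(np.exp(g_p - m)))      # log E_P[exp(g)]
    return term_q - term_p

# Toy check with the identity critic g(x) = x: heavy-tailed Q vs. Gaussian P.
rng = np.random.default_rng(0)
g_q = rng.standard_t(df=3, size=10_000)
g_p = rng.standard_normal(10_000)
print(kl_gamma_objective(g_q, g_p))
```

In a GAN training loop, `g_q` and `g_p` would be the constrained critic's outputs on a real batch and a generated batch, with the generator trained to minimize the same quantity.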
Related papers
- Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV).
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z)
- Learning heavy-tailed distributions with Wasserstein-proximal-regularized $\alpha$-divergences [12.19634962193403]
We propose Wasserstein proximals of $\alpha$-divergences as suitable objective functionals for learning heavy-tailed distributions.
Heuristically, $\alpha$-divergences handle the heavy tails and Wasserstein proximals allow non-absolute continuity between distributions.
arXiv Detail & Related papers (2024-05-22T19:58:13Z)
- Gaussian-Smoothed Sliced Probability Divergences [15.123608776470077]
We show that smoothing and slicing preserve the metric property and the weak topology.
We also derive other properties, including continuity, of different divergences with respect to the smoothing parameter.
arXiv Detail & Related papers (2024-04-04T07:55:46Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Ensemble Multi-Quantiles: Adaptively Flexible Distribution Prediction for Uncertainty Quantification [4.728311759896569]
We propose a novel, succinct, and effective approach for distribution prediction to quantify uncertainty in machine learning.
It incorporates adaptively flexible distribution prediction of $\mathbb{P}(\mathbf{y}|\mathbf{X}=x)$ in regression tasks.
On extensive regression tasks from UCI datasets, we show that EMQ achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-11-26T11:45:32Z)
- Function-space regularized Rényi divergences [6.221019624345409]
We propose a new family of regularized Rényi divergences parametrized by a variational function space.
We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs.
We show that the proposed regularized R'enyi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous.
arXiv Detail & Related papers (2022-10-10T19:18:04Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
- GFlowNet Foundations [66.69854262276391]
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context.
We show a number of additional theoretical properties of GFlowNets.
arXiv Detail & Related papers (2021-11-17T17:59:54Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
An implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Optimal Bounds between $f$-Divergences and Integral Probability Metrics [8.401473551081748]
Families of $f$-divergences and Integral Probability Metrics are widely used to quantify similarity between probability distributions.
We systematically study the relationship between these two families from the perspective of convex duality.
We obtain new bounds while also recovering in a unified manner well-known results, such as Hoeffding's lemma.
arXiv Detail & Related papers (2020-06-10T17:39:11Z)