Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in
Reproducing Kernel Hilbert Spaces
- URL: http://arxiv.org/abs/2402.04613v2
- Date: Sat, 9 Mar 2024 22:58:57 GMT
- Title: Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in
Reproducing Kernel Hilbert Spaces
- Authors: Sebastian Neumayer, Viktor Stein, Gabriele Steidl, Nicolaj Rux
- Abstract summary: We regularize the $f$-divergence by a squared maximum mean discrepancy associated with a characteristic kernel $K$.
We exploit well-known results on Moreau envelopes in Hilbert spaces to prove properties of the MMD-regularized $f$-divergences.
We provide proof-of-concept numerical examples for $f$-divergences with both infinite and finite recession constant.
- Score: 1.3654846342364308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most commonly used $f$-divergences of measures, e.g., the Kullback-Leibler
divergence, are subject to limitations regarding the support of the involved
measures. A remedy consists of regularizing the $f$-divergence by a squared
maximum mean discrepancy (MMD) associated with a characteristic kernel $K$. In
this paper, we use the so-called kernel mean embedding to show that the
corresponding regularization can be rewritten as the Moreau envelope of some
function in the reproducing kernel Hilbert space associated with $K$. Then, we
exploit well-known results on Moreau envelopes in Hilbert spaces to prove
properties of the MMD-regularized $f$-divergences and, in particular, their
gradients. Subsequently, we use our findings to analyze Wasserstein gradient
flows of MMD-regularized $f$-divergences. Finally, we consider Wasserstein
gradient flows starting from empirical measures. We provide
proof-of-concept numerical examples for $f$-divergences with both infinite
and finite recession constant.
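As a schematic rendering of the construction described above (notation assumed here, not quoted from the paper), the MMD-regularized $f$-divergence is an infimal convolution of $D_f(\cdot \mid \nu)$ with a scaled squared MMD,
$$ D_{f,\lambda}(\mu \mid \nu) \;=\; \inf_{\sigma} \Big\{ D_f(\sigma \mid \nu) + \tfrac{1}{2\lambda}\,\mathrm{MMD}_K^2(\sigma, \mu) \Big\}, \qquad \mathrm{MMD}_K(\sigma, \mu) = \big\| m_K(\sigma) - m_K(\mu) \big\|_{\mathcal{H}_K}, $$
where $m_K$ denotes the kernel mean embedding into the RKHS $\mathcal{H}_K$. Identifying measures with their embeddings turns the right-hand side into a Moreau envelope of a function on $\mathcal{H}_K$, which (for proper, convex, lower semicontinuous functions) is everywhere differentiable with a $\lambda^{-1}$-Lipschitz gradient; this is the type of property the Moreau-envelope viewpoint makes available for the gradient-flow analysis.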
Related papers
- Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry [1.609940380983903]
We deal with the task of sampling from an unnormalized Boltzmann density $\rho_D$ by learning a Boltzmann curve given by $f_t$.
Inspired by Máté and Fleuret, we propose an approach which parametrizes only $f_t$ and fixes an appropriate $v_t$.
This corresponds to the Wasserstein gradient flow of the Kullback-Leibler divergence related to Langevin dynamics.
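For context (standard background rather than a claim from the abstract above): the Wasserstein gradient flow of $\mathrm{KL}(\rho \,\|\, \pi)$ with target $\pi \propto e^{-V}$ is the Fokker-Planck equation, whose particle realization is overdamped Langevin dynamics,
$$ \partial_t \rho_t = \nabla \cdot (\rho_t \nabla V) + \Delta \rho_t \qquad \Longleftrightarrow \qquad \mathrm{d}X_t = -\nabla V(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t . $$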
arXiv Detail & Related papers (2024-10-04T09:54:11Z) - Particle-based Variational Inference with Generalized Wasserstein
Gradient Flow [32.37056212527921]
We propose a ParVI framework called generalized Wasserstein gradient descent (GWG).
We show that GWG exhibits strong convergence guarantees.
We also provide an adaptive version that automatically chooses the Wasserstein metric to accelerate convergence.
arXiv Detail & Related papers (2023-10-25T10:05:42Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow [8.052709336750823]
We show that, when the target $\pi$ exhibits multiple modes, the convergence behaviour along the Fisher-Rao gradient flow is \textit{independent} of the potential function.
We provide an explicit expansion of $\mathrm{KL}(\rho_t \,\|\, \pi)$ along this flow.
arXiv Detail & Related papers (2023-02-23T18:47:54Z) - Annihilating Entanglement Between Cones [77.34726150561087]
We show that Lorentz cones are the only cones with a symmetric base for which a certain stronger version of the resilience property is satisfied.
Our proof exploits the symmetries of the Lorentz cones and applies two constructions resembling protocols for entanglement distillation.
arXiv Detail & Related papers (2021-10-22T15:02:39Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Moreau-Yosida $f$-divergences [0.0]
Variational representations of $f$-divergences are central to many machine learning algorithms.
We generalize the so-called tight variational representation of $f$-divergences in the case of probability measures on compact metric spaces.
We provide an implementation of the variational formulas for the Kullback-Leibler, reverse Kullback-Leibler, $\chi^2$, reverse $\chi^2$, squared Hellinger, Jensen-Shannon, Jeffreys, triangular discrimination, and total variation divergences.
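As background (the standard dual form, not quoted from the abstract above), variational representations of $f$-divergences are typically of the type
$$ D_f(\mu \,\|\, \nu) \;=\; \sup_{g} \Big\{ \int g \,\mathrm{d}\mu - \int f^*(g)\, \mathrm{d}\nu \Big\}, $$
where $f^*$ is the convex conjugate of $f$ and the supremum runs over a suitable class of test functions (e.g., bounded continuous functions on a compact metric space); the "tight" variant referred to above concerns the precise class over which this supremum can be taken while keeping equality.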
arXiv Detail & Related papers (2021-02-26T11:46:10Z) - On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient
Flow [26.725412498545385]
We show that a parametric kernelized gradient flow mimics the min-max game in gradient-regularized $\mathrm{MMD}$ GAN.
We then derive an explicit condition which ensures that gradient descent on the space of the generator in regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution.
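Purely as an illustration of the MMD gradient-flow idea that both the MMD GAN analysis above and the main paper build on, here is a minimal NumPy sketch of a discretized, unregularized MMD particle flow with an assumed Gaussian kernel and toy data; it is not the gradient-regularized MMD GAN scheme of the paper.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def grad_kernel_x(x, y, sigma=1.0):
    """Gradient of k(x_i, y_j) with respect to x_i; shape (n, m, dim)."""
    diff = x[:, None, :] - y[None, :, :]
    return -diff * gaussian_kernel(x, y, sigma)[:, :, None] / sigma ** 2

def mmd_flow_step(particles, target, step=0.5, sigma=1.0):
    """One explicit Euler step of the plain MMD gradient flow.

    Each particle moves along the negative gradient of the first variation
    (unnormalized witness function) of 0.5 * MMD^2 between the particle
    measure and the fixed target sample.
    """
    repel = grad_kernel_x(particles, particles, sigma).mean(axis=1)
    attract = grad_kernel_x(particles, target, sigma).mean(axis=1)
    return particles - step * (repel - attract)

# Toy usage: move a Gaussian particle cloud toward a shifted target sample.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))           # particles
y = rng.normal(size=(200, 2)) + 1.5     # target sample
for _ in range(300):
    x = mmd_flow_step(x, y)
print("particle mean after flow:", x.mean(axis=0))  # drifts toward the target mean
```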
arXiv Detail & Related papers (2020-11-04T16:55:00Z) - Metrizing Weak Convergence with Maximum Mean Discrepancies [88.54422104669078]
This paper characterizes the maximum mean discrepancies (MMD) that metrize the weak convergence of probability measures for a wide class of kernels.
We prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k metrizes the weak convergence of probability measures if and only if k is continuous.
arXiv Detail & Related papers (2020-06-16T15:49:33Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - A diffusion approach to Stein's method on Riemannian manifolds [65.36007959755302]
We exploit the relationship between the generator of a diffusion on $\mathbf{M}$ with target invariant measure and its characterising Stein operator.
We derive Stein factors, which bound the solution to the Stein equation and its derivatives.
This implies that the bounds for $\mathbb{R}^m$ remain valid when $\mathbf{M}$ is a flat manifold.
arXiv Detail & Related papers (2020-03-25T17:03:58Z)