Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces
- URL: http://arxiv.org/abs/2402.04613v3
- Date: Thu, 16 Jan 2025 12:05:26 GMT
- Title: Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces
- Authors: Viktor Stein, Sebastian Neumayer, Nicolaj Rux, Gabriele Steidl
- Abstract summary: We regularize the $f$-divergence by a squared maximum mean discrepancy associated with a characteristic kernel $K$.
We exploit well-known results on Moreau envelopes in Hilbert spaces to analyze the MMD-regularized $f$-divergences.
We provide proof-of-the-concept numerical examples for flows starting from empirical measures.
- Score: 1.2499537119440245
- License:
- Abstract: Commonly used $f$-divergences of measures, e.g., the Kullback-Leibler divergence, are subject to limitations regarding the support of the involved measures. A remedy is regularizing the $f$-divergence by a squared maximum mean discrepancy (MMD) associated with a characteristic kernel $K$. We use the kernel mean embedding to show that this regularization can be rewritten as the Moreau envelope of some function on the associated reproducing kernel Hilbert space. Then, we exploit well-known results on Moreau envelopes in Hilbert spaces to analyze the MMD-regularized $f$-divergences, particularly their gradients. Subsequently, we use our findings to analyze Wasserstein gradient flows of MMD-regularized $f$-divergences. We provide proof-of-the-concept numerical examples for flows starting from empirical measures. Here, we cover $f$-divergences with infinite and finite recession constants. Lastly, we extend our results to the tight variational formulation of $f$-divergences and numerically compare the resulting flows.
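As a rough schematic of this construction (the precise normalization of the MMD term and the domain of the minimization are assumptions here and follow the paper's conventions), the regularized divergence and its Moreau-envelope form read
\[
  D_{f,\nu}^{\lambda}(\mu) := \min_{\sigma} \Big\{ D_f(\sigma \,|\, \nu) + \tfrac{1}{2\lambda}\,\mathrm{MMD}_K^2(\sigma,\mu) \Big\},
  \qquad
  \mathrm{MMD}_K(\sigma,\mu) = \| m_\sigma - m_\mu \|_{\mathcal{H}_K},
\]
where $m_\mu := \int K(x,\cdot)\,\mathrm{d}\mu(x)$ is the kernel mean embedding into the RKHS $\mathcal{H}_K$ of $K$. With $G_{f,\nu}(h) := \inf\{ D_f(\sigma \,|\, \nu) : m_\sigma = h \}$ (set to $+\infty$ when no such $\sigma$ exists), this is precisely the Moreau envelope of $G_{f,\nu}$ evaluated at the embedding of $\mu$,
\[
  D_{f,\nu}^{\lambda}(\mu) = \min_{h \in \mathcal{H}_K} \Big\{ G_{f,\nu}(h) + \tfrac{1}{2\lambda}\, \| h - m_\mu \|_{\mathcal{H}_K}^2 \Big\},
\]
so classical Hilbert-space results on Moreau envelopes (differentiability, with the gradient expressed through the proximal mapping of $G_{f,\nu}$) transfer to the regularized $f$-divergence and hence to its Wasserstein gradient flow.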
Related papers
- A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging the iterates of Stochastic Gradient Descent (SGD) has achieved empirical success in training deep learning models, e.g., via Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA).
In this paper, we generalize LAWA to Finite Weight Averaging (FWA) and explain its advantages over SGD from the perspective of optimization and generalization.
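As an illustrative sketch of the averaging schemes mentioned above (the function name and the uniform tail-averaging rule are assumptions for illustration, not the paper's exact definitions):

import numpy as np

def finite_weight_average(checkpoints, k):
    # Average the last k parameter snapshots of an SGD run. With k = len(checkpoints)
    # this resembles full-run averaging (SWA-like); with k = 1 it reduces to plain SGD.
    return np.mean(np.stack(checkpoints[-k:]), axis=0)

# Usage: collect flattened parameter vectors during training, then average the tail.
checkpoints = [np.random.randn(10) for _ in range(100)]  # stand-in for SGD iterates
theta_fwa = finite_weight_average(checkpoints, k=8)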
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry [1.609940380983903]
We deal with the task of sampling from an unnormalized Boltzmann density $\rho_D$ by learning a Boltzmann curve given by $f_t$.
Inspired by Máté and Fleuret, we propose an approach which parametrizes only $f_t$ and fixes an appropriate $v_t$.
This corresponds to the Wasserstein flow of the Kullback-Leibler divergence related to Langevin dynamics.
arXiv Detail & Related papers (2024-10-04T09:54:11Z) - Moreau Envelope ADMM for Decentralized Weakly Convex Optimization [55.2289666758254]
This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization.
The results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
arXiv Detail & Related papers (2023-08-31T14:16:30Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Convergence and concentration properties of constant step-size SGD
through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size Stochastic Gradient Descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z) - An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow [8.052709336750823]
We show that when $\pi$ exhibits multiple modes, the convergence rate to $\pi$ along the Fisher-Rao gradient flow is independent of the potential function.
We provide an explicit expansion of $\mathrm{KL}(\rho_t \,\|\, \pi)$ along this flow.
arXiv Detail & Related papers (2023-02-23T18:47:54Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Moreau-Yosida $f$-divergences [0.0]
Variational representations of $f$-divergences are central to many machine learning algorithms.
We generalize the so-called tight variational representation of $f$-divergences in the case of probability measures on compact metric spaces.
We provide an implementation of the variational formulas for the Kullback-Leibler, reverse Kullback-Leibler, $\chi^2$, reverse $\chi^2$, squared Hellinger, Jensen-Shannon, Jeffreys, triangular discrimination and total variation divergences.
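For context, the classical (non-tight) variational representation that such formulas refine is the following textbook identity, with $f^{*}$ the convex conjugate of $f$ (the paper's tight version modifies this dual problem; see the paper for its precise form):
\[
  D_f(\mu \,|\, \nu) = \sup_{g} \Big\{ \int g \,\mathrm{d}\mu - \int f^{*}(g)\,\mathrm{d}\nu \Big\},
\]
with the supremum taken over suitable (e.g., bounded measurable) functions $g$.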
arXiv Detail & Related papers (2021-02-26T11:46:10Z) - On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient
Flow [26.725412498545385]
We show that a parametric kernelized gradient flow mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN.
We then derive an explicit condition which ensures that gradient descent on the space of the generator in regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution.
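A minimal, non-parametric illustration of such a kernelized flow is plain gradient descent of particles on an empirical $\mathrm{MMD}^2$ with a Gaussian kernel; the kernel choice, step size, and the absence of the paper's gradient regularizer and generator network are simplifying assumptions:

import torch

def mmd_squared(x, y, sigma=1.0):
    # Biased (V-statistic) estimate of MMD_K^2 between the empirical measures of x and y,
    # using the Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    k = lambda a, b: torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

torch.manual_seed(0)
target = torch.randn(256, 2) + torch.tensor([4.0, 0.0])   # fixed target sample
particles = torch.randn(256, 2, requires_grad=True)       # flow starts from an empirical measure

step = 0.1
for _ in range(1000):
    loss = mmd_squared(particles, target)
    loss.backward()
    with torch.no_grad():
        particles -= step * particles.grad                 # explicit Euler step on the particle positions
    particles.grad.zero_()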
arXiv Detail & Related papers (2020-11-04T16:55:00Z) - A diffusion approach to Stein's method on Riemannian manifolds [65.36007959755302]
We exploit the relationship between the generator of a diffusion on $\mathbf{M}$ with target invariant measure and its characterising Stein operator.
We derive Stein factors, which bound the solution to the Stein equation and its derivatives.
These imply that the bounds for $\mathbb{R}^m$ remain valid when $\mathbf{M}$ is a flat manifold.
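Schematically, the generator-Stein operator relationship referred to above reads as follows for the overdamped Langevin diffusion with invariant measure $\pi$ (on a manifold, $\Delta$ and $\nabla$ denote the Laplace-Beltrami operator and the Riemannian gradient; this standard generator-approach setup is an assumption here):
\[
  (\mathcal{A}h)(x) = \Delta h(x) + \langle \nabla \log \pi(x), \nabla h(x) \rangle,
  \qquad
  \mathcal{A} u_h = h - \int h \,\mathrm{d}\pi,
\]
so bounds on the Stein-equation solution $u_h$ and its derivatives (the Stein factors) yield bounds on the distance between a measure of interest and $\pi$.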
arXiv Detail & Related papers (2020-03-25T17:03:58Z)