An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow
- URL: http://arxiv.org/abs/2302.12229v1
- Date: Thu, 23 Feb 2023 18:47:54 GMT
- Title: An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow
- Authors: Carles Domingo-Enrich, Aram-Alexandre Pooladian
- Abstract summary: When $\pi$ exhibits multiple modes, the gradient flow of the KL divergence with respect to the Fisher-Rao geometry converges to $\pi$ at a rate that is \textit{independent} of the potential function (Lu et al., 2019; 2022).
We provide an explicit expansion of $\text{KL}(\rho_t^{\text{FR}}\|\pi)$ in terms of $e^{-t}$.
- Score: 8.052709336750823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Let $V_* : \mathbb{R}^d \to \mathbb{R}$ be some (possibly non-convex)
potential function, and consider the probability measure $\pi \propto
e^{-V_*}$. When $\pi$ exhibits multiple modes, it is known that sampling
techniques based on Wasserstein gradient flows of the Kullback-Leibler (KL)
divergence (e.g. Langevin Monte Carlo) suffer poorly in the rate of
convergence, where the dynamics are unable to easily traverse between modes. In
stark contrast, the work of Lu et al. (2019; 2022) has shown that the gradient
flow of the KL with respect to the Fisher-Rao (FR) geometry exhibits a
convergence rate to $\pi$ that is \textit{independent} of the potential
function. In this short note, we complement these existing results in the
literature by providing an explicit expansion of
$\text{KL}(\rho_t^{\text{FR}}\|\pi)$ in terms of $e^{-t}$, where
$(\rho_t^{\text{FR}})_{t\geq 0}$ is the FR gradient flow of the KL divergence.
In turn, we are able to provide a clean asymptotic convergence rate, where the
burn-in time is guaranteed to be finite. Our proof is based on observing a
similarity between FR gradient flows and simulated annealing with linear
scaling, and facts about cumulant generating functions. We conclude with simple
synthetic experiments that demonstrate our theoretical findings are indeed
tight. Based on our numerics, we conjecture that the asymptotic rates of
convergence for Wasserstein-Fisher-Rao gradient flows are possibly related to
this expansion in some cases.
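To make the statement concrete, the following is a minimal numerical sketch. It assumes the standard closed form of the Fisher-Rao gradient flow of the KL divergence as a geometric interpolation, $\rho_t^{\text{FR}} \propto \rho_0^{e^{-t}} \pi^{1-e^{-t}}$, which is the form underlying the connection to simulated annealing with linear scaling (Lu et al., 2019; 2022); the double-well potential, grid, and initialization below are illustrative choices, not the paper's experimental setup.

```python
# Hedged numerical sketch: assumes the closed form of the Fisher-Rao
# gradient flow of KL(rho || pi) as a geometric interpolation,
#   rho_t(x) proportional to rho_0(x)^{exp(-t)} * pi(x)^{1 - exp(-t)},
# following Lu et al. (2019; 2022). The potential, grid, and initial
# condition are illustrative, not the paper's setup.
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

V = 0.5 * (np.abs(x) - 3.0) ** 2          # double-well potential V_*
pi = np.exp(-V)
pi /= pi.sum() * dx                       # target pi proportional to exp(-V_*)

rho0 = np.exp(-0.5 * (x - 1.0) ** 2)      # Gaussian initialization near one mode
rho0 /= rho0.sum() * dx

def kl(p, q):
    """Grid approximation of KL(p || q)."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

for t in [0.0, 1.0, 2.0, 4.0, 8.0]:
    lam = np.exp(-t)                      # interpolation weight e^{-t}
    rho_t = rho0 ** lam * pi ** (1.0 - lam)
    rho_t /= rho_t.sum() * dx             # renormalize on the grid
    print(f"t = {t:4.1f}   KL(rho_t || pi) = {kl(rho_t, pi):.3e}")
```

On examples like this, $\log \text{KL}(\rho_t^{\text{FR}}\|\pi)$ decreases roughly linearly in $t$ after a finite burn-in, which is the qualitative behavior an expansion in powers of $e^{-t}$ predicts.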
Related papers
- A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging iterates of Stochastic Gradient Descent (SGD) has achieved empirical success in training deep learning models, via schemes such as Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA).
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows [1.5561923713703105]
We introduce a novel discretization scheme for Wasserstein gradient flows that involves successively computing Schrödinger bridges with the same marginals.
The proposed scheme has two advantages: one, it avoids the use of the score function, and, two, it is amenable to particle-based approximations using the Sinkhorn algorithm.
arXiv Detail & Related papers (2024-06-16T07:23:26Z) - Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in
Reproducing Kernel Hilbert Spaces [1.3654846342364308]
We regularize the $f$-divergence by a squared maximum mean discrepancy associated with a characteristic kernel $K$.
We exploit well-known results on envelopes in Hilbert spaces to prove properties of the MMD-regularized $f$-divergences.
We provide proof-of-concept numerical examples for $f$-divergences with both infinite and finite recession constant.
arXiv Detail & Related papers (2024-02-07T06:30:39Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Unadjusted Langevin algorithm for sampling a mixture of weakly smooth
potentials [0.0]
We prove convergence guarantees under a Poincaré inequality or non-strong convexity outside a ball (a generic unadjusted Langevin update is sketched after this list).
We also provide convergence in the $L_\beta$-Wasserstein metric for the smoothing potential.
arXiv Detail & Related papers (2021-12-17T04:10:09Z) - Federated Functional Gradient Boosting [75.06942944563572]
We study functional minimization in Federated Learning.
For both FFGB.C and FFGB.L, the radii of convergence shrink to zero as the feature distributions become more homogeneous.
arXiv Detail & Related papers (2021-03-11T21:49:19Z) - Simulated annealing from continuum to discretization: a convergence
analysis via the Eyring--Kramers law [10.406659081400354]
We study the convergence rate of continuous-time simulated annealing $(X_t,\, t \ge 0)$ and its discretization $(x_k,\, k = 0, 1, \ldots)$.
We prove that the tail probability $\mathbb{P}(f(X_t) > \min f + \delta)$ (resp. $\mathbb{P}(f(x_k) > \min f + \delta)$) decays in time (resp. in cumulative step size).
arXiv Detail & Related papers (2021-02-03T23:45:39Z) - On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient
Flow [26.725412498545385]
We show that a parametric kernelized gradient flow mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN.
We then derive an explicit condition which ensures that gradient descent on the space of the generator in regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution.
arXiv Detail & Related papers (2020-11-04T16:55:00Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - The Convergence Indicator: Improved and completely characterized
parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to calculate whether the particles will finally converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
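Several entries above concern Langevin-type samplers (the unadjusted Langevin algorithm for weakly smooth potentials and the SGLD analysis). For contrast with the Fisher-Rao flow in the main abstract, here is a minimal sketch of the standard unadjusted Langevin update targeting $\pi \propto e^{-V}$; the double-well potential, step size, and chain count are illustrative assumptions, not taken from any of the listed papers.

```python
# Hedged sketch of the standard unadjusted Langevin algorithm (ULA):
#   x_{k+1} = x_k - eta * grad V(x_k) + sqrt(2 * eta) * xi_k,  xi_k ~ N(0, I),
# targeting pi proportional to exp(-V). The potential, step size, and
# iteration count are illustrative, not taken from the listed papers.
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    # Gradient of the double-well potential V(x) = (x**2 - 1)**2 / 4
    return x ** 3 - x

eta = 1e-2                              # step size
x = rng.standard_normal(10_000)         # 10k parallel one-dimensional chains
for _ in range(5_000):
    x = x - eta * grad_V(x) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)

print("sample mean:", x.mean(), "  second moment:", (x ** 2).mean())
```

With deep, well-separated modes, such chains traverse between modes slowly, which is the behavior that motivates the Fisher-Rao dynamics discussed in the main abstract.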
This list is automatically generated from the titles and abstracts of the papers in this site.