An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow
- URL: http://arxiv.org/abs/2302.12229v1
- Date: Thu, 23 Feb 2023 18:47:54 GMT
- Title: An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow
- Authors: Carles Domingo-Enrich, Aram-Alexandre Pooladian
- Abstract summary: We show that when $\pi$ exhibits multiple modes, the gradient flow of the KL
divergence with respect to the Fisher-Rao geometry converges to $\pi$ at a rate that is
\textit{independent} of the potential function.
We provide an explicit expansion of $\text{KL}(\rho_t^{\text{FR}}\|\pi)$ in terms of $e^{-t}$.
- Score: 8.052709336750823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Let $V_* : \mathbb{R}^d \to \mathbb{R}$ be some (possibly non-convex)
potential function, and consider the probability measure $\pi \propto
e^{-V_*}$. When $\pi$ exhibits multiple modes, it is known that sampling
techniques based on Wasserstein gradient flows of the Kullback-Leibler (KL)
divergence (e.g. Langevin Monte Carlo) suffer from slow rates of
convergence, as the dynamics are unable to easily traverse between modes. In
stark contrast, the work of Lu et al. (2019; 2022) has shown that the gradient
flow of the KL with respect to the Fisher-Rao (FR) geometry exhibits a
convergence rate to $\pi$ that is \textit{independent} of the potential
function. In this short note, we complement these existing results in the
literature by providing an explicit expansion of
$\text{KL}(\rho_t^{\text{FR}}\|\pi)$ in terms of $e^{-t}$, where
$(\rho_t^{\text{FR}})_{t\geq 0}$ is the FR gradient flow of the KL divergence.
In turn, we are able to provide a clean asymptotic convergence rate, where the
burn-in time is guaranteed to be finite. Our proof is based on observing a
similarity between FR gradient flows and simulated annealing with linear
scaling, and facts about cumulant generating functions. We conclude with simple
synthetic experiments that demonstrate our theoretical findings are indeed
tight. Based on our numerics, we conjecture that the asymptotic rates of
convergence for Wasserstein-Fisher-Rao gradient flows are possibly related to
this expansion in some cases.
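For intuition, here is a minimal numerical sketch (not the authors' code) of the Fisher-Rao gradient flow of the KL divergence on a one-dimensional grid, for an assumed toy double-well potential $V_*(x) = (x^2-1)^2$. It integrates the flow $\partial_t \rho_t = \rho_t\,(\text{KL}(\rho_t\|\pi) - \log(\rho_t/\pi))$ studied by Lu et al. with explicit Euler steps and tracks the decay of $\text{KL}(\rho_t\|\pi)$; the grid, step size, initial density, and time horizon are illustrative choices only.

```python
# Minimal sketch (illustrative assumptions throughout): Fisher-Rao gradient
# flow of KL(rho || pi) on a 1-D grid, with pi proportional to exp(-V_*) for a
# toy bimodal potential V_*(x) = (x^2 - 1)^2.
import numpy as np

# Grid and (normalized) target density pi.
x = np.linspace(-4.0, 4.0, 2001)
dx = x[1] - x[0]
V = (x**2 - 1.0) ** 2
pi = np.exp(-V)
pi /= pi.sum() * dx

# Initial density: a Gaussian sitting near one of the two modes.
rho = np.exp(-0.5 * (x - 2.0) ** 2)
rho /= rho.sum() * dx

def kl(p, q):
    """Grid approximation of KL(p || q)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx)

# Explicit Euler discretization of
#   d/dt rho_t = rho_t * ( KL(rho_t || pi) - log(rho_t / pi) ).
dt, T = 1e-3, 5.0
ts, kls = [], []
for k in range(int(T / dt)):
    log_ratio = np.log(np.maximum(rho, 1e-300) / pi)
    current_kl = kl(rho, pi)
    ts.append(k * dt)
    kls.append(current_kl)
    rho = np.maximum(rho + dt * rho * (current_kl - log_ratio), 0.0)
    rho /= rho.sum() * dx  # re-normalize against discretization drift

# The note above expands KL(rho_t^FR || pi) in powers of e^{-t}; empirically,
# log KL(rho_t || pi) should become roughly affine in t once the burn-in is over.
ts, kls = np.array(ts), np.array(kls)
late = ts > 3.0
slope = np.polyfit(ts[late], np.log(kls[late]), 1)[0]
print("late-time slope of log KL(rho_t || pi):", slope)
```

The printed slope is only a crude empirical proxy for the asymptotic rate; the explicit expansion in the note is what pins that rate down exactly.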
Related papers
- Hellinger-Kantorovich Gradient Flows: Global Exponential Decay of Entropy Functionals [52.154685604660465]
We investigate a family of gradient flows of positive and probability measures, focusing on the Hellinger-Kantorovich (HK) geometry.
A central contribution is a complete characterization of global exponential decay behaviors of entropy functionals under Otto-Wasserstein and Hellinger-type gradient flows.
arXiv Detail & Related papers (2025-01-28T16:17:09Z)
- Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows [1.5561923713703105]
We introduce a novel discretization scheme for Wasserstein gradient flows that involves successively computing Schrödinger bridges with the same marginals.
The proposed scheme has two advantages: one, it avoids the use of the score function, and, two, it is amenable to particle-based approximations using the Sinkhorn algorithm.
arXiv Detail & Related papers (2024-06-16T07:23:26Z)
- Exact dynamics of quantum dissipative $XX$ models: Wannier-Stark localization in the fragmented operator space [49.1574468325115]
We find an exceptional point at a critical dissipation strength that separates oscillating and non-oscillating decay.
We also describe a different type of dissipation that leads to a single decay mode in the whole operator subspace.
arXiv Detail & Related papers (2024-05-27T16:11:39Z)
- Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces [1.2499537119440245]
We regularize the $f$-divergence by a squared maximum mean discrepancy associated with a characteristic kernel $K$.
We exploit well-known results on Moreau envelopes in Hilbert spaces to analyze the MMD-regularized $f$-divergences.
We provide proof-of-concept numerical examples for flows starting from empirical measures.
arXiv Detail & Related papers (2024-02-07T06:30:39Z)
- Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron [49.45105570960104]
We prove the global convergence of randomly initialized gradient descent with an $O(T^{-3})$ rate.
These two bounds jointly give an exact characterization of the convergence rate.
We show this potential function converges slowly, which implies the slow convergence rate of the loss function.
arXiv Detail & Related papers (2023-02-20T15:33:26Z)
- Unadjusted Langevin algorithm for sampling a mixture of weakly smooth potentials [0.0]
We prove convergence guarantees under a Poincaré inequality or non-strong convexity outside a ball.
We also provide convergence in the $L_\beta$-Wasserstein metric for the smoothing potential (see the unadjusted Langevin sketch after this list).
arXiv Detail & Related papers (2021-12-17T04:10:09Z)
- Federated Functional Gradient Boosting [75.06942944563572]
We study functional minimization in Federated Learning.
For both FFGB.C and FFGB.L, the radii of convergence shrink to zero as the feature distributions become more homogeneous.
arXiv Detail & Related papers (2021-03-11T21:49:19Z)
- Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law [10.406659081400354]
We study the convergence rate of continuous-time simulated annealing $(X_t;\, t \ge 0)$ and its discretization $(x_k;\, k = 0, 1, \ldots)$.
We prove that the tail probability $\mathbb{P}(f(X_t) > \min f + \delta)$ (resp. $\mathbb{P}(f(x_k) > \min f + \delta)$) decays in time (resp. in cumulative step size).
arXiv Detail & Related papers (2021-02-03T23:45:39Z)
- On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient Flow [26.725412498545385]
We show that a parametric kernelized gradient flow mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN.
We then derive an explicit condition which ensures that gradient descent on the space of the generator in regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution.
arXiv Detail & Related papers (2020-11-04T16:55:00Z)
- Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
- The Convergence Indicator: Improved and completely characterized parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to determine whether the particles will eventually converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
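As referenced in the unadjusted Langevin algorithm entry above, the following is a minimal sketch (assumed for illustration, not taken from that paper or from this one) of the standard ULA update $x_{k+1} = x_k - \eta \nabla V_*(x_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$, i.e. the Langevin Monte Carlo scheme whose slow traversal between modes motivates the main abstract; the potential, step size, and run length are illustrative choices.

```python
# Minimal ULA sketch (illustrative assumptions throughout) for the same toy
# double-well potential V_*(x) = (x^2 - 1)^2 used in the Fisher-Rao sketch above.
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    # V_*(x) = (x^2 - 1)^2  =>  V_*'(x) = 4 x (x^2 - 1)
    return 4.0 * x * (x**2 - 1.0)

eta, n_steps = 1e-2, 200_000
x = 2.0                      # start near one mode, as in the sketch above
samples = np.empty(n_steps)
for k in range(n_steps):
    x = x - eta * grad_V(x) + np.sqrt(2.0 * eta) * rng.standard_normal()
    samples[k] = x

# For a symmetric bimodal target each well should eventually hold ~50% of the
# samples; how slowly this balances out reflects the mode-traversal issue the
# main abstract contrasts with Fisher-Rao dynamics.
print("fraction of samples in the right well:", float(np.mean(samples > 0.0)))
```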
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.