An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow
- URL: http://arxiv.org/abs/2302.12229v1
- Date: Thu, 23 Feb 2023 18:47:54 GMT
- Title: An Explicit Expansion of the Kullback-Leibler Divergence along its
Fisher-Rao Gradient Flow
- Authors: Carles Domingo-Enrich, Aram-Alexandre Pooladian
- Abstract summary: We show that when $\pi$ exhibits multiple modes, the gradient flow of the KL
divergence with respect to the Fisher-Rao geometry converges to $\pi$ at a rate that is
\textit{independent} of the potential function.
We provide an explicit expansion of $\text{KL}(\rho_t^{\text{FR}}\|\pi)$ in terms of $e^{-t}$.
- Score: 8.052709336750823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Let $V_* : \mathbb{R}^d \to \mathbb{R}$ be some (possibly non-convex)
potential function, and consider the probability measure $\pi \propto
e^{-V_*}$. When $\pi$ exhibits multiple modes, it is known that sampling
techniques based on Wasserstein gradient flows of the Kullback-Leibler (KL)
divergence (e.g. Langevin Monte Carlo) suffer from slow rates of
convergence, as the dynamics are unable to easily traverse between modes. In
stark contrast, the work of Lu et al. (2019; 2022) has shown that the gradient
flow of the KL with respect to the Fisher-Rao (FR) geometry exhibits a
convergence rate to $\pi$ that is \textit{independent} of the potential
function. In this short note, we complement these existing results in the
literature by providing an explicit expansion of
$\text{KL}(\rho_t^{\text{FR}}\|\pi)$ in terms of $e^{-t}$, where
$(\rho_t^{\text{FR}})_{t\geq 0}$ is the FR gradient flow of the KL divergence.
In turn, we are able to provide a clean asymptotic convergence rate, where the
burn-in time is guaranteed to be finite. Our proof is based on observing a
similarity between FR gradient flows and simulated annealing with linear
scaling, and facts about cumulant generating functions. We conclude with simple
synthetic experiments that demonstrate our theoretical findings are indeed
tight. Based on our numerics, we conjecture that the asymptotic rates of
convergence for Wasserstein-Fisher-Rao gradient flows are possibly related to
this expansion in some cases.
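For intuition, here is a minimal numerical sketch (not the authors' code) of the Fisher-Rao gradient flow of the KL divergence on a one-dimensional grid, for an assumed toy double-well potential $V_*(x) = (x^2-1)^2$. It integrates the flow $\partial_t \rho_t = \rho_t\,(\text{KL}(\rho_t\|\pi) - \log(\rho_t/\pi))$ studied by Lu et al. with explicit Euler steps and tracks the decay of $\text{KL}(\rho_t\|\pi)$; the grid, step size, initial density, and time horizon are illustrative choices only.

```python
# Minimal sketch (illustrative assumptions throughout): Fisher-Rao gradient
# flow of KL(rho || pi) on a 1-D grid, with pi proportional to exp(-V_*) for a
# toy bimodal potential V_*(x) = (x^2 - 1)^2.
import numpy as np

# Grid and (normalized) target density pi.
x = np.linspace(-4.0, 4.0, 2001)
dx = x[1] - x[0]
V = (x**2 - 1.0) ** 2
pi = np.exp(-V)
pi /= pi.sum() * dx

# Initial density: a Gaussian sitting near one of the two modes.
rho = np.exp(-0.5 * (x - 2.0) ** 2)
rho /= rho.sum() * dx

def kl(p, q):
    """Grid approximation of KL(p || q)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx)

# Explicit Euler discretization of
#   d/dt rho_t = rho_t * ( KL(rho_t || pi) - log(rho_t / pi) ).
dt, T = 1e-3, 5.0
ts, kls = [], []
for k in range(int(T / dt)):
    log_ratio = np.log(np.maximum(rho, 1e-300) / pi)
    current_kl = kl(rho, pi)
    ts.append(k * dt)
    kls.append(current_kl)
    rho = np.maximum(rho + dt * rho * (current_kl - log_ratio), 0.0)
    rho /= rho.sum() * dx  # re-normalize against discretization drift

# The note above expands KL(rho_t^FR || pi) in powers of e^{-t}; empirically,
# log KL(rho_t || pi) should become roughly affine in t once the burn-in is over.
ts, kls = np.array(ts), np.array(kls)
late = ts > 3.0
slope = np.polyfit(ts[late], np.log(kls[late]), 1)[0]
print("late-time slope of log KL(rho_t || pi):", slope)
```

The printed slope is only a crude empirical proxy for the asymptotic rate; the explicit expansion in the note is what pins that rate down exactly.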
Related papers
- Hellinger-Kantorovich Gradient Flows: Global Exponential Decay of Entropy Functionals [52.154685604660465]
We investigate a family of gradient flows of positive and probability measures, focusing on the Hellinger-Kantorovich (HK) geometry.
A central contribution is a complete characterization of global exponential decay behaviors of entropy functionals under Otto-Wasserstein and Hellinger-type gradient flows.
arXiv Detail & Related papers (2025-01-28T16:17:09Z)
- Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows [1.5561923713703105]
We introduce a novel discretization scheme for Wasserstein gradient flows that involves successively computing Schrödinger bridges with the same marginals.
The proposed scheme has two advantages: one, it avoids the use of the score function, and, two, it is amenable to particle-based approximations using the Sinkhorn algorithm.
arXiv Detail & Related papers (2024-06-16T07:23:26Z)
- Exact dynamics of quantum dissipative $XX$ models: Wannier-Stark localization in the fragmented operator space [49.1574468325115]
We find an exceptional point at a critical dissipation strength that separates oscillating and non-oscillating decay.
We also describe a different type of dissipation that leads to a single decay mode in the whole operator subspace.
arXiv Detail & Related papers (2024-05-27T16:11:39Z)
- Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces [1.2499537119440245]
We regularize the $f$-divergence by a squared maximum mean discrepancy associated with a characteristic kernel $K$.
We exploit well-known results on Moreau envelopes in Hilbert spaces to analyze the MMD-regularized $f$-divergences.
We provide proof-of-concept numerical examples for flows starting from empirical measures.
arXiv Detail & Related papers (2024-02-07T06:30:39Z)
- Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron [49.45105570960104]
We prove the global convergence of randomly initialized gradient descent with an $O(T^{-3})$ rate.
These two bounds jointly give an exact characterization of the convergence rate.
We show this potential function converges slowly, which implies the slow convergence rate of the loss function.
arXiv Detail & Related papers (2023-02-20T15:33:26Z)
- Unadjusted Langevin algorithm for sampling a mixture of weakly smooth potentials [0.0]
We prove convergence guarantees under a Poincaré inequality or non-strong convexity outside a ball.
We also provide convergence in the $L_\beta$-Wasserstein metric for the smoothing potential (see the unadjusted Langevin sketch after this list).
arXiv Detail & Related papers (2021-12-17T04:10:09Z)
- Federated Functional Gradient Boosting [75.06942944563572]
We study functional minimization in Federated Learning.
For both FFGB.C and FFGB.L, the radii of convergence shrink to zero as the feature distributions become more homogeneous.
arXiv Detail & Related papers (2021-03-11T21:49:19Z)
- Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law [10.406659081400354]
We study the convergence rate of continuous-time simulated annealing $(X_t;\, t \ge 0)$ and its discretization $(x_k;\, k = 0, 1, \ldots)$.
We prove that the tail probability $\mathbb{P}(f(X_t) > \min f + \delta)$ (resp. $\mathbb{P}(f(x_k) > \min f + \delta)$) decays in time (resp. in cumulative step size).
arXiv Detail & Related papers (2021-02-03T23:45:39Z)
- On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient Flow [26.725412498545385]
We show that a parametric kernelized gradient flow mimics the min-max game in gradient regularized $\mathrm{MMD}$ GAN.
We then derive an explicit condition which ensures that gradient descent on the space of the generator in regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution.
arXiv Detail & Related papers (2020-11-04T16:55:00Z)
- Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
- The Convergence Indicator: Improved and completely characterized parameter bounds for actual convergence of Particle Swarm Optimization [68.8204255655161]
We introduce a new convergence indicator that can be used to determine whether the particles will eventually converge to a single point or diverge.
Using this convergence indicator we provide the actual bounds completely characterizing parameter regions that lead to a converging swarm.
arXiv Detail & Related papers (2020-06-06T19:08:05Z)
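As referenced in the unadjusted Langevin algorithm entry above, the following is a minimal sketch (assumed for illustration, not taken from that paper or from this one) of the standard ULA update $x_{k+1} = x_k - \eta \nabla V_*(x_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$, i.e. the Langevin Monte Carlo scheme whose slow traversal between modes motivates the main abstract; the potential, step size, and run length are illustrative choices.

```python
# Minimal ULA sketch (illustrative assumptions throughout) for the same toy
# double-well potential V_*(x) = (x^2 - 1)^2 used in the Fisher-Rao sketch above.
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    # V_*(x) = (x^2 - 1)^2  =>  V_*'(x) = 4 x (x^2 - 1)
    return 4.0 * x * (x**2 - 1.0)

eta, n_steps = 1e-2, 200_000
x = 2.0                      # start near one mode, as in the sketch above
samples = np.empty(n_steps)
for k in range(n_steps):
    x = x - eta * grad_V(x) + np.sqrt(2.0 * eta) * rng.standard_normal()
    samples[k] = x

# For a symmetric bimodal target each well should eventually hold ~50% of the
# samples; how slowly this balances out reflects the mode-traversal issue the
# main abstract contrasts with Fisher-Rao dynamics.
print("fraction of samples in the right well:", float(np.mean(samples > 0.0)))
```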
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.