Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
- URL: http://arxiv.org/abs/2409.08469v2
- Date: Mon, 30 Sep 2024 01:20:13 GMT
- Title: Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
- Authors: Krishnakumar Balasubramanian, Sayan Banerjee, Promit Ghosal
- Abstract summary: We provide finite-particle convergence rates for the Stein Variational Gradient Descent algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics.
Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure splits into a dominant `negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller `positive part'.
- Score: 14.890609936348277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics. Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure, starting from a regular initial distribution, splits into a dominant `negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller `positive part'. This observation leads to $\mathsf{KSD}$ rates of order $1/\sqrt{N}$, in both continuous and discrete time, providing a near optimal (in the sense of matching the corresponding i.i.d. rates) double exponential improvement over the recent result by Shi and Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension $d$. By adding a bilinear component to the kernel, the above approach is used to further obtain Wasserstein-2 convergence in continuous time. For the case of `bilinear + Mat\'ern' kernels, we derive Wasserstein-2 rates that exhibit a curse-of-dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws.
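For readers unfamiliar with the algorithm the abstract analyzes, SVGD moves $N$ particles along a kernelized score direction, combining an attraction term (kernel-weighted scores) with a repulsion term (kernel gradients). Below is a minimal NumPy sketch with an RBF kernel and a standard-normal target; it is an illustrative toy setup, not the paper's code, and the bandwidth `h`, step size, and iteration count are arbitrary choices:

```python
import numpy as np

def svgd_step(x, score, h=1.0):
    """One SVGD direction for particles x of shape (N, d) under an RBF kernel.

    phi(x_i) = (1/N) * sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ],
    with k(x, y) = exp(-||x - y||^2 / (2h)).
    """
    diff = x[:, None, :] - x[None, :, :]                 # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h))    # symmetric kernel matrix
    drive = K @ score(x)                                 # attraction: kernel-weighted scores
    repulse = np.sum(K[:, :, None] * diff, axis=1) / h   # repulsion: grad_{x_j} k(x_j, x_i)
    return (drive + repulse) / x.shape[0]

# Toy run: particles start far from the target N(0, 1), whose score is -x.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=0.5, size=(200, 1))
for _ in range(1000):
    x = x + 0.1 * svgd_step(x, lambda z: -z)
```

After the loop the particle cloud sits near the target: the sample mean is close to 0 and the sample standard deviation close to 1, with the repulsion term preventing the particles from collapsing onto the mode.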
Related papers
- Optimal convergence rates in trace distance and relative entropy for the quantum central limit theorem [2.7855886538423182]
We show that for a centered $m$-mode quantum state with finite third-order moments, the trace distance between $\rho^{\boxplus n}$ and $\rho_G$ decays at the optimal rate of $\mathcal{O}(n^{-1/2})$.
For states with finite fourth-order moments, we prove that the relative entropy between $\rho^{\boxplus n}$ and $\rho_G$ decays at the optimal rate of $\mathcal{O}(n^{-1})$.
arXiv Detail & Related papers (2024-10-29T12:35:47Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Finite-Particle Convergence Rate for Stein Variational Gradient Descent [47.6818454221125]
We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD).
Our explicit, non-asymptotic proof strategy will serve as a template for future refinements.
arXiv Detail & Related papers (2022-11-17T17:50:39Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Estimating 2-Sinkhorn Divergence between Gaussian Processes from Finite-Dimensional Marginals [4.416484585765028]
We study the convergence of estimating the 2-Sinkhorn divergence between Gaussian processes (GPs) using their finite-dimensional marginal distributions.
We show almost sure convergence of the divergence when the marginals are sampled according to some base measure.
arXiv Detail & Related papers (2021-02-05T16:17:55Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
In particular, for convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
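For context, the LMC algorithm whose rate is quoted above is the Euler-Maruyama discretization of the Langevin diffusion, $x_{k+1} = x_k - \gamma \nabla U(x_k) + \sqrt{2\gamma}\,\xi_k$. A minimal NumPy sketch on the toy potential $U(x) = x^2/2$ (i.e. a standard-normal target; this setup and the step size are illustrative choices, not the paper's experiments):

```python
import numpy as np

def lmc(grad_U, x0, step=0.01, n_iters=2000, seed=0):
    """Unadjusted Langevin Monte Carlo:
    x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2 * step) * standard normal noise."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
    return x

# 5000 independent chains targeting N(0, 1): U(x) = x^2 / 2, so grad_U(x) = x.
samples = lmc(lambda x: x, np.zeros(5000))
```

With this small step size the chains mix well within 2000 iterations, and the empirical mean and variance of the final states are close to 0 and 1; the residual variance bias of order `step` is exactly the kind of discretization error the cited analyses quantify.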
arXiv Detail & Related papers (2020-07-22T18:18:28Z) - Debiased Sinkhorn barycenters [110.79706180350507]
Entropy regularization in optimal transport (OT) has been the driver of many recent interests for Wasserstein metrics and barycenters in machine learning.
We show how this bias is tightly linked to the reference measure that defines the entropy regularizer.
We propose debiased Wasserstein barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing.
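The "Sinkhorn-like iterations" referred to above are the standard alternating matrix-scaling loop for entropy-regularized OT. A minimal NumPy sketch of the plain (biased) Sinkhorn plan between two histograms, for illustration only; the paper's debiasing correction is not included, and the regularization `eps` and iteration count are arbitrary choices:

```python
import numpy as np

def sinkhorn_plan(a, b, C, eps=0.1, n_iters=500):
    """Sinkhorn iterations for entropy-regularized OT between histograms
    a and b with cost matrix C; returns the transport plan."""
    K = np.exp(-C / eps)          # Gibbs kernel of the cost
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        v = b / (K.T @ u)         # scale to match the column marginal b
        u = a / (K @ v)           # scale to match the row marginal a
    return u[:, None] * K * v[None, :]

# Toy problem: uniform histograms on the points {0, 1, 2} with squared-distance cost.
pts = np.arange(3.0)
C = (pts[:, None] - pts[None, :]) ** 2
a = b = np.full(3, 1 / 3)
P = sinkhorn_plan(a, b, C)
```

At convergence the plan's row sums match `a` and its column sums match `b`; the entropic bias that the debiased barycenters remove shows up in the value of the regularized cost, not in these marginal constraints.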
arXiv Detail & Related papers (2020-06-03T23:06:02Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof for clusters of a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - A diffusion approach to Stein's method on Riemannian manifolds [65.36007959755302]
We exploit the relationship between the generator of a diffusion on $\mathbf{M}$ with target invariant measure and its characterising Stein operator.
We derive Stein factors, which bound the solution to the Stein equation and its derivatives.
We show that the bounds for $\mathbb{R}^m$ remain valid when $\mathbf{M}$ is a flat manifold.
arXiv Detail & Related papers (2020-03-25T17:03:58Z) - Optimal estimation of high-dimensional location Gaussian mixtures [6.947889260024788]
We show that the minimax rate of estimating the mixing distribution in Wasserstein distance is $\Theta((d/n)^{1/4} + n^{-1/(4k-2)})$.
We also show that the mixture density can be estimated at the optimal parametric rate $\Theta(\sqrt{d/n})$ in Hellinger distance.
arXiv Detail & Related papers (2020-02-14T00:11:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.