A Finite-Particle Convergence Rate for Stein Variational Gradient
Descent
- URL: http://arxiv.org/abs/2211.09721v5
- Date: Thu, 2 Nov 2023 01:14:17 GMT
- Title: A Finite-Particle Convergence Rate for Stein Variational Gradient
Descent
- Authors: Jiaxin Shi and Lester Mackey
- Abstract summary: We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD).
Our explicit, non-asymptotic proof strategy will serve as a template for future refinements.
- Score: 47.6818454221125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide the first finite-particle convergence rate for Stein variational
gradient descent (SVGD), a popular algorithm for approximating a probability
distribution with a collection of particles. Specifically, whenever the target
distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and
an appropriate step size sequence drives the kernel Stein discrepancy to zero
at an order 1/sqrt(log log n) rate. We suspect that the dependence on n can be
improved, and we hope that our explicit, non-asymptotic proof strategy will
serve as a template for future refinements.
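For context, the SVGD update referenced in the abstract moves each particle along a kernel-smoothed gradient of the log target. Below is a minimal sketch of one such update with an RBF kernel; the kernel choice, bandwidth, and fixed step size are illustrative assumptions, not the step-size sequence analyzed in the paper.

```python
import numpy as np

def svgd_step(particles, score, step_size, bandwidth=1.0):
    """One SVGD update with an RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)).

    particles: (n, d) array of particle locations
    score:     callable mapping an (n, d) array to the (n, d) array of scores grad log p
    """
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]   # diffs[j, i] = x_j - x_i
    sq_dists = np.sum(diffs ** 2, axis=-1)                   # (n, n) squared distances
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))           # kernel Gram matrix
    grad_K = -diffs / bandwidth ** 2 * K[..., None]          # grad_{x_j} k(x_j, x_i)

    # phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
    drift = K.T @ score(particles)                           # pulls particles toward high density
    repulsion = grad_K.sum(axis=0)                           # pushes particles apart
    return particles + step_size * (drift + repulsion) / n

# Example: 50 particles targeting a standard Gaussian, whose score is -x.
particles = np.random.default_rng(0).normal(size=(50, 2))
for _ in range(200):
    particles = svgd_step(particles, lambda x: -x, step_size=0.1)
```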
Related papers
- Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent [14.890609936348277]
We provide finite-particle convergence rates for the Stein Variational Gradient Descent algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics.
Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure splits into a dominant `negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller `positive part'.
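Schematically, and only as a hedged sketch of the splitting described above (the constant $c$, the joint law $\rho_t^{(N)}$, and the remainder $\varepsilon_t^{(N)}$ are placeholders, not the paper's exact quantities):
$\frac{d}{dt}\,\mathrm{KL}\big(\rho_t^{(N)} \,\|\, \pi^{\otimes N}\big) = \underbrace{-\,c\,N\,\mathbb{E}\big[\mathsf{KSD}^2\big]}_{\text{dominant negative part}} + \underbrace{\varepsilon_t^{(N)}}_{\text{smaller positive part}}$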
arXiv Detail & Related papers (2024-09-13T01:49:19Z)
- Stein transport for Bayesian inference [3.009591302286514]
We introduce $\textit{Stein transport}$, a novel methodology for Bayesian inference designed to efficiently push an ensemble of particles along a curve of tempered probability distributions.
The driving vector field is chosen from a reproducing kernel Hilbert space and can be derived either through a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map in the Stein geometry.
We show that in comparison to SVGD, Stein transport not only often reaches more accurate posterior approximations with a significantly reduced computational budget, but that it also effectively mitigates the variance collapse phenomenon commonly observed in SVGD.
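A common instance of such a tempered curve in the Bayesian setting (an illustrative assumption here, not necessarily the paper's exact construction) interpolates between prior $\pi_0$ and posterior through the likelihood $L$:
$\pi_t(x) \propto \pi_0(x)\, L(x)^{t}, \quad t \in [0, 1],$
so that $t = 0$ recovers the prior and $t = 1$ the posterior.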
arXiv Detail & Related papers (2024-09-02T21:03:38Z)
- Augmented Message Passing Stein Variational Gradient Descent [3.5788754401889014]
We study the isotropy property of finite particles during the convergence process.
All particles tend to cluster around the particle center within a certain range.
Our algorithm achieves satisfactory accuracy and overcomes the variance collapse problem in various benchmark problems.
arXiv Detail & Related papers (2023-05-18T01:13:04Z)
- High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, non-convex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability bound for AdaGrad with a truly adaptive scheme, i.e., one completely oblivious to the knowledge of smoothness.
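The adaptive scheme referred to here sets the step size from accumulated gradient magnitudes rather than from a known smoothness constant. A minimal sketch of the scalar (AdaGrad-norm) variant follows; it is a generic illustration, not necessarily the exact template analyzed in the paper.

```python
import numpy as np

def adagrad_norm_step(x, grad, accum, eta=1.0, b0=1e-8):
    """One AdaGrad-norm update: step size eta / sqrt(b0^2 + sum of squared gradient norms)."""
    accum = accum + np.sum(grad * grad)      # accumulate squared gradient norms
    step = eta / np.sqrt(b0 ** 2 + accum)    # adaptive step size, oblivious to smoothness
    return x - step * grad, accum
```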
arXiv Detail & Related papers (2022-04-06T13:50:33Z)
- Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
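For reference, each JKO step is the Wasserstein proximal problem below; in an ICNN-based discretization, the transport map pushing $\rho_k$ to $\rho_{k+1}$ is parameterized as the gradient $\nabla \psi$ of a convex network $\psi$ (the exact training objective used in the paper may differ):
$\rho_{k+1} = \arg\min_{\rho} \; \frac{1}{2\tau}\, W_2^2(\rho, \rho_k) + \mathcal{F}(\rho)$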
arXiv Detail & Related papers (2021-06-01T19:21:48Z)
- Kernel Stein Discrepancy Descent [16.47373844775953]
Kernel Stein Discrepancy (KSD) has received much interest recently.
We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$.
This leads to a straightforwardly implementable, deterministic score-based method to sample from $pi$, named KSD Descent.
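For concreteness, the squared kernel Stein discrepancy driving this flow is commonly written as a double expectation of a Stein kernel $k_\pi$ built from a base kernel $k$ and the score $\nabla \log \pi$ (the standard Langevin Stein operator form; the paper's conventions may differ slightly):
$\mathsf{KSD}^2(\mu \,\|\, \pi) = \mathbb{E}_{x, x' \sim \mu}\big[k_\pi(x, x')\big],$ where
$k_\pi(x, x') = \nabla_x \cdot \nabla_{x'} k(x, x') + \nabla_x k(x, x') \cdot \nabla_{x'} \log \pi(x') + \nabla_{x'} k(x, x') \cdot \nabla_x \log \pi(x) + k(x, x')\, \nabla_x \log \pi(x) \cdot \nabla_{x'} \log \pi(x').$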
arXiv Detail & Related papers (2021-05-20T19:05:23Z)
- Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization [106.70006655990176]
A distributional optimization problem arises widely in machine learning and statistics.
We propose a novel particle-based algorithm, dubbed as variational transport, which approximately performs Wasserstein gradient descent.
We prove that when the objective function satisfies a functional version of the Polyak-Lojasiewicz (PL) condition (Polyak, 1963) and a smoothness condition, variational transport converges linearly.
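The standard reasoning behind such a linear rate, sketched here with placeholder notation ($\lambda$ the PL constant, $F^\star$ the optimal value, $\operatorname{grad}$ the Wasserstein gradient) rather than the paper's precise statement, is that the PL inequality turns energy dissipation into a Gronwall bound:
$\tfrac{1}{2}\,\|\operatorname{grad} F(\mu)\|^2 \ge \lambda\,\big(F(\mu) - F^\star\big) \;\Longrightarrow\; F(\mu_t) - F^\star \le e^{-2\lambda t}\,\big(F(\mu_0) - F^\star\big),$
since $\frac{d}{dt}\big(F(\mu_t) - F^\star\big) = -\|\operatorname{grad} F(\mu_t)\|^2 \le -2\lambda\,\big(F(\mu_t) - F^\star\big)$ along the gradient flow.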
arXiv Detail & Related papers (2020-12-21T18:33:13Z)
- Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
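As a reminder of the algorithm being analyzed, one SGLD step adds Gaussian noise to a stochastic gradient step on the potential $U = -\log$ target; a minimal sketch follows (the minibatch gradient estimator and step size are illustrative assumptions):

```python
import numpy as np

def sgld_step(theta, stochastic_grad_U, eta, rng):
    """One SGLD update: theta <- theta - eta * grad U(theta) + sqrt(2 * eta) * Gaussian noise."""
    noise = rng.normal(size=theta.shape)
    return theta - eta * stochastic_grad_U(theta) + np.sqrt(2.0 * eta) * noise
```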
arXiv Detail & Related papers (2020-10-19T15:23:18Z)
- A Non-Asymptotic Analysis for Stein Variational Gradient Descent [44.30569261307296]
We provide a novel finite time analysis for the Stein Variational Gradient Descent algorithm.
We provide a descent lemma establishing that the algorithm decreases the objective at each iteration.
We also provide a convergence result of the finite particle system corresponding to the practical implementation of SVGD to its population version.
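Such a descent lemma typically has the following flavor, written here only as a hedged sketch with placeholder constants ($\gamma$ the step size, $c$ a problem-dependent constant), not the paper's precise statement:
$\mathrm{KL}\big(\mu_{k+1} \,\|\, \pi\big) \le \mathrm{KL}\big(\mu_k \,\|\, \pi\big) - c\,\gamma\,\mathsf{KSD}^2\big(\mu_k \,\|\, \pi\big)$ for sufficiently small $\gamma$.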
arXiv Detail & Related papers (2020-06-17T12:01:33Z)