On the Convergence of Black-Box Variational Inference
- URL: http://arxiv.org/abs/2305.15349v4
- Date: Thu, 11 Jan 2024 04:27:46 GMT
- Title: On the Convergence of Black-Box Variational Inference
- Authors: Kyurae Kim, Jisu Oh, Kaiwen Wu, Yi-An Ma, Jacob R. Gardner
- Abstract summary: We provide the first convergence guarantee for full black-box variational inference (BBVI)
Our results hold for log-smooth posterior densities with and without strong log-concavity and the location-scale variational family.
- Score: 16.895490556279647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide the first convergence guarantee for full black-box variational
inference (BBVI), also known as Monte Carlo variational inference. While
preliminary investigations worked on simplified versions of BBVI (e.g., bounded
domain, bounded support, only optimizing for the scale, and such), our setup
does not need any such algorithmic modifications. Our results hold for
log-smooth posterior densities with and without strong log-concavity and the
location-scale variational family. Also, our analysis reveals that certain
algorithm design choices commonly employed in practice, particularly, nonlinear
parameterizations of the scale of the variational approximation, can result in
suboptimal convergence rates. Fortunately, running BBVI with proximal
stochastic gradient descent fixes these limitations, and thus achieves the
strongest known convergence rate guarantees. We evaluate this theoretical
insight by comparing proximal SGD against other standard implementations of
BBVI on large-scale Bayesian inference problems.
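The abstract's key algorithmic point, that the nonsmooth entropy term of a location-scale family can be handled with a proximal step rather than a nonlinear reparameterization of the scale, can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a mean-field Gaussian family, a toy Gaussian target, and a hypothetical helper `prox_neg_log` (the closed-form proximal operator of the negative-log entropy term):

```python
import numpy as np

def prox_neg_log(v, gamma):
    # Closed-form proximal operator of h(s) = -gamma * sum(log s):
    #   argmin_u  -gamma*log(u) + 0.5*(u - v)**2  =>  u = (v + sqrt(v^2 + 4*gamma)) / 2
    # This also guarantees the scale stays strictly positive.
    return 0.5 * (v + np.sqrt(v ** 2 + 4.0 * gamma))

def proximal_sgd_bbvi(grad_log_p, dim, steps=20000, gamma=0.01, seed=0):
    """Proximal SGD sketch for BBVI with a mean-field Gaussian (location-scale) family.

    The smooth energy term E_q[-log p(z)] gets a reparameterized stochastic
    gradient; the nonsmooth entropy term -sum(log sigma) is handled exactly
    through its proximal operator instead of a nonlinear scale parameterization.
    """
    rng = np.random.default_rng(seed)
    m, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(steps):
        eps = rng.standard_normal(dim)
        z = m + sigma * eps                 # reparameterization trick
        g = -grad_log_p(z)                  # stochastic gradient of the energy
        m = m - gamma * g                   # plain SGD step on the location
        sigma = prox_neg_log(sigma - gamma * g * eps, gamma)  # prox step on the scale
    return m, sigma

# Toy log-smooth, strongly log-concave target: N(mu, I), so the optimum is m = mu, sigma = 1.
mu = np.array([1.0, -2.0])
m, sigma = proximal_sgd_bbvi(lambda z: -(z - mu), dim=2)
```

With a constant step size the iterates settle into a noise ball around the exact posterior parameters, which is the regime the paper's constant-step-size guarantees describe.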
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates normalized SGD with clipping (NSGDC) and its variance-reduced variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
arXiv Detail & Related papers (2024-10-21T22:40:42Z)
- Batch and match: black-box variational inference with a score-based divergence [26.873037094654826]
We propose batch and match (BaM) as an alternative approach to black-box variational inference (BBVI) based on a score-based divergence.
We show that BaM converges in fewer evaluations than leading implementations of BBVI based on ELBO.
arXiv Detail & Related papers (2024-02-22T18:20:22Z)
- Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing? [14.2377621491791]
Black-box variational inference converges at a geometric (traditionally called "linear") rate under perfect variational family specification.
We also improve existing analysis on the regular closed-form entropy gradient estimators.
arXiv Detail & Related papers (2023-07-27T06:32:43Z)
- Provable convergence guarantees for black-box variational inference [19.421222110188605]
Black-box variational inference is widely used in situations where there is no proof that its optimization succeeds.
We provide rigorous guarantees that methods similar to those used in practice converge on realistic inference problems.
arXiv Detail & Related papers (2023-06-04T11:31:41Z)
- Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference [8.934639058735812]
We show that BBVI satisfies a matching bound corresponding to the $ABC$ condition used in the gradient descent literature.
We also show that the variance of the mean-field parameterization has provably superior dimensional dependence.
arXiv Detail & Related papers (2023-03-18T19:07:14Z)
- Quasi Black-Box Variational Inference with Natural Gradients for Bayesian Learning [84.90242084523565]
We develop an optimization algorithm suitable for Bayesian learning in complex models.
Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations.
arXiv Detail & Related papers (2022-05-23T18:54:27Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
- Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
- GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values [75.17074235764757]
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution.
GenDICE is the state-of-the-art for estimating such density ratios.
arXiv Detail & Related papers (2020-01-29T22:10:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.