On the Convergence of Black-Box Variational Inference
- URL: http://arxiv.org/abs/2305.15349v4
- Date: Thu, 11 Jan 2024 04:27:46 GMT
- Title: On the Convergence of Black-Box Variational Inference
- Authors: Kyurae Kim, Jisu Oh, Kaiwen Wu, Yi-An Ma, Jacob R. Gardner
- Abstract summary: We provide the first convergence guarantee for full black-box variational inference (BBVI).
Our results hold for log-smooth posterior densities with and without strong log-concavity and the location-scale variational family.
- Score: 16.895490556279647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide the first convergence guarantee for full black-box variational
inference (BBVI), also known as Monte Carlo variational inference. While
preliminary investigations worked on simplified versions of BBVI (e.g., bounded
domain, bounded support, only optimizing for the scale, and such), our setup
does not need any such algorithmic modifications. Our results hold for
log-smooth posterior densities with and without strong log-concavity and the
location-scale variational family. Also, our analysis reveals that certain
algorithm design choices commonly employed in practice, particularly, nonlinear
parameterizations of the scale of the variational approximation, can result in
suboptimal convergence rates. Fortunately, running BBVI with proximal
stochastic gradient descent fixes these limitations, and thus achieves the
strongest known convergence rate guarantees. We evaluate this theoretical
insight by comparing proximal SGD against other standard implementations of
BBVI on large-scale Bayesian inference problems.
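To make the design choice discussed above concrete, here is a minimal, hypothetical sketch of proximal-SGD BBVI for a Gaussian location-scale family. It assumes a user-supplied gradient of the log posterior density (log_p_grad), a fixed step size gamma, and one Monte Carlo sample per step; the function names, defaults, and toy target are illustrative and not taken from the paper.

import numpy as np

def elbo_grad_sample(log_p_grad, m, s, rng):
    # One reparameterized gradient sample for q(z) = N(m, diag(s)^2).
    # Only the smooth part E_q[log p(z)] is differentiated here; the
    # entropy term sum(log s) is handled by the proximal step below.
    eps = rng.standard_normal(m.shape)
    z = m + s * eps                      # reparameterization trick
    g = log_p_grad(z)                    # gradient of log p at z (user-supplied)
    return g, g * eps                    # stochastic gradients w.r.t. m and s

def prox_entropy(s, gamma):
    # Closed-form proximal operator of -gamma * sum(log s):
    # solves min_x -gamma*log(x) + 0.5*(x - s)^2 elementwise, keeping the
    # scale strictly positive without a nonlinear reparameterization
    # such as s = exp(lam) or softplus(lam).
    return 0.5 * (s + np.sqrt(s ** 2 + 4.0 * gamma))

def bbvi_proximal_sgd(log_p_grad, dim, steps=2000, gamma=1e-2, seed=0):
    # Illustrative proximal-SGD BBVI loop (not the paper's implementation).
    rng = np.random.default_rng(seed)
    m, s = np.zeros(dim), np.ones(dim)
    for _ in range(steps):
        gm, gs = elbo_grad_sample(log_p_grad, m, s, rng)
        m = m + gamma * gm                        # SGD ascent step on E_q[log p]
        s = prox_entropy(s + gamma * gs, gamma)   # proximal step on the entropy
    return m, s

# Toy usage: standard Gaussian target, so grad log p(z) = -z and the
# optimum is m = 0, s = 1 (hypothetical example for illustration only).
m_hat, s_hat = bbvi_proximal_sgd(lambda z: -z, dim=5)

The point of the proximal step is that the entropy term is handled in closed form while the scale stays positive, so no nonlinear parameterization of the scale is needed; that parameterization is exactly what the abstract identifies as a potential source of suboptimal convergence rates.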
Related papers
- Batch and match: black-box variational inference with a score-based divergence [26.873037094654826]
We propose batch and match (BaM) as an alternative approach to blackbox variational inference (BBVI) based on a score-based divergence.
We show that BaM converges in fewer evaluations than leading implementations of BBVI based on ELBO.
arXiv Detail & Related papers (2024-02-22T18:20:22Z)
- Moreau Envelope ADMM for Decentralized Weakly Convex Optimization [55.2289666758254]
This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization.
The results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
arXiv Detail & Related papers (2023-08-31T14:16:30Z)
- Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing? [14.2377621491791]
Black-box variational inference converges at a geometric (traditionally called "linear") rate under perfect variational family specification.
We also improve the existing analysis of the standard closed-form entropy gradient estimators.
arXiv Detail & Related papers (2023-07-27T06:32:43Z)
- Provable convergence guarantees for black-box variational inference [19.421222110188605]
Black-box variational inference is widely used in situations where there is no proof that its optimization succeeds.
We provide rigorous guarantees that methods similar to those used in practice converge on realistic inference problems.
arXiv Detail & Related papers (2023-06-04T11:31:41Z)
- Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference [8.934639058735812]
We show that BBVI satisfies a matching bound corresponding to the $ABC$ condition used in the gradient descent literature.
We also show that the variance of the mean-field parameterization has provably superior dimensional dependence.
arXiv Detail & Related papers (2023-03-18T19:07:14Z)
- Quasi Black-Box Variational Inference with Natural Gradients for Bayesian Learning [84.90242084523565]
We develop an optimization algorithm suitable for Bayesian learning in complex models.
Our approach relies on natural gradient updates within a general black-box framework for efficient training with limited model-specific derivations.
arXiv Detail & Related papers (2022-05-23T18:54:27Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property, as it would admit the possibility of optimizing the marginal likelihood as an objective.
We propose a differentiable algorithm that abandons Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
- Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
- GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values [75.17074235764757]
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution.
GenDICE is the state of the art for estimating such density ratios.
arXiv Detail & Related papers (2020-01-29T22:10:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.