Related papers: Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

URL: http://arxiv.org/abs/2602.18718v1
Date: Sat, 21 Feb 2026 04:52:53 GMT
Title: Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space
Authors: Kyurae Kim, Qiang Fu, Yi-An Ma, Jacob R. Gardner, Trevor Campbell
Abstract summary: Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space) and parameter space, respectively. We identify that WVI's superiority stems from the specific gradient estimator it uses. We demonstrate that the use of Price's gradient is the major source of performance improvement.
Score: 39.80428405222528
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space) and parameter space, respectively. Previously, for the Gaussian variational family, convergence guarantees for WVI have shown superiority over existing results for black-box VI with the reparametrization gradient, suggesting the measure space approach might provide some unique benefits. In this work, however, we close this gap by obtaining identical state-of-the-art iteration complexity guarantees for both. In particular, we identify that WVI's superiority stems from the specific gradient estimator it uses, which BBVI can also leverage with minor modifications. The estimator in question is usually associated with Price's theorem and utilizes second-order information (Hessians) of the target log-density. We will refer to this as Price's gradient. On the flip side, WVI can be made more widely applicable by using the reparametrization gradient, which requires only gradients of the log-density. We empirically demonstrate that the use of Price's gradient is the major source of performance improvement.
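The contrast between the two estimators mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm or code. It assumes a Gaussian variational family parametrized as z = m + C eps with eps ~ N(0, I), and a hypothetical quadratic potential f (standing in for a negative unnormalized log-density) whose gradient and Hessian are known in closed form. The function names `reparam_grad_C` and `price_grad_C` and the values of `A`, `b`, `m`, and `C` are illustrative assumptions. The reparametrization estimator averages grad f(z) eps^T, while the Price-style estimator averages the Hessian and multiplies by C; by Stein's lemma both target the same gradient with respect to C.

```python
import numpy as np

def reparam_grad_C(grad_f, m, C, eps):
    """Estimate d/dC E[f(m + C eps)] using only gradients of f."""
    z = m + eps @ C.T                  # rows are samples z_n = m + C eps_n
    g = grad_f(z)                      # rows are grad f(z_n)
    return g.T @ eps / eps.shape[0]    # Monte Carlo average of grad f(z) eps^T

def price_grad_C(hess_f, m, C, eps):
    """Estimate the same gradient using Hessians of f (Price's theorem)."""
    z = m + eps @ C.T
    H = hess_f(z)                      # shape (n, d, d): one Hessian per sample
    return H.mean(axis=0) @ C          # Monte Carlo average of Hessian, times C

# Hypothetical quadratic potential f(z) = 0.5 z^T A z - b^T z (an assumption).
d = 3
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])
b = np.array([1.0, -1.0, 0.5])
grad_f = lambda z: z @ A.T - b                              # grad f(z) = A z - b
hess_f = lambda z: np.broadcast_to(A, (z.shape[0], d, d))   # Hessian is constant

rng = np.random.default_rng(0)
m = np.array([0.1, -0.2, 0.3])
C = np.tril(np.array([[1.0, 0.0, 0.0],
                      [0.3, 0.8, 0.0],
                      [-0.2, 0.1, 1.2]]))
eps = rng.standard_normal((10, d))     # a small batch of base samples

exact = A @ C                          # closed-form gradient for this quadratic
err_reparam = np.abs(reparam_grad_C(grad_f, m, C, eps) - exact).max()
err_price = np.abs(price_grad_C(hess_f, m, C, eps) - exact).max()
print("max error, reparametrization:", err_reparam)
print("max error, Price:", err_price)
```

For this quadratic example the Hessian is constant, so the Price-style estimate has no Monte Carlo error, whereas the reparametrization estimate is noisy at small batch sizes. For general targets both estimators are noisy, and the Hessian-based one needs second-order information that may be costly to compute.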

Related papers

Gradient-based optimization for variational empirical Bayes multiple regression [2.6763498831034043]
We propose alternative optimization approaches based on gradient-based (quasi-Newton) methods. We show that GradVI produces similar predictive performance and converges in fewer iterations when predictors are highly correlated. Our methods are implemented in the open-source Python software GradVI.
arXiv Detail & Related papers (2024-11-21T20:35:44Z)
Bridging the Gap Between Variational Inference and Wasserstein Gradient Flows [6.452626686361619]
We bridge the gap between variational inference and Wasserstein gradient flows. Under certain conditions, the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow. We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow.
arXiv Detail & Related papers (2023-10-31T00:10:19Z)
Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation. We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors. Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing? [14.2377621491791]
Black-box variational inference converges at a geometric (traditionally called "linear") rate under perfect variational family specification. We also improve existing analysis on the regular closed-form entropy gradient estimators.
arXiv Detail & Related papers (2023-07-27T06:32:43Z)
On the Convergence of Black-Box Variational Inference [16.895490556279647]
We provide the first convergence guarantee for full black-box variational inference (BBVI). Our results hold for log-smooth posterior densities with and without strong log-concavity and the location-scale variational family.
arXiv Detail & Related papers (2023-05-24T16:59:50Z)
Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference [8.934639058735812]
We show that BBVI satisfies a matching bound corresponding to the $ABC$ condition used in the gradient descent literature. We also show that the variance of the mean-field parameterization has provably superior dimensional dependence.
arXiv Detail & Related papers (2023-03-18T19:07:14Z)
GradViT: Gradient Inversion of Vision Transformers [83.54779732309653]
We demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks. We introduce a method, named GradViT, that optimizes random noise into natural-looking images. We observe unprecedentedly high fidelity and closeness to the original (hidden) data.
arXiv Detail & Related papers (2022-03-22T17:06:07Z)
GBHT: Gradient Boosting Histogram Transform for Density Estimation [73.94900378709023]
We propose a density estimation algorithm called Gradient Boosting Histogram Transform (GBHT). We make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems.
arXiv Detail & Related papers (2021-06-10T13:40:28Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove adaptive gradient methods in expectation of gradient convergence methods. Our analyses shed light on better adaptive gradient methods in optimizing non understanding gradient bounds.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
