Forward-backward Gaussian variational inference via JKO in the
Bures-Wasserstein Space
- URL: http://arxiv.org/abs/2304.05398v1
- Date: Mon, 10 Apr 2023 19:49:50 GMT
- Title: Forward-backward Gaussian variational inference via JKO in the
Bures-Wasserstein Space
- Authors: Michael Diao, Krishnakumar Balasubramanian, Sinho Chewi, Adil Salim
- Abstract summary: Variational inference (VI) seeks to approximate a target distribution $\pi$ by an element of a tractable family of distributions.
We develop the Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI.
For our proposed algorithm, we obtain state-of-the-art convergence guarantees when $\pi$ is log-smooth and log-concave.
- Score: 19.19325201882727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational inference (VI) seeks to approximate a target distribution $\pi$
by an element of a tractable family of distributions. Of key interest in
statistics and machine learning is Gaussian VI, which approximates $\pi$ by
minimizing the Kullback-Leibler (KL) divergence to $\pi$ over the space of
Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian
Variational Inference (FB-GVI) algorithm to solve Gaussian VI. Our approach
exploits the composite structure of the KL divergence, which can be written as
the sum of a smooth term (the potential) and a non-smooth term (the entropy)
over the Bures-Wasserstein (BW) space of Gaussians endowed with the Wasserstein
distance. For our proposed algorithm, we obtain state-of-the-art convergence
guarantees when $\pi$ is log-smooth and log-concave, as well as the first
convergence guarantees to first-order stationary solutions when $\pi$ is only
log-smooth.
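A minimal sketch of one FB-GVI step, reconstructed from the composite description above (an illustration, not the authors' reference implementation). Writing $\pi \propto e^{-V}$, the objective splits as $\mathrm{KL}(p\,\|\,\pi) = \mathbb{E}_p[V] + \mathbb{E}_p[\log p] + \text{const}$: the smooth potential term is handled by an explicit Bures-Wasserstein gradient step using Monte Carlo estimates of $\mathbb{E}_p[\nabla V]$ and $\mathbb{E}_p[\nabla^2 V]$, and the entropy term by a proximal (JKO) step, for which a closed-form Gaussian covariance update can be derived. The names `grad_V` and `hess_V`, the step size `h`, the sample size, and the particular closed form used below are assumptions made for this sketch and may differ from the paper's exact update.

```python
# Hedged sketch of one (stochastic) forward-backward Gaussian VI step in the
# Bures-Wasserstein space. Assumes the target is pi proportional to exp(-V),
# with V twice differentiable and Sigma symmetric positive definite.
import numpy as np


def fb_gvi_step(m, Sigma, grad_V, hess_V, h, n_samples=64, rng=None):
    """One forward-backward update of the Gaussian N(m, Sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    d = m.shape[0]

    # Forward (explicit) step on the smooth potential term E_p[V]:
    # Monte Carlo estimates of E_p[grad V] and E_p[hess V] under the current Gaussian.
    L = np.linalg.cholesky(Sigma)
    xs = m + rng.standard_normal((n_samples, d)) @ L.T
    g_bar = np.mean([grad_V(x) for x in xs], axis=0)
    H_bar = np.mean([hess_V(x) for x in xs], axis=0)

    m_new = m - h * g_bar
    M = np.eye(d) - h * H_bar
    S = M @ Sigma @ M.T  # covariance after the forward step

    # Backward (proximal / JKO) step on the entropy term E_p[log p]:
    # the mean is unchanged; the new covariance solves a quadratic matrix
    # equation whose solution commutes with S, giving (assumed form)
    #   Sigma_new = h*I + S/2 + (S^2/4 + h*S)^{1/2}.
    eigval, eigvec = np.linalg.eigh(S @ S / 4.0 + h * S)
    sqrt_term = (eigvec * np.sqrt(np.maximum(eigval, 0.0))) @ eigvec.T
    Sigma_new = h * np.eye(d) + S / 2.0 + sqrt_term

    return m_new, Sigma_new
```

As a sanity check, for a log-smooth and log-concave target such as $V(x) = \tfrac{1}{2} x^\top A x$ with $A \succ 0$, iterating this update with a small step size should drive $N(m, \Sigma)$ toward the Gaussian minimizer of the KL divergence, consistent with the convergence regime described in the abstract.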
Related papers
- Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians [27.20127082606962]
Variational inference (VI) is a popular approach in Bayesian inference.
This work aims to contribute to the theoretical study of VI in the non-Gaussian case.
arXiv Detail & Related papers (2024-06-06T12:38:59Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions with neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Gaussian-Smoothed Sliced Probability Divergences [15.123608776470077]
We show that smoothing and slicing preserve the metric property and the weak topology.
We also derive other properties, including continuity, of different divergences with respect to the smoothing parameter.
arXiv Detail & Related papers (2024-04-04T07:55:46Z) - Closed-form Filtering for Non-linear Systems [83.91296397912218]
We propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency.
We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models.
Our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities.
arXiv Detail & Related papers (2024-02-15T08:51:49Z) - Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy [23.12198546384976]
Posterior sampling provides $\varepsilon$-pure differential privacy guarantees.
It does not suffer from the potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP.
In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo.
arXiv Detail & Related papers (2023-10-23T07:54:39Z) - Gaussian random field approximation via Stein's method with applications to wide random neural networks [20.554836643156726]
We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance.
We obtain the first bounds on the Gaussian random field approximation of wide random neural networks.
Our bounds are explicitly expressed in terms of the widths of the network and moments of the random weights.
arXiv Detail & Related papers (2023-06-28T15:35:10Z) - Non-Gaussian Component Analysis via Lattice Basis Reduction [56.98280399449707]
Non-Gaussian Component Analysis (NGCA) is a distribution learning problem.
We provide an efficient algorithm for NGCA in the regime where the univariate distribution $A$ of the hidden non-Gaussian direction is discrete or nearly discrete.
arXiv Detail & Related papers (2021-12-16T18:38:02Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
In particular, for convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
arXiv Detail & Related papers (2020-07-22T18:18:28Z) - Debiased Sinkhorn barycenters [110.79706180350507]
Entropy regularization in optimal transport (OT) has driven much of the recent interest in Wasserstein metrics and barycenters in machine learning.
We show how this bias is tightly linked to the reference measure that defines the entropy regularizer.
We propose debiased Wasserstein barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing.
arXiv Detail & Related papers (2020-06-03T23:06:02Z)