Convergence rate of random scan Coordinate Ascent Variational Inference under log-concavity
- URL: http://arxiv.org/abs/2406.07292v2
- Date: Mon, 23 Sep 2024 16:27:50 GMT
- Title: Convergence rate of random scan Coordinate Ascent Variational Inference under log-concavity
- Authors: Hugo Lavenant, Giacomo Zanella
- Abstract summary: The Coordinate Ascent Variational Inference scheme is a popular algorithm used to compute the mean-field approximation of a probability distribution of interest.
We analyze its random scan version, under log-concavity assumptions on the target density.
- Score: 0.18416014644193066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Coordinate Ascent Variational Inference scheme is a popular algorithm used to compute the mean-field approximation of a probability distribution of interest. We analyze its random scan version, under log-concavity assumptions on the target density. Our approach builds on the recent work of M. Arnese and D. Lacker, \emph{Convergence of coordinate ascent variational inference for log-concave measures via optimal transport} [arXiv:2404.08792] which studies the deterministic scan version of the algorithm, phrasing it as a block-coordinate descent algorithm in the space of probability distributions endowed with the geometry of optimal transport. We obtain tight rates for the random scan version, which imply that the total number of factor updates required to converge scales linearly with the condition number and the number of blocks of the target distribution. By contrast, available bounds for the deterministic scan case scale quadratically in the same quantities, which is analogous to what happens for optimization of convex functions in Euclidean spaces.
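Since the abstract counts "factor updates," it may help to see what one such update looks like in the simplest log-concave setting. The following is a minimal, hypothetical sketch (not code from the paper) of random-scan CAVI for a Gaussian target, where each factor update has a closed form; the names `random_scan_cavi` and `n_updates`, and the choice of precision matrix, are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch, not the paper's implementation: random-scan CAVI for a
# mean-field Gaussian approximation of a Gaussian target N(0, Q^{-1}), with Q a
# positive-definite precision matrix. For a Gaussian target the optimal factor
# q_i is Gaussian with precision Q_ii, so each CAVI factor update only moves the
# variational mean m_i, and a "factor update" is a single coordinate assignment.

rng = np.random.default_rng(0)

d = 10                                   # number of blocks (here: scalar coordinates)
A = rng.standard_normal((d, d))
Q = A @ A.T + d * np.eye(d)              # well-conditioned precision matrix
m = rng.standard_normal(d)               # variational means, random initialization

def random_scan_cavi(Q, m, n_updates):
    """Perform n_updates random-scan factor updates; return the mean iterates."""
    d = Q.shape[0]
    history = [m.copy()]
    for _ in range(n_updates):
        i = rng.integers(d)              # pick one factor uniformly at random
        # Exact CAVI update of factor i: minimize the KL over q_i, others fixed.
        m[i] = -(Q[i, :] @ m - Q[i, i] * m[i]) / Q[i, i]
        history.append(m.copy())
    return np.array(history)

traj = random_scan_cavi(Q, m, n_updates=50 * d)
# The target mean is 0, so the norm of m tracks convergence of the variational means.
print("initial error:", np.linalg.norm(traj[0]), "final error:", np.linalg.norm(traj[-1]))
```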
Related papers
- Canonical Variates in Wasserstein Metric Space [16.668946904062032]
We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms.
Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy.
We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the ratio of between-class variation to within-class variation.
arXiv Detail & Related papers (2024-05-24T17:59:21Z) - Analytical Approximation of the ELBO Gradient in the Context of the Clutter Problem [0.0]
We propose an analytical solution for approximating the gradient of the Evidence Lower Bound (ELBO) in variational inference problems.
The proposed method demonstrates good accuracy and rate of convergence together with linear computational complexity.
arXiv Detail & Related papers (2024-04-16T13:19:46Z) - Convergence of coordinate ascent variational inference for log-concave measures via optimal transport [0.0]
Mean-field variational inference (VI) is the problem of finding the closest product (factorized) measure to a given probability measure.
The well-known Coordinate Ascent Variational Inference (CAVI) algorithm aims to compute this approximate measure by iteratively optimizing over one coordinate (factor) at a time.
arXiv Detail & Related papers (2024-04-12T19:43:54Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent [7.6860514640178]
We propose a novel zeroth-order optimization algorithm for distributed reinforcement learning.
It allows each agent to estimate its local gradient by cost evaluation independently, without use of any consensus protocol.
arXiv Detail & Related papers (2021-07-26T18:11:07Z) - Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization [106.70006655990176]
Distributional optimization problems arise widely in machine learning and statistics.
We propose a novel particle-based algorithm, dubbed as variational transport, which approximately performs Wasserstein gradient descent.
We prove that, when the objective function satisfies a functional version of the Polyak-Lojasiewicz (PL) condition (Polyak, 1963) and smoothness conditions, variational transport converges linearly.
arXiv Detail & Related papers (2020-12-21T18:33:13Z) - Faster Wasserstein Distance Estimation with the Sinkhorn Divergence [0.0]
The squared Wasserstein distance is a natural quantity to compare probability distributions in a non-parametric setting.
It is usually estimated with the plug-in estimator; in this work, we propose instead to estimate it with the Sinkhorn divergence.
We show that, for smooth densities, this estimator has a comparable sample complexity but allows higher regularization levels (a minimal computational sketch of the Sinkhorn divergence is given after this list).
arXiv Detail & Related papers (2020-06-15T06:58:16Z) - Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data.
We employ the structure of the problem to obtain a convex-concave stochastic saddle-point reformulation.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z)
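As referenced in the entry on "Faster Wasserstein Distance Estimation with the Sinkhorn Divergence," here is a minimal sketch of the Sinkhorn divergence under common conventions. It is not the cited paper's estimator; the helper names `sinkhorn_cost` and `sinkhorn_divergence` and the parameters `eps` and `n_iter` are illustrative assumptions, and a moderately large `eps` is used because this plain implementation has no log-domain stabilization.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.5, n_iter=500):
    """Entropic OT cost <P, C>, where P approximately solves
    min <P, C> + eps * KL(P | a (x) b), via the standard Sinkhorn iterations."""
    K = np.exp(-C / eps)                      # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]           # approximate optimal transport plan
    return np.sum(P * C)

def sinkhorn_divergence(x, y, eps=0.5):
    """Debiased divergence S(a,b) = OT_eps(a,b) - (OT_eps(a,a) + OT_eps(b,b)) / 2
    for empirical measures on 1-d samples x, y with squared-distance cost."""
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    Cxy = (x[:, None] - y[None, :]) ** 2
    Cxx = (x[:, None] - x[None, :]) ** 2
    Cyy = (y[:, None] - y[None, :]) ** 2
    return (sinkhorn_cost(a, b, Cxy, eps)
            - 0.5 * sinkhorn_cost(a, a, Cxx, eps)
            - 0.5 * sinkhorn_cost(b, b, Cyy, eps))

rng = np.random.default_rng(0)
x, y = rng.normal(0.0, 1.0, 200), rng.normal(0.5, 1.0, 200)
# Roughly tracks the squared Wasserstein distance (0.25 for these two Gaussians).
print(sinkhorn_divergence(x, y))
```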