On the Convergence of Coordinate Ascent Variational Inference
- URL: http://arxiv.org/abs/2306.01122v1
- Date: Thu, 1 Jun 2023 20:19:30 GMT
- Title: On the Convergence of Coordinate Ascent Variational Inference
- Authors: Anirban Bhattacharya, Debdeep Pati, Yun Yang
- Abstract summary: We consider the common coordinate ascent variational inference (CAVI) algorithm for implementing mean-field (MF) VI.
We provide general conditions for certifying global or local exponential convergence of CAVI.
A new notion of generalized correlation is introduced to characterize how the constituent blocks interact in shaping the VI objective functional.
- Score: 11.166959724276337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a computational alternative to Markov chain Monte Carlo approaches,
variational inference (VI) is becoming increasingly popular for approximating
intractable posterior distributions in large-scale Bayesian models due to its
comparable efficacy and superior efficiency. Several recent works provide
theoretical justifications of VI by proving its statistical optimality for
parameter estimation under various settings; meanwhile, formal analysis on the
algorithmic convergence aspects of VI is still largely lacking. In this paper,
we consider the common coordinate ascent variational inference (CAVI) algorithm
for implementing the mean-field (MF) VI towards optimizing a Kullback--Leibler
divergence objective functional over the space of all factorized distributions.
Focusing on the two-block case, we analyze the convergence of CAVI by
leveraging the extensive toolbox from functional analysis and optimization. We
provide general conditions for certifying global or local exponential
convergence of CAVI. Specifically, a new notion of generalized correlation for
characterizing the interaction between the constituting blocks in influencing
the VI objective functional is introduced, which, according to the theory,
quantifies the algorithmic contraction rate of two-block CAVI. As
illustrations, we apply the developed theory to a number of examples, and
derive explicit problem-dependent upper bounds on the algorithmic contraction
rate.
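To make the two-block setting concrete, here is a minimal numerical sketch on the textbook bivariate Gaussian example (an illustration of the setting, not code from the paper): with unit marginal variances and correlation rho, the CAVI mean updates are available in closed form, and the error contracts geometrically at rate rho squared, the role the paper's generalized correlation plays in general.

```python
import numpy as np

# Two-block CAVI on a bivariate Gaussian target N(mu, Sigma) with unit
# marginal variances and correlation rho.  Under the mean-field family the
# optimal factors are Gaussian with fixed variances, and the coordinate
# updates for the factor means are
#     m1 <- mu1 + rho * (m2 - mu2),    m2 <- mu2 + rho * (m1 - mu1),
# so each full sweep contracts the error by rho**2.

mu = np.array([1.0, -2.0])   # target means
rho = 0.8                    # correlation between the two blocks

m = np.array([10.0, 10.0])   # initial factor means (deliberately far off)
for sweep in range(20):
    m[0] = mu[0] + rho * (m[1] - mu[1])   # update block 1 given block 2
    m[1] = mu[1] + rho * (m[0] - mu[0])   # update block 2 given block 1
    print(f"sweep {sweep:2d}  error {np.max(np.abs(m - mu)):.3e}")
# The printed error decays geometrically at rate rho**2 = 0.64.
```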
Related papers
- Bridging Multicalibration and Out-of-distribution Generalization Beyond Covariate Shift [44.708914058803224]
We establish a new model-agnostic optimization framework for out-of-distribution generalization via multicalibration.
We propose MC-Pseudolabel, a post-processing algorithm to achieve both extended multicalibration and out-of-distribution generalization.
arXiv Detail & Related papers (2024-06-02T08:11:35Z)
- Extending Mean-Field Variational Inference via Entropic Regularization: Theory and Computation [2.2656885622116394]
Variational inference (VI) has emerged as a popular method for approximate inference in high-dimensional Bayesian models.
We propose a novel VI method that extends the naive mean field via entropic regularization.
We show that $\Xi$-variational posteriors effectively recover the true posterior dependency.
arXiv Detail & Related papers (2024-04-14T01:40:11Z)
- Statistical Inference of Optimal Allocations I: Regularities and their Implications [3.904240476752459]
We first derive Hadamard differentiability of the value function through a detailed analysis of the general properties of the sorting operator.
Building on our Hadamard differentiability results, we demonstrate how the functional delta method can be used to directly derive the properties of the value function process.
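For reference, the functional delta method invoked here is the standard one; in textbook form (e.g., van der Vaart and Wellner), not specific to this paper:

```latex
% Functional delta method: if \phi : \mathbb{D} \to \mathbb{E} is Hadamard
% differentiable at \theta tangentially to \mathbb{D}_0 with derivative
% \phi'_\theta, and r_n(\hat\theta_n - \theta) \rightsquigarrow G with G
% concentrating on \mathbb{D}_0, then
\[
  r_n\bigl(\phi(\hat\theta_n) - \phi(\theta)\bigr)
  \;\rightsquigarrow\; \phi'_\theta(G).
\]
```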
arXiv Detail & Related papers (2024-03-27T04:39:13Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
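The paper's sampler is not reproduced here, but the consensus-ADMM skeleton that such distributed schemes build on looks roughly as follows; this is a deterministic least-squares sketch with illustrative synthetic data, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_workers = 5, 4
x_true = rng.normal(size=d)

# Each worker holds a private least-squares block f_i(x) = 0.5*||A_i x - b_i||^2.
A = [rng.normal(size=(30, d)) for _ in range(n_workers)]
b = [Ai @ x_true + 0.1 * rng.normal(size=30) for Ai in A]

rho = 1.0
x = [np.zeros(d) for _ in range(n_workers)]   # local primal variables
u = [np.zeros(d) for _ in range(n_workers)]   # scaled dual variables
z = np.zeros(d)                               # global consensus variable

for it in range(50):
    # Local x-updates (run in parallel on each worker).
    for i in range(n_workers):
        x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(d),
                               A[i].T @ b[i] + rho * (z - u[i]))
    # Global averaging step, then dual ascent.
    z = np.mean([x[i] + u[i] for i in range(n_workers)], axis=0)
    for i in range(n_workers):
        u[i] += x[i] - z

print("consensus error:", np.linalg.norm(z - x_true))
```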
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
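The underlying diffeomorphism idea is easy to state in one dimension: if X = f(Z) for a monotone bijection f and base variable Z, the CDF transports exactly as P(X <= x) = F_Z(f^{-1}(x)). A toy sketch with an affine "flow" (illustrative only, not the paper's estimator):

```python
import numpy as np
from scipy.stats import norm

# If X = f(Z) with f a monotone bijection and Z ~ N(0, 1), then
# P(X <= x) = Phi(f_inv(x)): push the query back through the flow.

mu, sigma = 2.0, 0.5
f = lambda z: mu + sigma * z          # forward flow (toy affine map)
f_inv = lambda x: (x - mu) / sigma    # exact inverse

x = 2.3
cdf_via_flow = norm.cdf(f_inv(x))     # evaluate the base CDF at the preimage
cdf_monte_carlo = np.mean(f(np.random.default_rng(0).normal(size=100_000)) <= x)
print(cdf_via_flow, cdf_monte_carlo)  # the two estimates agree
```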
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- The Last-Iterate Convergence Rate of Optimistic Mirror Descent in Stochastic Variational Inequalities [29.0058976973771]
We show an intricate relation between the algorithm's rate of convergence and the local geometry induced by the method's underlying Bregman function, as captured by an associated Legendre exponent.
We show that this exponent determines both the optimal step-size policy of the algorithm and the optimal rates attained.
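For context, one standard form of the optimistic mirror descent template studied in this line of work, written generically (D is the Bregman divergence of the method's distance-generating function, g the stochastic operator of the variational inequality; a generic statement, not the paper's exact scheme):

```latex
\[
  X_{t+1/2} = \operatorname*{arg\,min}_{x \in \mathcal{X}}
    \bigl\{ \gamma_t \langle g(X_{t-1/2}), x \rangle + D(x, X_t) \bigr\},
  \qquad
  X_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}}
    \bigl\{ \gamma_t \langle g(X_{t+1/2}), x \rangle + D(x, X_t) \bigr\}.
\]
```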
arXiv Detail & Related papers (2021-07-05T09:54:47Z)
- Loss function based second-order Jensen inequality and its application to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
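As a concrete instance of the PVI update structure, here is a minimal Stein variational gradient descent (SVGD) sketch on a one-dimensional Gaussian target; SVGD is the best-known PVI scheme with an explicit repulsion term, and this sketch is illustrative rather than the paper's algorithm:

```python
import numpy as np

# SVGD on a 1-D standard normal target, so grad log p(x) = -x.
# Each particle is driven by an attraction term (kernel-weighted score)
# plus a repulsion term (kernel gradient) that keeps the ensemble diverse.

rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, size=30)    # particle ensemble
h, step = 1.0, 0.5                     # RBF bandwidth and step size

for _ in range(1000):
    diff = x[:, None] - x[None, :]              # pairwise x_i - x_j
    k = np.exp(-diff**2 / (2 * h**2))           # RBF kernel matrix
    attract = k @ (-x)                          # sum_j k(x_j, x_i) * score(x_j)
    repulse = (k * diff).sum(axis=1) / h**2     # sum_j d/dx_j k(x_j, x_i)
    x = x + step * (attract + repulse) / len(x)

print(np.mean(x), np.std(x))   # roughly 0 and 1
```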
arXiv Detail & Related papers (2021-06-09T12:13:51Z)
- Statistical optimality and stability of tangent transform algorithms in logit models [6.9827388859232045]
We provide conditions on the data generating process under which we derive non-asymptotic upper bounds on the risk incurred by the variational optima.
In particular, we establish local asymptotic stability of the algorithm without any assumptions on the data-generating process.
We explore a special case involving a semi-orthogonal design under which global convergence is obtained.
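For context, the tangent transform in logit models is the classical Jaakkola-Jordan device: each logistic sigmoid factor is lower-bounded by a quadratic in the exponent, conjugate to Gaussian priors, with a variational tangent point xi per observation (the standard bound, not a result of this paper):

```latex
\[
  \sigma(x) \;\ge\; \sigma(\xi)\,
    \exp\!\Bigl\{ \tfrac{x - \xi}{2} - \lambda(\xi)\,\bigl(x^2 - \xi^2\bigr) \Bigr\},
  \qquad
  \lambda(\xi) = \frac{\tanh(\xi/2)}{4\xi},
\]
% with equality at x = \pm\xi; alternating between optimizing \xi and the
% resulting conjugate Gaussian updates gives the EM-type iteration analyzed here.
```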
arXiv Detail & Related papers (2020-10-25T05:15:13Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a powerful tool for many machine learning problems, such as meta-learning and hyperparameter optimization.
We propose a novel stochastic bilevel optimizer, stocBiO, built on a sample-efficient hypergradient estimator.
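For reference, the quantity that stocBiO-type estimators approximate stochastically is the standard implicit hypergradient of the bilevel objective (generic form, with outer objective f, inner objective g, and inner solution y*(x)):

```latex
\[
  \nabla \Phi(x)
  = \nabla_x f\bigl(x, y^*(x)\bigr)
  - \nabla^2_{xy} g\bigl(x, y^*(x)\bigr)\,
    \bigl[\nabla^2_{yy} g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
    \nabla_y f\bigl(x, y^*(x)\bigr),
  \qquad
  y^*(x) = \operatorname*{arg\,min}_{y} g(x, y).
\]
```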
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- Meta-Learning Divergences of Variational Inference [49.164944557174294]
Variational inference (VI) plays an essential role in approximate Bayesian inference.
We propose a meta-learning algorithm to learn the divergence metric suited for the task of interest.
We demonstrate that our approach outperforms standard VI on Gaussian mixture approximation.
arXiv Detail & Related papers (2020-07-06T17:43:01Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including the generated summaries) and is not responsible for any consequences of its use.