Markov Chain Score Ascent: A Unifying Framework of Variational Inference
with Markovian Gradients
- URL: http://arxiv.org/abs/2206.06295v1
- Date: Mon, 13 Jun 2022 16:25:08 GMT
- Title: Markov Chain Score Ascent: A Unifying Framework of Variational Inference
with Markovian Gradients
- Authors: Kyurae Kim, Jisu Oh, Jacob R. Gardner, Adji Bousso Dieng, Hongseok Kim
- Abstract summary: Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior.
This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance.
- Score: 7.461223697336269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic
gradient descent (SGD) is challenging since its gradient is defined as an
integral over the posterior. Recently, multiple methods have been proposed to
run SGD with biased gradient estimates obtained from a Markov chain. This paper
provides the first non-asymptotic convergence analysis of these methods by
establishing their mixing rate and gradient variance. To do this, we
demonstrate that these methods, which we collectively refer to as Markov chain
score ascent (MCSA) methods, can be cast as special cases of the Markov chain
gradient descent framework. Furthermore, by leveraging this new understanding,
we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter
bound on the gradient variance. We demonstrate that this improved theoretical
result translates to superior empirical performance.
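To make the recipe concrete, below is a minimal MCSA sketch in the spirit of the parallel scheme: several Metropolis-Hastings chains targeting the posterior each advance one step per SGD iteration, and their states form a score-gradient estimate for the inclusive KL. The mixture target, random-walk kernel, chain count, and step sizes are toy assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(z):                      # unnormalized posterior: two-component Gaussian mixture
    return np.logaddexp(-0.5 * (z - 2.0) ** 2, -0.5 * (z + 2.0) ** 2)

def mh_step(z, step=1.0):          # one random-walk Metropolis-Hastings step per chain
    prop = z + step * rng.normal(size=z.shape)
    accept = np.log(rng.uniform(size=z.shape)) < log_p(prop) - log_p(z)
    return np.where(accept, prop, z)

mu, log_sigma = 0.0, 0.0           # variational family q = N(mu, sigma^2)
N = 32                             # number of parallel chains
z = rng.normal(size=N)             # chain states
lr = 0.02

for t in range(2000):
    z = mh_step(z)                 # advance every chain by a single Markov kernel step
    sigma2 = np.exp(2 * log_sigma)
    # score of q at the chain states; ascending E_p[log q] descends KL(p || q)
    g_mu = np.mean((z - mu) / sigma2)
    g_ls = np.mean((z - mu) ** 2 / sigma2 - 1.0)
    mu += lr * g_mu
    log_sigma += lr * g_ls

print(f"fitted q: mu={mu:.2f}, sigma={np.exp(log_sigma):.2f}")
```

Because the chains only approximately follow the posterior at any finite iteration, the gradient estimate is biased and Markovian, which is exactly why the paper's mixing-rate and gradient-variance analysis is needed.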
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
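For readers unfamiliar with the iteration, a minimal sketch follows: each step solves x_{k+1} = argmin_x f_i(x) + ||x - x_k||^2 / (2*gamma) for one sampled loss, which has a closed form for least squares. The problem data and step size are toy assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

x = np.zeros(d)
gamma = 1.0                                   # large prox step; SPPM is robust to this choice

for k in range(3000):
    i = rng.integers(n)                       # sample one loss uniformly
    a_i, b_i = A[i], b[i]
    resid = (a_i @ x - b_i) / (1.0 + gamma * a_i @ a_i)
    x = x - gamma * resid * a_i               # exact prox of f_i around the current iterate

print("distance to x_true:", np.linalg.norm(x - x_true))
```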
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
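To illustrate the communication pattern only, here is a generic one-bit compression sketch in a parameter-server loop: each worker transmits sign(g) plus a single scalar scale, and the server averages the decompressed gradients. FO-SGD's specific "flattening" tricks are not reproduced here; the objective and noise model are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, workers = 10, 4
x = np.zeros(d)
target = rng.normal(size=d)                   # toy quadratic objective shared by workers
lr = 0.1

def worker_grad(x, noise_scale=0.5):          # noisy local gradient of 0.5*||x - target||^2
    return (x - target) + noise_scale * rng.normal(size=d)

def compress(g):                              # one bit per coordinate + one float
    return np.sign(g), np.mean(np.abs(g))

def decompress(s, scale):
    return s * scale

for t in range(500):
    msgs = [compress(worker_grad(x)) for _ in range(workers)]
    g_hat = np.mean([decompress(s, c) for s, c in msgs], axis=0)
    x -= lr * g_hat                           # server applies the averaged estimate

print("distance to optimum:", np.linalg.norm(x - target))
```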
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Robust Approximate Sampling via Stochastic Gradient Barker Dynamics [0.0]
We introduce the stochastic gradient Barker dynamics (SGBD) algorithm, a robust alternative to Langevin-based sampling algorithms, within the stochastic gradient MCMC framework.
We characterize the impact of stochastic gradients on the Barker transition mechanism and develop a bias-corrected version that eliminates the error due to the gradient noise in the proposal.
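A minimal sketch of the Barker transition mechanism driven by a noisy gradient: each coordinate draws an increment z and keeps its sign with probability 1 / (1 + exp(-z * g)), skewing moves toward higher density. This shows the kernel only; SGBD's bias correction for the gradient noise is not reproduced, and the Gaussian target is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_grad_log_pi(x, noise=0.3):          # target: standard Gaussian, grad log pi = -x
    return -x + noise * rng.normal(size=x.shape)

def barker_step(x, step=0.5):
    z = step * rng.normal(size=x.shape)       # symmetric increment
    g = noisy_grad_log_pi(x)
    p_keep = 1.0 / (1.0 + np.exp(-z * g))     # probability of keeping the sign of z
    flip = rng.uniform(size=x.shape) > p_keep
    return x + np.where(flip, -z, z)

x = np.full(1000, 5.0)                        # 1000 independent chains, poor initialization
for t in range(2000):
    x = barker_step(x)

# sample moments approach 0 and 1, up to bias from discretization and gradient noise
print("sample mean/std:", x.mean(), x.std())
```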
arXiv Detail & Related papers (2024-05-14T23:47:02Z)
- Stochastic Gradient Descent under Markovian Sampling Schemes [3.04585143845864]
We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme.
We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain.
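A small sketch of the setting: instead of drawing data indices i.i.d., the index performs a lazy random walk on a ring over the dataset, so consecutive gradients are correlated. The least-squares model and the particular chain are toy assumptions; the point is the Markovian sampling scheme, not the objective.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 3
A = rng.normal(size=(n, d))
b = A @ np.ones(d) + 0.1 * rng.normal(size=n)

x = np.zeros(d)
i = 0                                         # current state of the index chain
lr = 0.01

for t in range(20000):
    i = (i + rng.choice([-1, 0, 1])) % n      # one step of a lazy random walk over indices
    g = (A[i] @ x - b[i]) * A[i]              # gradient of 0.5 * (A[i] @ x - b[i])**2
    x -= lr * g

print("distance to optimum:", np.linalg.norm(x - np.ones(d)))
```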
arXiv Detail & Related papers (2023-02-28T09:18:00Z)
- Almost Sure Saddle Avoidance of Stochastic Gradient Methods without the Bounded Gradient Assumption [11.367487348673793]
We prove that various stochastic gradient methods, including stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG), almost surely avoid any strict saddle manifold.
This is the first time such results are obtained for SHB and SNAG methods.
arXiv Detail & Related papers (2023-02-15T18:53:41Z)
- Learn Quasi-stationary Distributions of Finite State Markov Chain [2.780408966503282]
We propose a reinforcement learning (RL) approach to computing quasi-stationary distributions.
We minimize the KL-divergence of two Markovian path distributions induced by the candidate distribution and the true target distribution.
We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution and value function.
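For context, the quantity being learned has a direct linear-algebra characterization: the quasi-stationary distribution (QSD) of a finite-state chain with absorption is the normalized left Perron eigenvector of the substochastic transition matrix on the transient states. The eigen-solve below is that ground-truth baseline, not the paper's actor-critic method, and the toy transition matrix is an assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 6                                          # number of transient states (toy assumption)
P = rng.uniform(size=(k, k))
P = 0.95 * P / P.sum(axis=1, keepdims=True)    # rows sum to 0.95: 5% absorption per step

vals, vecs = np.linalg.eig(P.T)                # eigenvectors of P.T are left eigenvectors of P
top = np.argmax(vals.real)                     # Perron root
qsd = np.abs(vecs[:, top].real)
qsd /= qsd.sum()                               # normalize to a probability vector

print("QSD:", np.round(qsd, 3))
print("verify qsd @ P / lambda == qsd:", np.round(qsd @ P / vals[top].real, 3))
```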
arXiv Detail & Related papers (2021-11-19T02:56:34Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
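A minimal AIS sketch that, like the differentiable variant, drops Metropolis-Hastings corrections and uses unadjusted Langevin transitions (making every step differentiable at the cost of some bias). The 1-D Gaussian target is an assumption chosen so the true log normalizer is known.

```python
import numpy as np

rng = np.random.default_rng(6)
m, s = 2.0, 0.5
log_Z_true = np.log(s * np.sqrt(2 * np.pi))    # normalizer of exp(-0.5*((x-m)/s)**2)

def log_prior(x):                              # normalized N(0, 1) initial distribution
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_target(x):                             # unnormalized target density
    return -0.5 * ((x - m) / s) ** 2

n, T, eps = 5000, 200, 0.05
betas = np.linspace(0.0, 1.0, T + 1)
x = rng.normal(size=n)                         # exact samples from the prior
log_w = np.zeros(n)

for b0, b1 in zip(betas[:-1], betas[1:]):
    log_w += (b1 - b0) * (log_target(x) - log_prior(x))         # AIS weight increment
    grad = (1 - b1) * (-x) + b1 * (-(x - m) / s**2)             # grad of annealed log density
    x = x + eps * grad + np.sqrt(2 * eps) * rng.normal(size=n)  # one ULA step, no MH

M = log_w.max()                                # numerically stable log-mean-exp
print("AIS estimate:", M + np.log(np.mean(np.exp(log_w - M))), "true:", log_Z_true)
```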
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We analyze the stochastic ExtraGradient (SEG) method with constant step size and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
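A compact sketch of SEG with constant step size and iterate averaging on a bilinear game min_x max_y x^T A y, whose unique Nash equilibrium is (0, 0). The noise scale, step size, and restart-free running average are illustrative simplifications of the analyzed method.

```python
import numpy as np

rng = np.random.default_rng(7)
d = 4
A = rng.normal(size=(d, d))
x, y = rng.normal(size=d), rng.normal(size=d)
eta, noise = 0.05, 0.1
x_avg, y_avg = np.zeros(d), np.zeros(d)

for t in range(1, 5001):
    gx = A @ y + noise * rng.normal(size=d)        # noisy gradient of x^T A y in x
    gy = A.T @ x + noise * rng.normal(size=d)      # noisy gradient in y
    x_half, y_half = x - eta * gx, y + eta * gy    # extrapolation step
    gx2 = A @ y_half + noise * rng.normal(size=d)  # re-evaluate at the midpoint
    gy2 = A.T @ x_half + noise * rng.normal(size=d)
    x, y = x - eta * gx2, y + eta * gy2            # update with midpoint gradients
    x_avg += (x - x_avg) / t                       # running iterate average
    y_avg += (y - y_avg) / t

print("last iterate:", np.linalg.norm(np.r_[x, y]))        # hovers in a noise ball
print("averaged    :", np.linalg.norm(np.r_[x_avg, y_avg]))  # much closer to the equilibrium
```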
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC [83.48593305367523]
Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions.
We introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions.
We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
arXiv Detail & Related papers (2021-02-04T02:21:08Z)
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
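A generic zeroth-order, FGSM-style one-step attack for reference: estimate the loss gradient from a value-only oracle via random-direction finite differences, then perturb by epsilon times its sign. This is the plain baseline such work improves on; the Gaussian-MRF covariance modeling itself is not reproduced, and the toy logistic "model" is an assumption.

```python
import numpy as np

rng = np.random.default_rng(8)
d = 50
w = rng.normal(size=d)                         # toy model: logistic loss on w @ x

def loss_oracle(x, label=1.0):                 # zeroth-order access: loss values only
    return np.log1p(np.exp(-label * (w @ x)))

def zo_gradient(x, queries=200, mu=1e-3):
    g = np.zeros(d)
    base = loss_oracle(x)
    for _ in range(queries):                   # Gaussian-smoothing finite differences
        u = rng.normal(size=d)
        g += (loss_oracle(x + mu * u) - base) / mu * u
    return g / queries

x = rng.normal(size=d)
eps = 0.1
x_adv = x + eps * np.sign(zo_gradient(x))      # one-step sign perturbation raises the loss

print("loss before:", loss_oracle(x), "after:", loss_oracle(x_adv))
```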
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
- Finite-Time Analysis of Stochastic Gradient Descent under Markov Randomness [27.027583559295365]
Stochastic gradient descent (SGD) is widely used in reinforcement learning and machine learning.
We show that SGD converges nearly at the same rate with Markovian gradient samples as with independent gradient samples.
arXiv Detail & Related papers (2020-03-24T17:06:40Z)