Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL
Divergence for Exponential Families via Mirror Descent
- URL: http://arxiv.org/abs/2011.01170v3
- Date: Mon, 28 Feb 2022 00:43:11 GMT
- Title: Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL
Divergence for Exponential Families via Mirror Descent
- Authors: Frederik Kunstner, Raunak Kumar, Mark Schmidt
- Abstract summary: We show that for the common setting of exponential family distributions, viewing EM as a mirror descent algorithm leads to convergence rates in Kullback-Leibler divergence.
In contrast to previous works, the analysis is invariant to the choice of parametrization and holds with minimal assumptions.
- Score: 18.045545816267385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Expectation maximization (EM) is the default algorithm for fitting
probabilistic models with missing or latent variables, yet we lack a full
understanding of its non-asymptotic convergence properties. Previous works show
results along the lines of "EM converges at least as fast as gradient descent"
by assuming the conditions for the convergence of gradient descent apply to EM.
This approach is not only loose, in that it does not capture that EM can make
more progress than a gradient step, but the assumptions fail to hold for
textbook examples of EM like Gaussian mixtures. In this work we first show that
for the common setting of exponential family distributions, viewing EM as a
mirror descent algorithm leads to convergence rates in Kullback-Leibler (KL)
divergence. Then, we show how the KL divergence is related to first-order
stationarity via Bregman divergences. In contrast to previous works, the
analysis is invariant to the choice of parametrization and holds with minimal
assumptions. We also show applications of these ideas to local linear (and
superlinear) convergence rates, generalized EM, and non-exponential family
distributions.
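The ascent property that the paper makes quantitative in KL divergence can be seen in a minimal EM implementation. The sketch below fits a two-component 1-D Gaussian mixture, a textbook exponential-family example mentioned in the abstract; the synthetic data, initialization, and iteration count are illustrative choices, not taken from the paper.

```python
import math
import random

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture.

    Each M-step is closed-form moment matching, as in any exponential
    family; the paper's mirror-descent view bounds the progress of each
    such step in KL divergence, independent of the parametrization.
    """
    # Crude initialization (an illustrative choice, not from the paper).
    wts, mu, var = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
    log_liks = []
    for _ in range(iters):
        # E-step: posterior responsibilities for each data point.
        resp, ll = [], 0.0
        for x in data:
            ps = [wts[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                  / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(ps)
            ll += math.log(s)
            resp.append([p / s for p in ps])
        log_liks.append(ll)
        # M-step: re-estimate weights, means, variances (moment matching).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            wts[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
    return wts, mu, var, log_liks

random.seed(0)
data = ([random.gauss(-2, 1) for _ in range(200)]
        + [random.gauss(3, 1) for _ in range(200)])
_, mu, _, lls = em_gmm_1d(data)
# The ascent property: the log-likelihood never decreases across steps.
print(all(b >= a - 1e-9 for a, b in zip(lls, lls[1:])))
```

Monotone likelihood ascent is the classical, qualitative guarantee; the paper's contribution is a non-asymptotic rate for it, stated in KL divergence rather than in any particular parametrization.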
Related papers
- Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation [59.86921150579892]
We deal with the problem of gradient estimation for differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions.
We develop variance reduction strategies for differentiable sorting and ranking, differentiable shortest-paths on graphs, differentiable rendering for pose estimation, as well as differentiable cryo-ET simulations.
arXiv Detail & Related papers (2024-10-10T17:10:00Z)
- Understanding Stochastic Natural Gradient Variational Inference [12.800664845601197]
We show a global convergence rate of $\mathcal{O}(\frac{1}{T})$, implying a non-asymptotic convergence rate for NGVI.
The rate is no worse than stochastic gradient descent (aka black-box variational inference) and has better constant dependency.
arXiv Detail & Related papers (2024-06-04T00:45:37Z)
- Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence [25.717501580080846]
This paper revisits the convergence of stochastic mirror descent (SMD) in the contemporary nonconvex optimization setting.
We develop provably convergent algorithms for the training of linear networks.
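Mirror descent with a Bregman divergence can be illustrated with the classic negative-entropy mirror map, whose Bregman divergence is exactly the KL divergence. The sketch below is a generic stochastic mirror descent on the probability simplex (an exponentiated-gradient update), offered as an illustrative instance of the method rather than the algorithm analyzed in the paper above; the toy objective, step size, and noise model are made up for the demo.

```python
import math
import random

def smd_simplex(grad, x0, steps=500, eta=0.1):
    """Stochastic mirror descent on the probability simplex with the
    negative-entropy mirror map, whose Bregman divergence is the KL
    divergence: each step is an exponentiated-gradient update,
    x_i <- x_i * exp(-eta * g_i), followed by renormalization."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        s = sum(w)
        x = [wi / s for wi in w]
    return x

# Toy objective f(x) = <c, x> on the simplex, with noisy gradient
# oracles (the gradient of a linear objective is the constant c).
random.seed(1)
c = [3.0, 1.0, 2.0]
noisy_grad = lambda _x: [ci + random.gauss(0, 0.1) for ci in c]
x = smd_simplex(noisy_grad, [1 / 3, 1 / 3, 1 / 3])
print(max(range(3), key=lambda i: x[i]))  # mass concentrates on index 1
```

The multiplicative update keeps the iterates on the simplex automatically, which is the usual reason for choosing the entropy geometry over a Euclidean projection.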
arXiv Detail & Related papers (2024-02-27T17:56:49Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Convergence and concentration properties of constant step-size SGD through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iterates converge to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
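The Markov-chain view of constant step-size SGD described above can be simulated directly: the iterates stop at a noise floor instead of converging to the minimizer, and chains started from different points mix to the same stationary distribution. The one-dimensional quadratic, noise level, and step size below are illustrative choices, not the paper's setting.

```python
import random

def const_step_sgd(x0, step=0.05, iters=20000, seed=0):
    """Constant step-size SGD on f(x) = (x - 2)^2 / 2 with additive
    gradient noise. The iterates form a Markov chain: they do not
    converge to the minimizer, but their distribution converges to an
    invariant distribution centered near it, with spread controlled by
    the step size. Returns the average of the post-burn-in iterates."""
    rng = random.Random(seed)
    x, tail = x0, []
    for t in range(iters):
        g = (x - 2.0) + rng.gauss(0, 1.0)  # unbiased noisy gradient
        x -= step * g
        if t >= iters // 2:                # discard the burn-in half
            tail.append(x)
    return sum(tail) / len(tail)

# Chains started far apart mix to the same stationary mean, near 2.
print(round(const_step_sgd(-10.0), 1), round(const_step_sgd(12.0, seed=1), 1))
```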
arXiv Detail & Related papers (2023-06-20T12:36:28Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem [33.74484936098467]
We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate.
Surprisingly, we found no general solution to this problem in the literature.
arXiv Detail & Related papers (2021-11-12T17:18:06Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
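A random-walk spectral embedding of the kind described above can be sketched with a power iteration on P = D^{-1}A: after projecting out the trivial constant eigenvector, the sign pattern of the second eigenvector recovers planted communities. The toy graph (two 4-cliques joined by one edge) and the power-iteration details are illustrative assumptions, not the paper's estimator.

```python
def rw_second_eigvec(adj, iters=200):
    """Second eigenvector of the random-walk matrix P = D^{-1} A via
    power iteration. The trivial constant eigenvector is projected out
    in the degree-weighted inner product (whose weights are the random
    walk's stationary distribution), so the iteration converges to the
    next-slowest mode, whose sign pattern reflects community structure."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    total = sum(deg)
    pi = [d / total for d in deg]           # stationary distribution
    v = [float(i) for i in range(n)]        # arbitrary start vector
    for _ in range(iters):
        v = [sum(adj[i][j] * v[j] for j in range(n)) / deg[i]
             for i in range(n)]
        m = sum(p * x for p, x in zip(pi, v))
        v = [x - m for x in v]              # remove constant component
        nrm = max(abs(x) for x in v)
        v = [x / nrm for x in v]
    return v

# Toy graph: two 4-cliques {0..3} and {4..7} joined by the edge (3, 4).
n = 8
adj = [[0] * n for _ in range(n)]
for a in range(4):
    for b in range(4):
        if a != b:
            adj[a][b] = adj[a + 4][b + 4] = 1
adj[3][4] = adj[4][3] = 1
v = rw_second_eigvec(adj)
# Each clique gets one sign, and the two cliques get opposite signs.
print(len({x > 0 for x in v[:4]}) == 1 and (v[0] > 0) != (v[7] > 0))
```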
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
- On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
We study the non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
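Polyak-Ruppert averaging can be demonstrated on a one-dimensional linear stochastic approximation: the running average of the iterates concentrates around the fixed point far more tightly than the noisy last iterate. The recursion, step size, and noise model below are toy choices for illustration, not the setting analyzed in the paper.

```python
import random

def lsa_with_averaging(steps=5000, eta=0.2, seed=0):
    """One-dimensional linear stochastic approximation
    x <- x - eta * ((x - 1) + noise), with fixed point x* = 1, plus
    Polyak-Ruppert averaging: the running average of the iterates is a
    far less noisy estimate of x* than the last iterate itself."""
    rng = random.Random(seed)
    x, avg = 5.0, 0.0
    for t in range(1, steps + 1):
        x -= eta * ((x - 1.0) + rng.gauss(0, 1.0))
        avg += (x - avg) / t               # running Polyak-Ruppert average
    return x, avg

last, avg = lsa_with_averaging()
print(round(avg, 1))  # the averaged iterate lands close to x* = 1
```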
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content above (including all information) and is not responsible for any consequences.