Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality
- URL: http://arxiv.org/abs/2407.17949v1
- Date: Thu, 25 Jul 2024 11:08:53 GMT
- Title: Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality
- Authors: Rocco Caprio, Adam M Johansen
- Abstract summary: We extend an analysis technique commonly employed to understand alternating minimization algorithms on Euclidean space to the Expectation Maximization (EM) algorithm.
We obtain finite sample error bounds and exponential convergence of the EM algorithm under a natural generalisation of a log-Sobolev inequality.
- Score: 1.1510009152620668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By utilizing recently developed tools for constructing gradient flows on Wasserstein spaces, we extend an analysis technique commonly employed to understand alternating minimization algorithms on Euclidean space to the Expectation Maximization (EM) algorithm via its representation as coordinate-wise minimization on the product of a Euclidean space and a space of probability distributions due to Neal and Hinton (1998). In so doing we obtain finite sample error bounds and exponential convergence of the EM algorithm under a natural generalisation of a log-Sobolev inequality. We further demonstrate that the analysis technique is sufficiently flexible to allow also the analysis of several variants of the EM algorithm.
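For orientation, here is a minimal sketch of the Neal and Hinton (1998) free-energy representation of EM that the abstract builds on, written in illustrative notation chosen here; the paper's generalised log-Sobolev condition and its constants are not reproduced.

```latex
% Sketch only: standard free-energy view of EM (Neal & Hinton, 1998), in notation
% chosen for illustration rather than the paper's own.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
For data $x$, latent variable $Z$, parameter $\theta$, and a candidate
distribution $q$ over $Z$, define the free energy
\[
  F(\theta, q)
  = -\,\mathbb{E}_{q}\bigl[\log p_\theta(x, Z)\bigr] - \mathrm{H}(q)
  = -\log p_\theta(x) + \operatorname{KL}\bigl(q \,\Vert\, p_\theta(\cdot \mid x)\bigr),
\]
so that $\min_q F(\theta, q) = -\log p_\theta(x)$, attained at the posterior.
EM is then coordinate-wise minimisation of $F$ over the product of the
parameter space and the space of probability distributions:
\[
  \text{E-step: } q_{t+1} = \operatorname*{arg\,min}_{q} F(\theta_t, q)
                          = p_{\theta_t}(\cdot \mid x),
  \qquad
  \text{M-step: } \theta_{t+1} = \operatorname*{arg\,min}_{\theta} F(\theta, q_{t+1}).
\]
Each sweep decreases $F$ and hence increases the marginal likelihood; a
log-Sobolev-type inequality relating $F(\theta, q) - \inf F$ to a squared
gradient quantity is what upgrades this monotone descent to the exponential
rate claimed above (the precise generalised condition is stated in the paper).
\end{document}
```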
Related papers
- Learning Overspecified Gaussian Mixtures Exponentially Fast with the EM Algorithm [5.625796693054093]
We investigate the convergence properties of the EM algorithm when applied to overspecified Gaussian mixture models.
We demonstrate that the population EM algorithm converges exponentially fast in terms of the Kullback-Leibler (KL) distance.
arXiv Detail & Related papers (2025-06-13T14:57:57Z) - Convergence Analysis of the Wasserstein Proximal Algorithm beyond Geodesic Convexity [13.468026138183623]
The proximal descent algorithm is a powerful tool for minimizing nonlinear and nonsmooth functionals in a general metric space.
We show a faster convergence rate than the Euclidean algorithm on mean-field neural networks.
arXiv Detail & Related papers (2025-01-25T00:00:50Z) - Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces [6.992239210938067]
We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous.
Recent research has explored learning the worst-case distribution using neural network-based generative models.
This paper addresses this theoretical challenge by presenting an iterative algorithm to solve such a minimax problem.
arXiv Detail & Related papers (2024-12-29T19:31:23Z) - A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we study a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed oracle avoids the need for a computationally expensive spectral initialization, using a simple gradient step while remaining robust to outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z) - Taming the Interacting Particle Langevin Algorithm -- the superlinear case [0.0]
We develop a new class of algorithms which are stable under such non-linearities, called tamed interacting particle Langevin algorithms (tIPLA).
We obtain non-asymptotic convergence error estimates in Wasserstein-2 distance for the new class at the optimal rate.
arXiv Detail & Related papers (2024-03-28T17:11:25Z) - A Note on High-Probability Analysis of Algorithms with Exponential,
Sub-Gaussian, and General Light Tails [34.465259431606405]
We show that the analysis of such an algorithm can be reduced, in a black-box manner and with only a small loss in logarithmic factors, to an analysis under bounded randomization.
This approach simultaneously applies to any light-tailed randomization, including exponential, sub-Gaussian, and more general fast-decaying distributions.
arXiv Detail & Related papers (2024-03-05T11:38:20Z) - Convergence of Expectation-Maximization Algorithm with Mixed-Integer Optimization [5.319361976450982]
This paper introduces a set of conditions that ensure the convergence of a specific class of EM algorithms.
Our results offer a new analysis technique for iterative algorithms that solve mixed-integer non-linear optimization problems.
arXiv Detail & Related papers (2024-01-31T11:42:46Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
arXiv Detail & Related papers (2023-01-30T18:54:54Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - On Asymptotic Linear Convergence of Projected Gradient Descent for
Constrained Least Squares [22.851500417035947]
This manuscript presents a unified framework for the analysis of projected gradient descent in the context of constrained least squares.
We present a recipe for the convergence analysis of PGD and demonstrate it via a beginning-to-end application of the recipe on four fundamental problems.
arXiv Detail & Related papers (2021-12-22T09:49:51Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector
Problems [78.95866278697777]
We propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously.
We then apply our algorithms to solve the sparse principal component analysis problem for a wide range of test datasets.
arXiv Detail & Related papers (2021-03-24T23:11:32Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
arXiv Detail & Related papers (2020-12-09T20:19:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.