Beyond EM Algorithm on Over-specified Two-Component Location-Scale
Gaussian Mixtures
- URL: http://arxiv.org/abs/2205.11078v1
- Date: Mon, 23 May 2022 06:49:55 GMT
- Title: Beyond EM Algorithm on Over-specified Two-Component Location-Scale
Gaussian Mixtures
- Authors: Tongzheng Ren and Fuheng Cui and Sujay Sanghavi and Nhat Ho
- Abstract summary: We develop the Exponential Location Update (ELU) algorithm to efficiently explore the curvature of the negative log-likelihood functions.
We demonstrate that the ELU algorithm converges to the final statistical radius of the models after a logarithmic number of iterations.
- Score: 29.26015093627193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Expectation-Maximization (EM) algorithm has been predominantly used to
approximate the maximum likelihood estimation of the location-scale Gaussian
mixtures. However, when the models are over-specified, namely, the chosen
number of components to fit the data is larger than the unknown true number of
components, EM needs a polynomial number of iterations in terms of the sample
size to reach the final statistical radius; this is computationally expensive
in practice. The slow convergence of EM is due to the lack of local strong
convexity with respect to the location parameter in the negative population
log-likelihood function, i.e., the limit of the negative sample log-likelihood
function as the sample size goes to infinity. To efficiently explore the
curvature of the negative log-likelihood function, specifically for
two-component location-scale Gaussian mixtures, we develop the
Exponential Location Update (ELU) algorithm. The idea of the ELU algorithm is
that we first obtain the exact optimal solution for the scale parameter and
then perform an exponential step-size gradient descent for the location
parameter. We demonstrate theoretically and empirically that the ELU iterates
converge to the final statistical radius of the models after a logarithmic
number of iterations. To the best of our knowledge, this resolves a
long-standing open question in the literature on developing an optimization
algorithm with optimal statistical and computational complexities for
parameter estimation, at least under these specific settings of over-specified
Gaussian mixture models.
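A minimal numerical sketch of this two-step idea, assuming the symmetric mixture 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2) fitted to data drawn from a single Gaussian; the moment-matching scale update and the geometric step-size schedule below are illustrative stand-ins, not the authors' exact ELU updates.

    import numpy as np

    def neg_loglik_grad_theta(x, theta, sigma2):
        # Gradient of the sample negative log-likelihood of the symmetric mixture
        # 0.5*N(theta, sigma2) + 0.5*N(-theta, sigma2) with respect to theta.
        e_plus = -(x - theta) ** 2 / (2 * sigma2)
        e_minus = -(x + theta) ** 2 / (2 * sigma2)
        m = np.maximum(e_plus, e_minus)  # stabilise the exponentials
        w = np.exp(e_plus - m) / (np.exp(e_plus - m) + np.exp(e_minus - m))
        return -np.mean((w * (x - theta) - (1 - w) * (x + theta)) / sigma2)

    def elu_sketch(x, iters=30, theta0=0.5, eta0=0.1, beta=1.15):
        theta = theta0
        for t in range(iters):
            # Scale step: pick sigma^2 so the mixture matches the sample second
            # moment at the current theta (stand-in for the exact optimal scale).
            sigma2 = max(np.mean(x ** 2) - theta ** 2, 1e-6)
            # Location step: gradient descent with an exponentially growing step size.
            theta -= eta0 * beta ** t * neg_loglik_grad_theta(x, theta, sigma2)
        return theta, sigma2

    # Over-specified setting: the data come from a single Gaussian component.
    x = np.random.default_rng(0).normal(0.0, 1.0, size=10_000)
    theta_hat, sigma2_hat = elu_sketch(x)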
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
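A hedged sketch of the gradient-free ingredient such methods rely on (not this paper's algorithm): a two-point random finite-difference estimate of the gradient plugged into plain SGD; the smoothing radius, step size, and toy objective are arbitrary choices.

    import numpy as np

    def two_point_grad(f, x, h=1e-3, rng=np.random.default_rng(0)):
        # Estimate grad f(x) from two function values along a random direction u.
        u = rng.standard_normal(x.shape)
        return (f(x + h * u) - f(x - h * u)) / (2 * h) * u

    def zero_order_sgd(f, x0, steps=300, lr=0.05):
        x = np.array(x0, dtype=float)
        for _ in range(steps):
            x -= lr * two_point_grad(f, x)
        return x

    # Toy usage on a smooth convex quadratic.
    x_hat = zero_order_sgd(lambda x: np.sum((x - 1.0) ** 2), np.zeros(5))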
- Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization [38.67914746910537]
We prove an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cells estimation and density support estimation.
To nearly achieve the desired fast rate, we design an entropic regularization scheme decreasing with the number of samples.
arXiv Detail & Related papers (2024-05-23T11:46:03Z)
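Not the paper's estimator: a generic stochastic-gradient ascent on the entropic dual of a semi-discrete OT problem (continuous source, discrete target atoms), with the regularisation shrunk as more samples arrive to echo the decreasing-regularisation scheme mentioned above; the cost, schedules, and step sizes are assumptions.

    import numpy as np

    def semidiscrete_entropic_sgd(sample_source, y, nu, n_steps=5000,
                                  lr=0.5, eps0=1.0, rng=np.random.default_rng(0)):
        # Ascend the entropic dual potential v over the discrete atoms y (weights nu).
        v = np.zeros(len(y))
        for t in range(1, n_steps + 1):
            eps = eps0 / np.sqrt(t)            # regularisation decreasing with the sample count
            x = sample_source(rng)             # one fresh sample from the continuous source
            cost = 0.5 * (x - y) ** 2          # squared-distance ground cost (1-D atoms here)
            logits = (v - cost) / eps
            p = np.exp(logits - logits.max()) * nu
            p /= p.sum()                       # soft assignment of x to the atoms
            v += (lr / np.sqrt(t)) * (nu - p)  # stochastic gradient of the dual objective
        return v

    # Toy usage: source N(0, 1), three target atoms with uniform weights.
    v_hat = semidiscrete_entropic_sgd(lambda rng: rng.standard_normal(),
                                      np.array([-1.0, 0.0, 1.0]), np.ones(3) / 3)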
- A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models [21.436254507839738]
We propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models.
We show that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm.
arXiv Detail & Related papers (2024-04-19T03:53:50Z)
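The Fourier-domain estimator itself is not reproduced here; as a hedged illustration, the sketch below only computes the quantity such approaches start from, the empirical characteristic function of the sample on a frequency grid.

    import numpy as np

    def empirical_cf(x, freqs):
        # phi_hat(t) = (1/n) * sum_i exp(i * t * x_i), evaluated at every t in freqs.
        return np.exp(1j * np.outer(freqs, x)).mean(axis=1)

    # For a Gaussian mixture, phi_hat(t) approximates
    # sum_k w_k * exp(i*t*mu_k - 0.5*sigma_k^2*t^2), so the mixture parameters can be
    # recovered by fitting this form in the frequency domain.
    x = np.random.default_rng(0).normal(size=5_000)
    phi = empirical_cf(x, np.linspace(-3.0, 3.0, 61))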
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
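A minimal sketch of the noise-contrastive estimation objective referenced above (not the paper's compositional reformulation): the unnormalised model is trained to separate real data from artificial noise with a logistic loss; the 1-D Gaussian-shaped model and the noise distribution are illustrative assumptions.

    import numpy as np

    def nce_loss(params, x_data, x_noise, log_noise_pdf):
        # Unnormalised log-model: log f(x) = -0.5*(x - mu)^2 / s2 + c, where c is a
        # learned stand-in for the unknown log normalising constant.
        mu, log_s2, c = params
        log_f = lambda x: -0.5 * (x - mu) ** 2 / np.exp(log_s2) + c
        # Logistic loss on the log-ratio log f(x) - log p_noise(x):
        # real samples should score high, noise samples low.
        logit_data = log_f(x_data) - log_noise_pdf(x_data)
        logit_noise = log_f(x_noise) - log_noise_pdf(x_noise)
        return np.mean(np.logaddexp(0.0, -logit_data)) + np.mean(np.logaddexp(0.0, logit_noise))

    # Toy usage with Gaussian data and a wider Gaussian noise distribution.
    rng = np.random.default_rng(0)
    x_data, x_noise = rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 2.0, 1000)
    noise_logpdf = lambda x: -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))
    loss = nce_loss((0.0, 0.0, 0.0), x_data, x_noise, noise_logpdf)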
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
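Not the paper's probabilistic-unrolling machinery, only a sketch of the inverse-free ingredient it builds on: when a marginal-likelihood gradient needs K^{-1}y and trace(K^{-1} dK), both can be obtained from iterative conjugate-gradient solves and random probes instead of an explicit matrix inverse.

    import numpy as np
    from scipy.sparse.linalg import cg

    def inverse_free_terms(K, y, dK, n_probes=10, rng=np.random.default_rng(0)):
        # Solve K alpha = y iteratively rather than forming K^{-1} explicitly.
        alpha, _ = cg(K, y)
        # Hutchinson-style estimate of trace(K^{-1} dK) from Rademacher probes z.
        trace_est = 0.0
        for _ in range(n_probes):
            z = rng.choice([-1.0, 1.0], size=len(y))
            u, _ = cg(K, dK @ z)  # u = K^{-1} dK z via conjugate gradients
            trace_est += (z @ u) / n_probes
        return alpha, trace_est

    # Toy usage with a small symmetric positive-definite matrix.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 20))
    K = A @ A.T + 1e-1 * np.eye(50)
    alpha, tr = inverse_free_terms(K, rng.standard_normal(50), np.eye(50))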
- Interacting Particle Langevin Algorithm for Maximum Marginal Likelihood Estimation [2.53740603524637]
We develop a class of interacting particle systems for implementing a maximum marginal likelihood estimation procedure.
In particular, we prove that the parameter marginal of the stationary measure of this diffusion has the form of a Gibbs measure.
Using a particular rescaling, we then prove geometric ergodicity of this system and bound the discretisation error in a manner that is uniform in time and does not increase with the number of particles.
arXiv Detail & Related papers (2023-03-23T16:50:08Z)
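A bare-bones sketch in the spirit of the interacting-particle idea above (not the paper's exact scheme or scaling): N latent-variable particles and the parameter are co-evolved with noisy gradient steps; the toy Gaussian latent-variable model in the usage lines is an assumption.

    import numpy as np

    def interacting_particle_langevin(grad_theta, grad_x, theta0, x0,
                                      n_iter=2000, gamma=1e-2, rng=np.random.default_rng(0)):
        # theta: scalar parameter estimate; x: array of N latent-variable particles.
        theta, x = float(theta0), np.array(x0, dtype=float)
        n = len(x)
        for _ in range(n_iter):
            # Parameter update: particle-averaged gradient plus small Langevin noise.
            theta += (gamma * np.mean(grad_theta(theta, x))
                      + np.sqrt(2.0 * gamma / n) * rng.standard_normal())
            # Particle update: independent Langevin steps in the latent space.
            x += gamma * grad_x(theta, x) + np.sqrt(2.0 * gamma) * rng.standard_normal(n)
        return theta, x

    # Toy model: y | x ~ N(x, 1), x ~ N(theta, 1), observed y = 2; the marginal MLE is theta = 2.
    y_obs = 2.0
    theta_hat, _ = interacting_particle_langevin(
        lambda th, x: x - th,                  # d/dtheta log p_theta(x, y)
        lambda th, x: (y_obs - x) - (x - th),  # d/dx log p_theta(x, y)
        theta0=0.0, x0=np.zeros(100))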
- Gaussian process regression and conditional Karhunen-Loève models for data assimilation in inverse problems [68.8204255655161]
We present a model inversion algorithm, CKLEMAP, for data assimilation and parameter estimation in partial differential equation models.
The CKLEMAP method provides better scalability compared to the standard MAP method.
arXiv Detail & Related papers (2023-01-26T18:14:12Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
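For context rather than from the paper: the unadjusted Langevin Monte Carlo iteration whose 2-Wasserstein sampling error such mean-square analyses bound; the target distribution and step size below are illustrative.

    import numpy as np

    def unadjusted_langevin(grad_log_pi, x0, n_steps=1000, step=1e-2,
                            rng=np.random.default_rng(0)):
        # x_{k+1} = x_k + step * grad log pi(x_k) + sqrt(2 * step) * standard normal noise
        x = np.array(x0, dtype=float)
        samples = np.empty((n_steps, x.size))
        for k in range(n_steps):
            x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
            samples[k] = x
        return samples

    # Toy target: standard Gaussian in two dimensions, grad log pi(x) = -x.
    chain = unadjusted_langevin(lambda x: -x, np.zeros(2))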
- Improved Convergence Guarantees for Learning Gaussian Mixture Models by EM and Gradient EM [15.251738476719918]
We consider the problem of estimating the parameters of a Gaussian Mixture Model with K components of known weights.
We present a sharper analysis of the local convergence of EM and gradient EM, compared to previous works.
Our second contribution is improved sample size requirements for accurate estimation by EM and gradient EM.
arXiv Detail & Related papers (2021-01-03T08:10:01Z)
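A compact sketch of one EM update and one gradient-EM update for the component means of a Gaussian mixture with known weights, the setting discussed above; the unit-variance components and the step size are assumptions made for brevity.

    import numpy as np

    def responsibilities(x, mus, weights):
        # Posterior probability of each (unit-variance) component for each sample.
        logits = -0.5 * (x[:, None] - mus[None, :]) ** 2 + np.log(weights)[None, :]
        logits -= logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        return r / r.sum(axis=1, keepdims=True)

    def em_step(x, mus, weights):
        # Exact M-step for the means: responsibility-weighted sample averages.
        r = responsibilities(x, mus, weights)
        return (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

    def gradient_em_step(x, mus, weights, lr=1.0):
        # Replace the exact M-step by one ascent step on the EM surrogate objective.
        r = responsibilities(x, mus, weights)
        grad = (r * (x[:, None] - mus[None, :])).mean(axis=0)
        return mus + lr * grad

    # Toy usage: two well-separated components with known equal weights.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
    mus = np.array([-0.5, 0.5])
    for _ in range(100):
        mus = gradient_em_step(x, mus, np.array([0.5, 0.5]))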
- Sparse Representations of Positive Functions via First and Second-Order Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk problems when the range of the estimator is required to be nonnegative.
We develop first and second-order variants of approximation mirror descent employing pseudo-gradients.
Experiments demonstrate favorable performance on inhomogeneous Poisson process intensity estimation in practice.
arXiv Detail & Related papers (2020-11-13T21:54:28Z)
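Not the paper's RKHS construction: a hedged sketch of the simplest first-order scheme with the same flavour, mirror descent under an entropic mirror map (exponentiated gradient), whose multiplicative updates keep the estimate nonnegative by construction; the toy least-squares objective is an assumption.

    import numpy as np

    def exponentiated_gradient(grad, w0, n_steps=500, lr=0.01):
        # Mirror descent with the entropic mirror map: w stays positive because the
        # update is multiplicative.
        w = np.array(w0, dtype=float)
        for _ in range(n_steps):
            w = w * np.exp(-lr * grad(w))
        return w

    # Toy usage: nonnegative weights for a least-squares objective 0.5 * ||A w - b||^2.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((30, 5)), rng.standard_normal(30)
    w_hat = exponentiated_gradient(lambda w: A.T @ (A @ w - b), np.ones(5))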