Beyond EM Algorithm on Over-specified Two-Component Location-Scale
Gaussian Mixtures
- URL: http://arxiv.org/abs/2205.11078v1
- Date: Mon, 23 May 2022 06:49:55 GMT
- Title: Beyond EM Algorithm on Over-specified Two-Component Location-Scale
Gaussian Mixtures
- Authors: Tongzheng Ren and Fuheng Cui and Sujay Sanghavi and Nhat Ho
- Abstract summary: We develop the Exponential Location Update (ELU) algorithm to efficiently explore the curvature of the negative log-likelihood functions.
We demonstrate that the ELU algorithm converges to the final statistical radius of the models after a logarithmic number of iterations.
- Score: 29.26015093627193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Expectation-Maximization (EM) algorithm has been predominantly used to
approximate the maximum likelihood estimation of the location-scale Gaussian
mixtures. However, when the models are over-specified, namely, the chosen
number of components to fit the data is larger than the unknown true number of
components, EM needs a polynomial number of iterations in terms of the sample
size to reach the final statistical radius; this is computationally expensive
in practice. The slow convergence of EM is due to the lack of local strong
convexity with respect to the location parameter in the negative population
log-likelihood function, i.e., the limit of the negative sample log-likelihood
function as the sample size goes to infinity. To efficiently explore the
curvature of the negative log-likelihood function, specifically for
two-component location-scale Gaussian mixtures, we develop the
Exponential Location Update (ELU) algorithm. The idea of the ELU algorithm is
that we first obtain the exact optimal solution for the scale parameter and
then perform an exponential step-size gradient descent for the location
parameter. We demonstrate theoretically and empirically that the ELU iterates
converge to the final statistical radius of the models after a logarithmic
number of iterations. To the best of our knowledge, this resolves a
long-standing open question in the literature on developing an optimization
algorithm with optimal statistical and computational complexities for
parameter estimation, at least under these specific settings of over-specified
Gaussian mixture models.
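A minimal numerical sketch of this two-step idea, assuming the symmetric mixture 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2) fitted to data drawn from a single Gaussian; the moment-matching scale update and the geometric step-size schedule below are illustrative stand-ins, not the authors' exact ELU updates.

    import numpy as np

    def neg_loglik_grad_theta(x, theta, sigma2):
        # Gradient of the sample negative log-likelihood of the symmetric mixture
        # 0.5*N(theta, sigma2) + 0.5*N(-theta, sigma2) with respect to theta.
        e_plus = -(x - theta) ** 2 / (2 * sigma2)
        e_minus = -(x + theta) ** 2 / (2 * sigma2)
        m = np.maximum(e_plus, e_minus)  # stabilise the exponentials
        w = np.exp(e_plus - m) / (np.exp(e_plus - m) + np.exp(e_minus - m))
        return -np.mean((w * (x - theta) - (1 - w) * (x + theta)) / sigma2)

    def elu_sketch(x, iters=30, theta0=0.5, eta0=0.1, beta=1.15):
        theta = theta0
        for t in range(iters):
            # Scale step: pick sigma^2 so the mixture matches the sample second
            # moment at the current theta (stand-in for the exact optimal scale).
            sigma2 = max(np.mean(x ** 2) - theta ** 2, 1e-6)
            # Location step: gradient descent with an exponentially growing step size.
            theta -= eta0 * beta ** t * neg_loglik_grad_theta(x, theta, sigma2)
        return theta, sigma2

    # Over-specified setting: the data come from a single Gaussian component.
    x = np.random.default_rng(0).normal(0.0, 1.0, size=10_000)
    theta_hat, sigma2_hat = elu_sketch(x)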
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
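A hedged sketch of the gradient-free ingredient such methods rely on (not this paper's algorithm): a two-point random finite-difference estimate of the gradient plugged into plain SGD; the smoothing radius, step size, and toy objective are arbitrary choices.

    import numpy as np

    def two_point_grad(f, x, h=1e-3, rng=np.random.default_rng(0)):
        # Estimate grad f(x) from two function values along a random direction u.
        u = rng.standard_normal(x.shape)
        return (f(x + h * u) - f(x - h * u)) / (2 * h) * u

    def zero_order_sgd(f, x0, steps=300, lr=0.05):
        x = np.array(x0, dtype=float)
        for _ in range(steps):
            x -= lr * two_point_grad(f, x)
        return x

    # Toy usage on a smooth convex quadratic.
    x_hat = zero_order_sgd(lambda x: np.sum((x - 1.0) ** 2), np.zeros(5))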
- Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization [38.67914746910537]
We prove an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cells estimation and density support estimation.
To nearly achieve the desired fast rate, we design an entropic regularization scheme decreasing with the number of samples.
arXiv Detail & Related papers (2024-05-23T11:46:03Z)
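Not the paper's estimator: a generic stochastic-gradient ascent on the entropic dual of a semi-discrete OT problem (continuous source, discrete target atoms), with the regularisation shrunk as more samples arrive to echo the decreasing-regularisation scheme mentioned above; the cost, schedules, and step sizes are assumptions.

    import numpy as np

    def semidiscrete_entropic_sgd(sample_source, y, nu, n_steps=5000,
                                  lr=0.5, eps0=1.0, rng=np.random.default_rng(0)):
        # Ascend the entropic dual potential v over the discrete atoms y (weights nu).
        v = np.zeros(len(y))
        for t in range(1, n_steps + 1):
            eps = eps0 / np.sqrt(t)            # regularisation decreasing with the sample count
            x = sample_source(rng)             # one fresh sample from the continuous source
            cost = 0.5 * (x - y) ** 2          # squared-distance ground cost (1-D atoms here)
            logits = (v - cost) / eps
            p = np.exp(logits - logits.max()) * nu
            p /= p.sum()                       # soft assignment of x to the atoms
            v += (lr / np.sqrt(t)) * (nu - p)  # stochastic gradient of the dual objective
        return v

    # Toy usage: source N(0, 1), three target atoms with uniform weights.
    v_hat = semidiscrete_entropic_sgd(lambda rng: rng.standard_normal(),
                                      np.array([-1.0, 0.0, 1.0]), np.ones(3) / 3)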
- A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models [21.436254507839738]
We propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models.
We show that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm.
arXiv Detail & Related papers (2024-04-19T03:53:50Z)
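The Fourier-domain estimator itself is not reproduced here; as a hedged illustration, the sketch below only computes the quantity such approaches start from, the empirical characteristic function of the sample on a frequency grid.

    import numpy as np

    def empirical_cf(x, freqs):
        # phi_hat(t) = (1/n) * sum_i exp(i * t * x_i), evaluated at every t in freqs.
        return np.exp(1j * np.outer(freqs, x)).mean(axis=1)

    # For a Gaussian mixture, phi_hat(t) approximates
    # sum_k w_k * exp(i*t*mu_k - 0.5*sigma_k^2*t^2), so the mixture parameters can be
    # recovered by fitting this form in the frequency domain.
    x = np.random.default_rng(0).normal(size=5_000)
    phi = empirical_cf(x, np.linspace(-3.0, 3.0, 61))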
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
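A minimal sketch of the noise-contrastive estimation objective referenced above (not the paper's compositional reformulation): the unnormalised model is trained to separate real data from artificial noise with a logistic loss; the 1-D Gaussian-shaped model and the noise distribution are illustrative assumptions.

    import numpy as np

    def nce_loss(params, x_data, x_noise, log_noise_pdf):
        # Unnormalised log-model: log f(x) = -0.5*(x - mu)^2 / s2 + c, where c is a
        # learned stand-in for the unknown log normalising constant.
        mu, log_s2, c = params
        log_f = lambda x: -0.5 * (x - mu) ** 2 / np.exp(log_s2) + c
        # Logistic loss on the log-ratio log f(x) - log p_noise(x):
        # real samples should score high, noise samples low.
        logit_data = log_f(x_data) - log_noise_pdf(x_data)
        logit_noise = log_f(x_noise) - log_noise_pdf(x_noise)
        return np.mean(np.logaddexp(0.0, -logit_data)) + np.mean(np.logaddexp(0.0, logit_noise))

    # Toy usage with Gaussian data and a wider Gaussian noise distribution.
    rng = np.random.default_rng(0)
    x_data, x_noise = rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 2.0, 1000)
    noise_logpdf = lambda x: -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))
    loss = nce_loss((0.0, 0.0, 0.0), x_data, x_noise, noise_logpdf)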
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
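Not the paper's probabilistic-unrolling machinery, only a sketch of the inverse-free ingredient it builds on: when a marginal-likelihood gradient needs K^{-1}y and trace(K^{-1} dK), both can be obtained from iterative conjugate-gradient solves and random probes instead of an explicit matrix inverse.

    import numpy as np
    from scipy.sparse.linalg import cg

    def inverse_free_terms(K, y, dK, n_probes=10, rng=np.random.default_rng(0)):
        # Solve K alpha = y iteratively rather than forming K^{-1} explicitly.
        alpha, _ = cg(K, y)
        # Hutchinson-style estimate of trace(K^{-1} dK) from Rademacher probes z.
        trace_est = 0.0
        for _ in range(n_probes):
            z = rng.choice([-1.0, 1.0], size=len(y))
            u, _ = cg(K, dK @ z)  # u = K^{-1} dK z via conjugate gradients
            trace_est += (z @ u) / n_probes
        return alpha, trace_est

    # Toy usage with a small symmetric positive-definite matrix.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 20))
    K = A @ A.T + 1e-1 * np.eye(50)
    alpha, tr = inverse_free_terms(K, rng.standard_normal(50), np.eye(50))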
- Interacting Particle Langevin Algorithm for Maximum Marginal Likelihood Estimation [2.53740603524637]
We develop a class of interacting particle systems for implementing a maximum marginal likelihood estimation procedure.
In particular, we prove that the parameter marginal of the stationary measure of this diffusion has the form of a Gibbs measure.
Using a particular rescaling, we then prove geometric ergodicity of this system and bound the discretisation error in a manner that is uniform in time and does not increase with the number of particles.
arXiv Detail & Related papers (2023-03-23T16:50:08Z)
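A bare-bones sketch in the spirit of the interacting-particle idea above (not the paper's exact scheme or scaling): N latent-variable particles and the parameter are co-evolved with noisy gradient steps; the toy Gaussian latent-variable model in the usage lines is an assumption.

    import numpy as np

    def interacting_particle_langevin(grad_theta, grad_x, theta0, x0,
                                      n_iter=2000, gamma=1e-2, rng=np.random.default_rng(0)):
        # theta: scalar parameter estimate; x: array of N latent-variable particles.
        theta, x = float(theta0), np.array(x0, dtype=float)
        n = len(x)
        for _ in range(n_iter):
            # Parameter update: particle-averaged gradient plus small Langevin noise.
            theta += (gamma * np.mean(grad_theta(theta, x))
                      + np.sqrt(2.0 * gamma / n) * rng.standard_normal())
            # Particle update: independent Langevin steps in the latent space.
            x += gamma * grad_x(theta, x) + np.sqrt(2.0 * gamma) * rng.standard_normal(n)
        return theta, x

    # Toy model: y | x ~ N(x, 1), x ~ N(theta, 1), observed y = 2; the marginal MLE is theta = 2.
    y_obs = 2.0
    theta_hat, _ = interacting_particle_langevin(
        lambda th, x: x - th,                  # d/dtheta log p_theta(x, y)
        lambda th, x: (y_obs - x) - (x - th),  # d/dx log p_theta(x, y)
        theta0=0.0, x0=np.zeros(100))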
- Gaussian process regression and conditional Karhunen-Loève models for data assimilation in inverse problems [68.8204255655161]
We present a model inversion algorithm, CKLEMAP, for data assimilation and parameter estimation in partial differential equation models.
The CKLEMAP method provides better scalability compared to the standard MAP method.
arXiv Detail & Related papers (2023-01-26T18:14:12Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
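For context rather than from the paper: the unadjusted Langevin Monte Carlo iteration whose 2-Wasserstein sampling error such mean-square analyses bound; the target distribution and step size below are illustrative.

    import numpy as np

    def unadjusted_langevin(grad_log_pi, x0, n_steps=1000, step=1e-2,
                            rng=np.random.default_rng(0)):
        # x_{k+1} = x_k + step * grad log pi(x_k) + sqrt(2 * step) * standard normal noise
        x = np.array(x0, dtype=float)
        samples = np.empty((n_steps, x.size))
        for k in range(n_steps):
            x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
            samples[k] = x
        return samples

    # Toy target: standard Gaussian in two dimensions, grad log pi(x) = -x.
    chain = unadjusted_langevin(lambda x: -x, np.zeros(2))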
- Improved Convergence Guarantees for Learning Gaussian Mixture Models by EM and Gradient EM [15.251738476719918]
We consider the problem of estimating the parameters of a Gaussian Mixture Model with K components of known weights.
We present a sharper analysis of the local convergence of EM and gradient EM, compared to previous works.
Our second contribution is improved sample size requirements for accurate estimation by EM and gradient EM.
arXiv Detail & Related papers (2021-01-03T08:10:01Z)
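A compact sketch of one EM update and one gradient-EM update for the component means of a Gaussian mixture with known weights, the setting discussed above; the unit-variance components and the step size are assumptions made for brevity.

    import numpy as np

    def responsibilities(x, mus, weights):
        # Posterior probability of each (unit-variance) component for each sample.
        logits = -0.5 * (x[:, None] - mus[None, :]) ** 2 + np.log(weights)[None, :]
        logits -= logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        return r / r.sum(axis=1, keepdims=True)

    def em_step(x, mus, weights):
        # Exact M-step for the means: responsibility-weighted sample averages.
        r = responsibilities(x, mus, weights)
        return (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

    def gradient_em_step(x, mus, weights, lr=1.0):
        # Replace the exact M-step by one ascent step on the EM surrogate objective.
        r = responsibilities(x, mus, weights)
        grad = (r * (x[:, None] - mus[None, :])).mean(axis=0)
        return mus + lr * grad

    # Toy usage: two well-separated components with known equal weights.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
    mus = np.array([-0.5, 0.5])
    for _ in range(100):
        mus = gradient_em_step(x, mus, np.array([0.5, 0.5]))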
- Sparse Representations of Positive Functions via First and Second-Order Pseudo-Mirror Descent [15.340540198612823]
We consider expected risk problems when the range of the estimator is required to be nonnegative.
We develop first and second-order variants of approximation mirror descent employing pseudo-gradients.
Experiments demonstrate favorable performance on inhomogeneous Poisson process intensity estimation in practice.
arXiv Detail & Related papers (2020-11-13T21:54:28Z)
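Not the paper's RKHS construction: a hedged sketch of the simplest first-order scheme with the same flavour, mirror descent under an entropic mirror map (exponentiated gradient), whose multiplicative updates keep the estimate nonnegative by construction; the toy least-squares objective is an assumption.

    import numpy as np

    def exponentiated_gradient(grad, w0, n_steps=500, lr=0.01):
        # Mirror descent with the entropic mirror map: w stays positive because the
        # update is multiplicative.
        w = np.array(w0, dtype=float)
        for _ in range(n_steps):
            w = w * np.exp(-lr * grad(w))
        return w

    # Toy usage: nonnegative weights for a least-squares objective 0.5 * ||A w - b||^2.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((30, 5)), rng.standard_normal(30)
    w_hat = exponentiated_gradient(lambda w: A.T @ (A @ w - b), np.ones(5))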