Local Minima Structures in Gaussian Mixture Models
- URL: http://arxiv.org/abs/2009.13040v3
- Date: Sat, 9 Mar 2024 15:27:29 GMT
- Title: Local Minima Structures in Gaussian Mixture Models
- Authors: Yudong Chen, Dogyoon Song, Xumei Xi and Yuqian Zhang
- Abstract summary: We investigate the landscape of the negative log-likelihood function of Gaussian Mixture Models (GMMs) with a general number of components in the population limit.
As the objective function is non-convex, there can be multiple local minima that are not globally optimal, even for well-separated mixture models.
Our results apply to settings where the true mixture components satisfy a certain separation condition, and are valid even when the number of components is over- or under-specified.
- Score: 14.87510193228802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the landscape of the negative log-likelihood function of
Gaussian Mixture Models (GMMs) with a general number of components in the
population limit. As the objective function is non-convex, there can be
multiple local minima that are not globally optimal, even for well-separated
mixture models. Our study reveals that all local minima share a common
structure that partially identifies the cluster centers (i.e., means of the
Gaussian components) of the true location mixture. Specifically, each local
minimum can be represented as a non-overlapping combination of two types of
sub-configurations: fitting a single mean estimate to multiple Gaussian
components or fitting multiple estimates to a single true component. These
results apply to settings where the true mixture components satisfy a certain
separation condition, and are valid even when the number of components is over-
or under-specified. We also present a more fine-grained analysis for the
setting of one-dimensional GMMs with three components, which provides sharper
approximation error bounds with improved dependence on the separation.
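As an illustration of the abstract's claim, the following is a minimal numerical sketch (not the paper's analysis): EM is run from many random initializations on a well-separated one-dimensional mixture, and the distinct stationary points it reaches are listed. The component count, separation, and sample size are illustrative assumptions.
```python
# A minimal sketch (not the paper's analysis): run EM from random
# initializations on a well-separated 1D GMM and list the distinct
# solutions reached. Component count, separation, and sample size
# are illustrative choices.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
true_means = np.array([-10.0, 0.0, 10.0])       # well-separated true centers
X = np.concatenate([rng.normal(m, 1.0, 2000) for m in true_means]).reshape(-1, 1)

solutions = set()
for seed in range(30):
    gm = GaussianMixture(n_components=3, n_init=1, init_params="random",
                         random_state=seed).fit(X)
    means = tuple(np.round(np.sort(gm.means_.ravel()), 1))
    solutions.add((means, round(-gm.score(X), 3)))  # NLL per sample

for means, nll in sorted(solutions, key=lambda s: s[1]):
    print(f"fitted means {means}   NLL/sample {nll}")
```
Besides the global solution near (-10, 0, 10), some seeds typically land on configurations where one estimate sits between two true centers while two estimates split a remaining component, matching the two sub-configuration types described above.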
Related papers
- Robust Mixture Learning when Outliers Overwhelm Small Groups [51.49063715477438]
We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers.
We propose an algorithm that obtains order-optimal error guarantees for each mixture mean with a minimal list-size overhead.
arXiv Detail & Related papers (2024-07-22T16:51:05Z)
- A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models [21.436254507839738]
We propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models.
We show that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm.
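The paper's exact algorithm is not reproduced here; the sketch below only illustrates the general Fourier idea: a 1D GMM has a closed-form characteristic function, so its parameters can be fit by matching the empirical characteristic function of the sample. The frequency grid, the two-component setting, and the least-squares fit are assumptions for illustration.
```python
# Hedged sketch of a Fourier-style estimator (not the paper's algorithm):
# fit a two-component 1D GMM by least squares against the empirical
# characteristic function.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2.0, 1.0, 3000), rng.normal(3.0, 0.5, 2000)])

t = np.linspace(0.1, 3.0, 40)                   # assumed frequency grid
ecf = np.exp(1j * np.outer(t, X)).mean(axis=1)  # empirical char. function

def residuals(p):
    w, m1, m2, s1, s2 = p
    cf = (w * np.exp(1j * m1 * t - 0.5 * (s1 * t) ** 2)
          + (1 - w) * np.exp(1j * m2 * t - 0.5 * (s2 * t) ** 2))
    diff = cf - ecf
    return np.concatenate([diff.real, diff.imag])  # split complex residual

fit = least_squares(residuals, x0=[0.5, -1.0, 1.0, 1.0, 1.0],
                    bounds=([0.01, -10, -10, 0.1, 0.1], [0.99, 10, 10, 5, 5]))
print("weight, means, sigmas:", np.round(fit.x, 3))  # truth: 0.6, -2, 3, 1, 0.5
```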
arXiv Detail & Related papers (2024-04-19T03:53:50Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
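The paper's compound batch normalization layer is not reproduced here; the toy sketch below only illustrates the underlying idea of normalizing features with a Gaussian mixture rather than a single mean/variance, so a small tail mode is not swamped by the head mode. Shapes, the component count, and the sklearn-based fitting are assumptions.
```python
# Toy sketch of mixture-based normalization (not the paper's CBN layer):
# standardize each sample by the statistics of its most responsible
# mixture component instead of a single batch mean/variance.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
head = rng.normal(0.0, 1.0, (900, 8))   # dominant "head" features
tail = rng.normal(4.0, 2.0, (100, 8))   # rare "tail" features
feats = np.concatenate([head, tail])

def mixture_normalize(x, n_components=2):
    gm = GaussianMixture(n_components=n_components, covariance_type="diag",
                         random_state=0).fit(x)
    comp = gm.predict(x)                    # hard assignment per sample
    mu = gm.means_[comp]                    # per-sample component mean
    sd = np.sqrt(gm.covariances_[comp])     # per-sample component std
    return (x - mu) / sd

out = mixture_normalize(feats)
print(out[:900].mean(), out[900:].mean())   # both modes centered near 0
```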
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures [29.26015093627193]
We develop the Exponential Location Update (ELU) algorithm to efficiently explore the curvature of the negative log-likelihood functions.
We demonstrate that the ELU algorithm converges to the final statistical radius of the models after a logarithmic number of iterations.
arXiv Detail & Related papers (2022-05-23T06:49:55Z)
- Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures [9.670578317106182]
We consider mixtures of $k \geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated.
We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small.
We develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight.
arXiv Detail & Related papers (2021-12-10T10:51:44Z)
- Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation [5.98367009147573]
We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model.
We show that in the high-dimensional regime, the linear discriminant analysis (LDA) classifier turns out to be sub-optimal in the minimax sense.
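A small simulation can illustrate (though not prove) this claim; the sketch below compares plug-in LDA with an oracle rule that knows the true anisotropic covariance, in a regime where the dimension is comparable to the sample size. Dimensions, sample sizes, and the covariance spectrum are illustrative assumptions.
```python
# Illustrative experiment (assumed setup, not the paper's construction):
# two-component anisotropic Gaussian model with class means +/- mu,
# comparing plug-in LDA to the oracle linear rule.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
d, n = 200, 250                             # dimension close to sample size
mu = np.ones(d) / np.sqrt(d)                # class means are +mu and -mu
cov_diag = np.linspace(0.5, 5.0, d)         # anisotropic covariance spectrum

def sample(m):
    y = rng.integers(0, 2, m)
    X = (2 * y[:, None] - 1) * mu + rng.normal(size=(m, d)) * np.sqrt(cov_diag)
    return X, y

Xtr, ytr = sample(n)
Xte, yte = sample(5000)

lda_acc = (LinearDiscriminantAnalysis().fit(Xtr, ytr).predict(Xte) == yte).mean()
w = mu / cov_diag                            # oracle rule: sign(w @ x)
oracle_acc = ((Xte @ w > 0).astype(int) == yte).mean()
print(f"plug-in LDA {lda_acc:.3f} vs oracle {oracle_acc:.3f}")
```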
arXiv Detail & Related papers (2021-11-13T05:19:37Z)
- Shared Independent Component Analysis for Multi-Subject Neuroimaging [107.29179765643042]
We introduce Shared Independent Component Analysis (ShICA) that models each view as a linear transform of shared independent components contaminated by additive Gaussian noise.
We show that this model is identifiable if the components are either non-Gaussian or have enough diversity in noise variances.
We provide empirical evidence on fMRI and MEG datasets that ShICA yields more accurate estimation of the components than alternatives.
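The generative model described here is simple to write down; below is a hedged sketch of it (data generation only, not the ShICA estimator). The Laplace source distribution, shapes, and noise levels are illustrative assumptions.
```python
# Hedged sketch of the ShICA generative model (data generation only):
# each view is a linear transform of shared independent components
# plus additive Gaussian noise with view-specific variances.
import numpy as np

rng = np.random.default_rng(4)
n_views, k, n = 4, 5, 1000                  # views, components, samples

S = rng.laplace(size=(k, n))                # non-Gaussian shared sources
views = []
for i in range(n_views):
    A = rng.normal(size=(k, k))             # view-specific mixing matrix
    noise_sd = rng.uniform(0.3, 1.0, (k, 1))  # diverse noise levels per view
    views.append(A @ (S + rng.normal(size=(k, n)) * noise_sd))

# Per the abstract, identifiability needs non-Gaussian sources OR enough
# diversity in noise variances across views; both hold in this sketch.
print([V.shape for V in views])
```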
arXiv Detail & Related papers (2021-10-26T08:54:41Z)
- Go with the Flows: Mixtures of Normalizing Flows for Point Cloud Generation and Reconstruction [98.38585659305325]
Normalizing flows (NFs) have demonstrated state-of-the-art performance on modeling 3D point clouds.
This work enhances their representational power by applying mixtures of NFs to point clouds.
arXiv Detail & Related papers (2021-06-06T14:25:45Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
- Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, the class of priors underlying commonly used models such as the Dirichlet process and the beta-Bernoulli process, are decomposable into independent sub-measures.
We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components.
The resulting hybrid algorithm can be applied to allow scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.