Local Minima Structures in Gaussian Mixture Models
- URL: http://arxiv.org/abs/2009.13040v3
- Date: Sat, 9 Mar 2024 15:27:29 GMT
- Title: Local Minima Structures in Gaussian Mixture Models
- Authors: Yudong Chen, Dogyoon Song, Xumei Xi and Yuqian Zhang
- Abstract summary: We investigate the landscape of the negative log-likelihood function of Gaussian Mixture Models (GMMs) with a general number of components in the population limit.
As the objective function is non-convex, there can be multiple local minima that are not globally optimal, even for well-separated mixture models.
Our results apply to settings where the true mixture components satisfy a certain separation condition, and are valid even when the number of components is over- or under-specified.
- Score: 14.87510193228802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the landscape of the negative log-likelihood function of
Gaussian Mixture Models (GMMs) with a general number of components in the
population limit. As the objective function is non-convex, there can be
multiple local minima that are not globally optimal, even for well-separated
mixture models. Our study reveals that all local minima share a common
structure that partially identifies the cluster centers (i.e., means of the
Gaussian components) of the true location mixture. Specifically, each local
minimum can be represented as a non-overlapping combination of two types of
sub-configurations: fitting a single mean estimate to multiple Gaussian
components or fitting multiple estimates to a single true component. These
results apply to settings where the true mixture components satisfy a certain
separation condition, and are valid even when the number of components is over-
or under-specified. We also present a more fine-grained analysis for the
setting of one-dimensional GMMs with three components, which provides sharper
approximation error bounds with improved dependence on the separation.
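As an illustration of the abstract's claim, the following is a minimal numerical sketch (not the paper's analysis): EM is run from many random initializations on a well-separated one-dimensional mixture, and the distinct stationary points it reaches are listed. The component count, separation, and sample size are illustrative assumptions.
```python
# A minimal sketch (not the paper's analysis): run EM from random
# initializations on a well-separated 1D GMM and list the distinct
# solutions reached. Component count, separation, and sample size
# are illustrative choices.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
true_means = np.array([-10.0, 0.0, 10.0])       # well-separated true centers
X = np.concatenate([rng.normal(m, 1.0, 2000) for m in true_means]).reshape(-1, 1)

solutions = set()
for seed in range(30):
    gm = GaussianMixture(n_components=3, n_init=1, init_params="random",
                         random_state=seed).fit(X)
    means = tuple(np.round(np.sort(gm.means_.ravel()), 1))
    solutions.add((means, round(-gm.score(X), 3)))  # NLL per sample

for means, nll in sorted(solutions, key=lambda s: s[1]):
    print(f"fitted means {means}   NLL/sample {nll}")
```
Besides the global solution near (-10, 0, 10), some seeds typically land on configurations where one estimate sits between two true centers while two estimates split a remaining component, matching the two sub-configuration types described above.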
Related papers
- Robust Mixture Learning when Outliers Overwhelm Small Groups [51.49063715477438]
We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers.
We propose an algorithm that obtains order-optimal error guarantees for each mixture mean with a minimal list-size overhead.
arXiv Detail & Related papers (2024-07-22T16:51:05Z)
- A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models [21.436254507839738]
We propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models.
We show that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm.
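The paper's exact algorithm is not reproduced here; the sketch below only illustrates the general Fourier idea: a 1D GMM has a closed-form characteristic function, so its parameters can be fit by matching the empirical characteristic function of the sample. The frequency grid, the two-component setting, and the least-squares fit are assumptions for illustration.
```python
# Hedged sketch of a Fourier-style estimator (not the paper's algorithm):
# fit a two-component 1D GMM by least squares against the empirical
# characteristic function.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2.0, 1.0, 3000), rng.normal(3.0, 0.5, 2000)])

t = np.linspace(0.1, 3.0, 40)                   # assumed frequency grid
ecf = np.exp(1j * np.outer(t, X)).mean(axis=1)  # empirical char. function

def residuals(p):
    w, m1, m2, s1, s2 = p
    cf = (w * np.exp(1j * m1 * t - 0.5 * (s1 * t) ** 2)
          + (1 - w) * np.exp(1j * m2 * t - 0.5 * (s2 * t) ** 2))
    diff = cf - ecf
    return np.concatenate([diff.real, diff.imag])  # split complex residual

fit = least_squares(residuals, x0=[0.5, -1.0, 1.0, 1.0, 1.0],
                    bounds=([0.01, -10, -10, 0.1, 0.1], [0.99, 10, 10, 5, 5]))
print("weight, means, sigmas:", np.round(fit.x, 3))  # truth: 0.6, -2, 3, 1, 0.5
```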
arXiv Detail & Related papers (2024-04-19T03:53:50Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
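The paper's compound batch normalization layer is not reproduced here; the toy sketch below only illustrates the underlying idea of normalizing features with a Gaussian mixture rather than a single mean/variance, so a small tail mode is not swamped by the head mode. Shapes, the component count, and the sklearn-based fitting are assumptions.
```python
# Toy sketch of mixture-based normalization (not the paper's CBN layer):
# standardize each sample by the statistics of its most responsible
# mixture component instead of a single batch mean/variance.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
head = rng.normal(0.0, 1.0, (900, 8))   # dominant "head" features
tail = rng.normal(4.0, 2.0, (100, 8))   # rare "tail" features
feats = np.concatenate([head, tail])

def mixture_normalize(x, n_components=2):
    gm = GaussianMixture(n_components=n_components, covariance_type="diag",
                         random_state=0).fit(x)
    comp = gm.predict(x)                    # hard assignment per sample
    mu = gm.means_[comp]                    # per-sample component mean
    sd = np.sqrt(gm.covariances_[comp])     # per-sample component std
    return (x - mu) / sd

out = mixture_normalize(feats)
print(out[:900].mean(), out[900:].mean())   # both modes centered near 0
```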
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures [29.26015093627193]
We develop the Exponential Location Update (ELU) algorithm to efficiently explore the curvature of the negative log-likelihood functions.
We demonstrate that the ELU algorithm converges to the final statistical radius of the models after a logarithmic number of iterations.
arXiv Detail & Related papers (2022-05-23T06:49:55Z)
- Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures [9.670578317106182]
We consider mixtures of $k \geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated.
We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small.
We develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight.
arXiv Detail & Related papers (2021-12-10T10:51:44Z)
- Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation [5.98367009147573]
We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model.
We show that in the high-dimensional regime, the linear discriminant analysis (LDA) classifier turns out to be sub-optimal in the minimax sense.
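A small simulation can illustrate (though not prove) this claim; the sketch below compares plug-in LDA with an oracle rule that knows the true anisotropic covariance, in a regime where the dimension is comparable to the sample size. Dimensions, sample sizes, and the covariance spectrum are illustrative assumptions.
```python
# Illustrative experiment (assumed setup, not the paper's construction):
# two-component anisotropic Gaussian model with class means +/- mu,
# comparing plug-in LDA to the oracle linear rule.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
d, n = 200, 250                             # dimension close to sample size
mu = np.ones(d) / np.sqrt(d)                # class means are +mu and -mu
cov_diag = np.linspace(0.5, 5.0, d)         # anisotropic covariance spectrum

def sample(m):
    y = rng.integers(0, 2, m)
    X = (2 * y[:, None] - 1) * mu + rng.normal(size=(m, d)) * np.sqrt(cov_diag)
    return X, y

Xtr, ytr = sample(n)
Xte, yte = sample(5000)

lda_acc = (LinearDiscriminantAnalysis().fit(Xtr, ytr).predict(Xte) == yte).mean()
w = mu / cov_diag                            # oracle rule: sign(w @ x)
oracle_acc = ((Xte @ w > 0).astype(int) == yte).mean()
print(f"plug-in LDA {lda_acc:.3f} vs oracle {oracle_acc:.3f}")
```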
arXiv Detail & Related papers (2021-11-13T05:19:37Z)
- Shared Independent Component Analysis for Multi-Subject Neuroimaging [107.29179765643042]
We introduce Shared Independent Component Analysis (ShICA) that models each view as a linear transform of shared independent components contaminated by additive Gaussian noise.
We show that this model is identifiable if the components are either non-Gaussian or have enough diversity in noise variances.
We provide empirical evidence on fMRI and MEG datasets that ShICA yields more accurate estimation of the components than alternatives.
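The generative model described here is simple to write down; below is a hedged sketch of it (data generation only, not the ShICA estimator). The Laplace source distribution, shapes, and noise levels are illustrative assumptions.
```python
# Hedged sketch of the ShICA generative model (data generation only):
# each view is a linear transform of shared independent components
# plus additive Gaussian noise with view-specific variances.
import numpy as np

rng = np.random.default_rng(4)
n_views, k, n = 4, 5, 1000                  # views, components, samples

S = rng.laplace(size=(k, n))                # non-Gaussian shared sources
views = []
for i in range(n_views):
    A = rng.normal(size=(k, k))             # view-specific mixing matrix
    noise_sd = rng.uniform(0.3, 1.0, (k, 1))  # diverse noise levels per view
    views.append(A @ (S + rng.normal(size=(k, n)) * noise_sd))

# Per the abstract, identifiability needs non-Gaussian sources OR enough
# diversity in noise variances across views; both hold in this sketch.
print([V.shape for V in views])
```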
arXiv Detail & Related papers (2021-10-26T08:54:41Z)
- Go with the Flows: Mixtures of Normalizing Flows for Point Cloud Generation and Reconstruction [98.38585659305325]
Normalizing flows (NFs) have demonstrated state-of-the-art performance on modeling 3D point clouds.
This work enhances their representational power by applying mixtures of NFs to point clouds.
arXiv Detail & Related papers (2021-06-06T14:25:45Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
- Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, the class of priors underlying commonly used models such as the Dirichlet process and the beta-Bernoulli process, are decomposable into independent sub-measures.
We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components.
The resulting hybrid algorithm can be applied to allow scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.