A Robust and Flexible EM Algorithm for Mixtures of Elliptical
Distributions with Missing Data
- URL: http://arxiv.org/abs/2201.12020v4
- Date: Mon, 22 May 2023 10:36:23 GMT
- Title: A Robust and Flexible EM Algorithm for Mixtures of Elliptical
Distributions with Missing Data
- Authors: Florian Mouret, Alexandre Hippert-Ferrer, Frédéric Pascal,
Jean-Yves Tourneret
- Abstract summary: This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm for mixtures of elliptical distributions is investigated, designed to handle potentially missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
- Score: 71.9573352891936
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper tackles the problem of missing data imputation for noisy and
non-Gaussian data. A classical imputation method, the Expectation Maximization
(EM) algorithm for Gaussian mixture models, has shown interesting properties
when compared to other popular approaches such as those based on k-nearest
neighbors or on multiple imputations by chained equations. However, Gaussian
mixture models are known to be non-robust to heterogeneous data, which can lead
to poor estimation performance when the data is contaminated by outliers or
follows non-Gaussian distributions. To overcome this issue, a new EM algorithm
for mixtures of elliptical distributions is investigated, designed to handle
potentially missing data. This paper shows that this problem reduces to
the estimation of a mixture of Angular Gaussian distributions under generic
assumptions (i.e., each sample is drawn from a mixture of elliptical
distributions, which may differ from one sample to another). In that
case, the complete-data likelihood associated with mixtures of elliptical
distributions is well adapted to the EM framework with missing data thanks to
its conditional distribution, which is shown to be a multivariate
$t$-distribution. Experimental results on synthetic data demonstrate that the
proposed algorithm is robust to outliers and can be used with non-Gaussian
data. Furthermore, experiments conducted on real-world datasets show that this
algorithm is very competitive when compared to other classical imputation
methods.
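
As context for the classical baseline the abstract builds on, the sketch below shows EM imputation under a single multivariate Gaussian: the E-step fills each missing block with its Gaussian conditional mean given the observed entries, and the M-step refits the parameters from the completed data. This is a minimal illustration under stated assumptions, not the paper's algorithm: the proposed method instead handles mixtures of elliptical (Angular Gaussian) distributions, whose relevant conditional distribution is a multivariate $t$. All function and variable names below are illustrative.

```python
import numpy as np

def em_gaussian_impute(X, n_iter=50, reg=1e-6):
    """EM imputation under one multivariate Gaussian (illustrative sketch).

    E-step: a missing block x_m is replaced by its conditional mean
    E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o); the conditional
    covariance S_mm - S_mo S_oo^{-1} S_om is accumulated so that the
    M-step covariance update remains unbiased.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)                # init: column means
    Xf = np.where(miss, mu, X)                # completed data matrix
    Sigma = np.cov(Xf, rowvar=False) + reg * np.eye(d)
    for _ in range(n_iter):
        C = np.zeros((d, d))                  # sum of conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():                   # fully missing row: marginal
                Xf[i] = mu
                C += Sigma
                continue
            Soo = Sigma[np.ix_(o, o)]
            Smo = Sigma[np.ix_(m, o)]
            K = np.linalg.solve(Soo, Smo.T).T   # S_mo S_oo^{-1}
            Xf[i, m] = mu[m] + K @ (Xf[i, o] - mu[o])
            C[np.ix_(m, m)] += Sigma[np.ix_(m, m)] - K @ Smo.T
        mu = Xf.mean(axis=0)                  # M-step: refit mean, covariance
        diff = Xf - mu
        Sigma = (diff.T @ diff + C) / n + reg * np.eye(d)
    return Xf, mu, Sigma

# Toy usage: impute MCAR-missing entries of correlated Gaussian samples.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(200, 3)) @ A.T
X[rng.random(X.shape) < 0.2] = np.nan
X_imp, mu_hat, Sigma_hat = em_gaussian_impute(X)
assert not np.isnan(X_imp).any()              # every entry is now filled in
```

In the paper's setting, the Gaussian conditional moments above would be replaced by those of the elliptical mixture, which the authors show take a multivariate $t$ form.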
Related papers
- The Breakdown of Gaussian Universality in Classification of High-dimensional Mixtures [6.863637695977277]
We provide a high-dimensional characterization of empirical risk minimization for classification under a general mixture data setting.
We specify conditions for Gaussian universality and discuss their implications for the choice of loss function.
arXiv Detail & Related papers (2024-10-08T01:45:37Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex model for semisupervised/library-based unmixing.
We demonstrate the efficacy of alternating optimization methods for sparse unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z)
- Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce a novel class of SDE-based solvers called GMS for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z)
- Mixture of von Mises-Fisher distribution with sparse prototypes [0.0]
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere.
We propose in this article to estimate a von Mises-Fisher mixture using an $\ell_1$-penalized likelihood.
arXiv Detail & Related papers (2022-12-30T08:00:38Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the unidentifiability region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Joint Probability Estimation Using Tensor Decomposition and Dictionaries [3.4720326275851994]
We study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals.
We create a dictionary of various families of distributions by inspecting the data, and use it to approximate each decomposed factor of the product in the mixture.
arXiv Detail & Related papers (2022-03-03T11:55:51Z)
- Clustering a Mixture of Gaussians with Unknown Covariance [4.821312633849745]
We derive a Max-Cut integer program based on maximum likelihood estimation.
We develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size.
We generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights.
arXiv Detail & Related papers (2021-10-04T17:59:20Z)
- A similarity-based Bayesian mixture-of-experts model [0.5156484100374058]
We present a new non-parametric mixture-of-experts model for multivariate regression problems.
Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point.
Posterior inference is performed on the parameters of the mixture as well as the distance metric.
arXiv Detail & Related papers (2020-12-03T18:08:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.