Model-based clustering of partial records
- URL: http://arxiv.org/abs/2103.16336v1
- Date: Tue, 30 Mar 2021 13:30:59 GMT
- Title: Model-based clustering of partial records
- Authors: Emily M. Goren and Ranjan Maitra
- Abstract summary: We develop clustering methodology through a model-based approach using the marginal density for the observed values.
We compare our algorithm to the corresponding full expectation-maximization (EM) approach that considers the missing values in the incomplete data set.
Simulation studies demonstrate that our approach has favorable recovery of the true cluster partition compared to case deletion and imputation.
- Score: 11.193504036335503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partially recorded data are frequently encountered in many applications. In
practice, such datasets are usually clustered by removing incomplete cases or
features with missing values, or by imputing missing values, followed by
application of a clustering algorithm to the resulting altered data set. Here,
we develop clustering methodology through a model-based approach using the
marginal density for the observed values under a finite mixture model of
multivariate $t$ distributions. We compare our algorithm to the corresponding
full expectation-maximization (EM) approach that considers the missing values
in the incomplete data set and makes a missing at random (MAR) assumption, as
well as case deletion and imputation. Since only the observed values are
utilized, our approach is computationally more efficient than imputation or
full EM. Simulation studies demonstrate that our approach has favorable
recovery of the true cluster partition compared to case deletion and imputation
under various missingness mechanisms, and is more robust to extreme MAR
violations than the full EM approach since it does not use the observed values
to inform those that are missing. Our methodology is demonstrated on a problem
of clustering gamma-ray bursts and is implemented in the MixtClust R package
(https://github.com/emilygoren/MixtClust).
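
The marginal-density idea in the abstract can be illustrated with a short Python sketch. It is not the MixtClust implementation, and the function names `t_logpdf_observed` and `responsibilities`, the NaN encoding of missing values, and the example parameters are all assumptions made for illustration. It relies on a standard property: the marginal of a multivariate $t$ distribution over a subset of coordinates is again multivariate $t$, with the matching sub-vector of the location, the matching sub-matrix of the scale, and the same degrees of freedom. Each record is therefore scored using only its observed coordinates, with no imputation step.

```python
import math
import numpy as np


def t_logpdf_observed(x, mu, sigma, nu):
    """Log marginal multivariate-t density of the observed entries of x.

    Missing entries are encoded as NaN (an assumption of this sketch).
    Only the observed sub-vector of mu and sub-matrix of sigma are used.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    obs = ~np.isnan(x)
    p = int(obs.sum())
    if p == 0:
        return 0.0  # nothing observed: the record contributes no information
    diff = x[obs] - mu[obs]
    chol = np.linalg.cholesky(sigma[np.ix_(obs, obs)])
    z = np.linalg.solve(chol, diff)
    maha = float(z @ z)  # squared Mahalanobis distance on observed coordinates
    logdet = 2.0 * float(np.sum(np.log(np.diag(chol))))
    return (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet
            - 0.5 * (nu + p) * math.log1p(maha / nu))


def responsibilities(x, weights, mus, sigmas, nus):
    """Posterior cluster probabilities for one partially observed record."""
    logs = np.array([
        math.log(w) + t_logpdf_observed(x, m, s, v)
        for w, m, s, v in zip(weights, mus, sigmas, nus)
    ])
    logs -= logs.max()  # stabilize before exponentiating
    probs = np.exp(logs)
    return probs / probs.sum()


if __name__ == "__main__":
    # Hypothetical two-cluster mixture in two dimensions.
    mus = [np.zeros(2), np.full(2, 5.0)]
    sigmas = [np.eye(2), np.eye(2)]
    record = np.array([0.2, np.nan])  # second coordinate is missing
    print(responsibilities(record, [0.5, 0.5], mus, sigmas, [4.0, 4.0]))
```

In a full fitting procedure, these posterior probabilities would feed the parameter updates of an EM-type algorithm; those updates are omitted here because the abstract does not specify them.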