Dendrogram of mixing measures: Hierarchical clustering and model
selection for finite mixture models
- URL: http://arxiv.org/abs/2403.01684v2
- Date: Fri, 8 Mar 2024 16:39:32 GMT
- Title: Dendrogram of mixing measures: Hierarchical clustering and model
selection for finite mixture models
- Authors: Dat Do, Linh Do, Scott A. McKinley, Jonathan Terhorst, XuanLong Nguyen
- Abstract summary: We present a new way to summarize and select mixture models via the hierarchical clustering tree (dendrogram) constructed from an overfitted latent mixing measure.
Our proposed method bridges agglomerative hierarchical clustering and mixture modeling.
- Score: 5.044813181406083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new way to summarize and select mixture models via the
hierarchical clustering tree (dendrogram) constructed from an overfitted latent
mixing measure. Our proposed method bridges agglomerative hierarchical
clustering and mixture modeling. The dendrogram's construction is derived from
the theory of convergence of the mixing measures, and as a result, we can both
consistently select the true number of mixing components and obtain the
pointwise optimal convergence rate for parameter estimation from the tree, even
when the model parameters are only weakly identifiable. In theory, it
explicates the choice of the optimal number of clusters in hierarchical
clustering. In practice, the dendrogram reveals more information on the
hierarchy of subpopulations compared to traditional ways of summarizing mixture
models. Several simulation studies are carried out to support our theory. We
also illustrate the methodology with an application to single-cell RNA sequence
analysis.
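Below is a minimal Python sketch of the general construction described in the abstract, assuming an overfitted Gaussian mixture fit with scikit-learn: the atoms (weight, mean) of the fitted mixing measure are merged agglomeratively and the merge heights form a dendrogram whose large jumps suggest the number of components. The Euclidean merge criterion and the weighted-average merge rule are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only: fit a deliberately overfitted Gaussian mixture,
# then agglomeratively merge the atoms of its mixing measure (weight, mean)
# and record the merge heights. The merge rule (Euclidean distance between
# component means, weight-averaged merges) is an assumption for illustration,
# not the exact construction from the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

def mixing_measure_dendrogram(X, k_overfit=10, seed=0):
    gmm = GaussianMixture(n_components=k_overfit, random_state=seed).fit(X)
    atoms = list(zip(gmm.weights_, gmm.means_))
    merges = []  # (merge height, number of surviving atoms after the merge)
    while len(atoms) > 1:
        # find the closest pair of atoms
        best = None
        for i in range(len(atoms)):
            for j in range(i + 1, len(atoms)):
                d = np.linalg.norm(atoms[i][1] - atoms[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (wi, mi), (wj, mj) = atoms[i], atoms[j]
        # merge the two atoms into one weighted atom
        w_new = wi + wj
        m_new = (wi * mi + wj * mj) / w_new
        atoms = [a for t, a in enumerate(atoms) if t not in (i, j)] + [(w_new, m_new)]
        merges.append((d, len(atoms)))
    return merges  # large jumps in merge height hint at the number of components

# Example usage on synthetic data with 3 well-separated components:
# X = np.vstack([np.random.randn(200, 2) + c for c in ([0, 0], [5, 5], [0, 6])])
# print(mixing_measure_dendrogram(X))
```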
Related papers
- Hierarchical Matrix Completion for the Prediction of Properties of Binary Mixtures [3.0478550046333965]
We introduce a novel generic approach for improving data-driven models.
We lump components that behave similarly into chemical classes and model them jointly.
Using clustering leads to significantly improved predictions compared to a matrix completion method (MCM) without clustering.
arXiv Detail & Related papers (2024-10-08T14:04:30Z) - Adaptive Fuzzy C-Means with Graph Embedding [84.47075244116782]
Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM)-based methods and mixture-model-based methods.
We propose a novel FCM-based clustering model that automatically learns an appropriate membership-degree hyperparameter value (a textbook FCM baseline is sketched after this list for context).
arXiv Detail & Related papers (2024-05-22T08:15:50Z) - Mixture of multilayer stochastic block models for multiview clustering [0.0]
We propose an original method for aggregating multiple clusterings coming from different sources of information.
The identifiability of the model parameters is established and a variational Bayesian EM algorithm is proposed for the estimation of these parameters.
The method is utilized to analyze global food trading networks, leading to structures of interest.
arXiv Detail & Related papers (2024-01-09T17:15:47Z) - Time Series Clustering with an EM algorithm for Mixtures of Linear
Gaussian State Space Models [0.0]
We propose a novel model-based time series clustering method with mixtures of linear Gaussian state space models.
The proposed method uses a new expectation-maximization algorithm for the mixture model to estimate the model parameters.
Experiments on a simulated dataset demonstrate the effectiveness of the method in clustering, parameter estimation, and model selection.
arXiv Detail & Related papers (2022-08-25T07:41:23Z) - Fitting large mixture models using stochastic component selection [0.0]
We propose a combination of the expectation-maximization and the Metropolis-Hastings algorithms to evaluate only a small number of components (an illustrative sketch appears after this list).
The Markov chain of component assignments is sequentially generated across the algorithm's iterations.
We put emphasis on generality of our method, equipping it with the ability to train both shallow and deep mixture models.
arXiv Detail & Related papers (2021-10-10T12:39:53Z) - Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z) - Vine copula mixture models and clustering for non-Gaussian data [0.0]
We propose a novel vine copula mixture model for continuous data.
We show that the model-based clustering algorithm with vine copula mixture models outperforms the other model-based clustering techniques.
arXiv Detail & Related papers (2021-02-05T16:04:26Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to achieve better performance than any individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose a finite mixture regression (FMR) model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Clustering Binary Data by Application of Combinatorial Optimization
Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
On a set of 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria (based on L1 dissimilarity) is compared against hierarchical clustering and a k-means variant, partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
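For context on the "Adaptive Fuzzy C-Means with Graph Embedding" entry above, here is a minimal sketch of the classical fuzzy c-means baseline with a fixed fuzzifier m. The paper's automatic learning of the membership-degree hyperparameter and its graph embedding are not implemented here; this is only the textbook algorithm.

```python
# Classical Fuzzy C-Means with a fixed fuzzifier m (textbook baseline only;
# the adaptive membership-degree learning and graph embedding from the paper
# listed above are NOT implemented here).
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0, eps=1e-9):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / (Um.sum(axis=0)[:, None] + eps)
        # squared distances from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
        # standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers
```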
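The "Fitting large mixture models using stochastic component selection" entry above combines EM-style parameter updates with a Metropolis-Hastings chain over component assignments, so that only a couple of components are evaluated per data point per iteration. The one-dimensional Gaussian toy below is an illustration of that general idea under stated assumptions (uniform proposals, hard-assignment parameter updates); it is not the paper's algorithm, which also covers deep mixture models.

```python
# Toy illustration: a Metropolis-Hastings chain over component assignments
# avoids evaluating all k components for every point (only the current and the
# proposed component are scored). Assumptions: 1-D data, uniform proposals,
# hard-assignment parameter updates; not the algorithm from the paper above.
import numpy as np

def mh_mixture_fit(x, k=5, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    # initialize parameters and the chain of component assignments
    mu = rng.choice(x, size=k)
    sigma = np.full(k, x.std() + 1e-6)
    pi = np.full(k, 1.0 / k)
    z = rng.integers(0, k, size=n)

    def log_comp(xi, j):  # log pi_j + log N(xi | mu_j, sigma_j)
        return (np.log(pi[j])
                - 0.5 * np.log(2 * np.pi * sigma[j] ** 2)
                - 0.5 * ((xi - mu[j]) / sigma[j]) ** 2)

    for _ in range(n_iter):
        # MH update of assignments: propose a uniformly random component
        prop = rng.integers(0, k, size=n)
        log_ratio = np.array([log_comp(x[i], prop[i]) - log_comp(x[i], z[i])
                              for i in range(n)])
        accept = np.log(rng.random(n)) < log_ratio
        z = np.where(accept, prop, z)
        # M-step-like update from the current hard assignments
        for j in range(k):
            members = x[z == j]
            if members.size > 0:
                mu[j] = members.mean()
                sigma[j] = members.std() + 1e-6
            pi[j] = (z == j).mean() + 1e-12
        pi /= pi.sum()
    return pi, mu, sigma, z
```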
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.