Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for
Clustering Count Data
- URL: http://arxiv.org/abs/2311.07762v1
- Date: Mon, 13 Nov 2023 21:23:15 GMT
- Title: Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for
Clustering Count Data
- Authors: Andrea Payne, Anjali Silva, Steven J. Rothstein, Paul D. McNicholas,
Sanjeena Subedi
- Abstract summary: A class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced.
The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies.
- Score: 0.8499685241219366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A mixture of multivariate Poisson-log normal factor analyzers is
introduced by imposing constraints on the covariance matrix, resulting in
flexible models for clustering purposes. In particular, a class of eight
parsimonious mixture models based on the mixtures of factor analyzers model is
introduced.
Variational Gaussian approximation is used for parameter estimation, and
information criteria are used for model selection. The proposed models are
explored in the context of clustering discrete data arising from RNA sequencing
studies. Using real and simulated data, the models are shown to give favourable
clustering performance. The R package for this work is available on GitHub at
https://github.com/anjalisilva/mixMPLNFA and is released under the open-source
MIT license.
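For readers unfamiliar with this model class, the sketch below outlines the
generative structure usually assumed for a mixture of multivariate Poisson-log
normal (MPLN) factor analyzers; the notation is illustrative and not taken
verbatim from the paper, and the exact constraint scheme should be checked
against the paper itself.

% Sketch of one MPLN factor analyzer component (illustrative notation):
% d-dimensional count vector Y_i in component g, with q < d latent factors.
\begin{align*}
  Y_{ij} \mid \theta_{ij} &\sim \mathrm{Poisson}\!\left(e^{\theta_{ij}}\right),
    \qquad j = 1, \ldots, d, \\
  \boldsymbol{\theta}_i &= \boldsymbol{\mu}_g + \boldsymbol{\Lambda}_g \mathbf{u}_i
    + \boldsymbol{\varepsilon}_i, \\
  \mathbf{u}_i &\sim \mathcal{N}_q(\mathbf{0}, \mathbf{I}_q), \qquad
  \boldsymbol{\varepsilon}_i \sim \mathcal{N}_d(\mathbf{0}, \boldsymbol{\Psi}_g), \\
  \text{so that} \quad \boldsymbol{\theta}_i &\sim
    \mathcal{N}_d\!\left(\boldsymbol{\mu}_g,\;
    \boldsymbol{\Lambda}_g \boldsymbol{\Lambda}_g^{\top} + \boldsymbol{\Psi}_g\right).
\end{align*}
% A class of eight parsimonious models would arise from three on/off
% constraints on the component parameters, e.g. common loadings
% (\Lambda_g = \Lambda), common error covariance (\Psi_g = \Psi), and
% isotropic errors (\Psi_g = \psi_g I_d), giving 2^3 = 8 combinations,
% in the spirit of the parsimonious Gaussian mixture (factor analyzer) families.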
Related papers
- Adaptive Transfer Clustering: A Unified Framework [2.3144964550307496]
We propose an adaptive transfer clustering (ATC) algorithm that automatically leverages the commonality in the presence of unknown discrepancy.
It applies to a broad class of statistical models including Gaussian mixture models, block models, and latent class models.
arXiv Detail & Related papers (2024-10-28T17:57:06Z)
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
- A Bayesian Framework on Asymmetric Mixture of Factor Analyser [0.0]
This paper introduces an MFA model with a rich and flexible class of skew normal (unrestricted) generalized hyperbolic (called SUNGH) distributions.
The SUNGH family provides considerable flexibility to model skewness in different directions as well as allowing for heavy-tailed data.
Considering factor analysis models, the SUNGH family also allows for skewness and heavy tails for both the error component and factor scores.
arXiv Detail & Related papers (2022-11-01T20:19:52Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- A new LDA formulation with covariates [3.1690891866882236]
The Latent Dirichlet Allocation model is a popular method for creating mixed-membership clusters.
We propose a new formulation for the LDA model which incorporates covariates.
We use slice sampling within a Gibbs sampling algorithm to estimate model parameters.
The model is illustrated using real data sets from three different areas: text-mining of Coronavirus articles, analysis of grocery shopping baskets, and ecology of tree species on Barro Colorado Island (Panama).
arXiv Detail & Related papers (2022-02-18T19:58:24Z)
- Personalized Federated Learning via Convex Clustering [72.15857783681658]
We propose a family of algorithms for personalized federated learning with locally convex user costs.
The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized.
arXiv Detail & Related papers (2022-02-01T19:25:31Z)
- Vine copula mixture models and clustering for non-Gaussian data [0.0]
We propose a novel vine copula mixture model for continuous data.
We show that the model-based clustering algorithm with vine copula mixture models outperforms the other model-based clustering techniques.
arXiv Detail & Related papers (2021-02-05T16:04:26Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices (see the sketch after this list).
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
- Model Fusion with Kullback-Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
- Robust M-Estimation Based Bayesian Cluster Enumeration for Real Elliptically Symmetric Distributions [5.137336092866906]
Robustly determining the optimal number of clusters in a data set is essential in a wide range of applications.
This article generalizes the Bayesian cluster enumeration criterion so that it can be used with any arbitrary Real Elliptically Symmetric (RES) distributed mixture model.
We derive a robust criterion for data sets with finite sample size, and also provide an approximation to reduce the computational cost at large sample sizes.
arXiv Detail & Related papers (2020-05-04T11:44:49Z)
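Regarding the kernel-learning entry above: the posterior similarity matrix it
builds on has a standard definition, assumed here rather than quoted from that
paper, which also makes the positive semi-definiteness observation easy to
verify.

% Posterior similarity matrix from S posterior samples of cluster
% allocations z^{(1)}, ..., z^{(S)} for n observations:
\[
  \mathrm{PSM}_{ij} \;=\; \frac{1}{S} \sum_{s=1}^{S}
    \mathbb{I}\!\left\{ z_i^{(s)} = z_j^{(s)} \right\},
  \qquad i, j = 1, \ldots, n.
\]
% Each co-clustering matrix A^{(s)}, with entries I{z_i^{(s)} = z_j^{(s)}},
% equals B^{(s)} (B^{(s)})^T for the binary allocation matrix B^{(s)}
% (rows index observations, columns index clusters), so it is positive
% semi-definite; the PSM, an average of such matrices, is too, which is
% why it can be used directly as a kernel (Gram) matrix.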