Microbiome subcommunity learning with logistic-tree normal latent
Dirichlet allocation
- URL: http://arxiv.org/abs/2109.05386v1
- Date: Sat, 11 Sep 2021 22:52:12 GMT
- Title: Microbiome subcommunity learning with logistic-tree normal latent
Dirichlet allocation
- Authors: Patrick LeBlanc and Li Ma
- Abstract summary: Mixed-membership (MM) models have been applied to microbiome compositional data to identify latent subcommunities of microbial species.
We present a new MM model that allows variation in the composition of each subcommunity around some "centroid" composition.
- Score: 3.960875974762257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed-membership (MM) models such as Latent Dirichlet Allocation (LDA) have
been applied to microbiome compositional data to identify latent subcommunities
of microbial species. However, microbiome compositional data, especially those
collected from the gut, typically display substantial cross-sample
heterogeneities in the subcommunity composition which current MM methods do not
account for. To address this limitation, we incorporate the logistic-tree
normal (LTN) model -- using the phylogenetic tree structure -- into the LDA
model to form a new MM model. This model allows variation in the composition of
each subcommunity around some "centroid" composition. Incorporation of
auxiliary Pólya-Gamma variables enables a computationally efficient collapsed
blocked Gibbs sampler to carry out Bayesian inference under this model. We
compare the new model with LDA and show that in the presence of large
cross-sample heterogeneity, under the LDA model the resulting inference can be
extremely sensitive to the specification of the total number of subcommunities
as it does not account for cross-sample heterogeneity. As such, the popular
strategy in other applications of MM models of overspecifying the number of
subcommunities -- and hoping that some meaningful subcommunities will emerge
among artificial ones -- can lead to highly misleading conclusions in the
microbiome context. In contrast, by accounting for such heterogeneity, our MM
model restores the robustness of the inference in the specification of the
number of subcommunities and again allows meaningful subcommunities to be
identified under this strategy.
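The collapsed blocked Gibbs sampler referenced above builds on the standard collapsed Gibbs sampler for LDA. As a point of reference only, here is a minimal sketch of collapsed Gibbs for vanilla LDA (this is not the paper's LTN-LDA sampler, which additionally draws Pólya-Gamma auxiliary variables along the phylogenetic tree; all names and hyperparameters are illustrative):

```python
import numpy as np

def lda_collapsed_gibbs(docs, n_topics, n_vocab, alpha=0.1, beta=0.1,
                        n_iter=200, seed=0):
    """Collapsed Gibbs sampler for vanilla LDA.

    docs: list of lists of word (taxon) ids. Topic assignments z are
    resampled with theta and phi integrated out (the "collapsed" part).
    """
    rng = np.random.default_rng(seed)
    # Count matrices: document-topic, topic-word, and topic totals.
    ndk = np.zeros((len(docs), n_topics))
    nkw = np.zeros((n_topics, n_vocab))
    nk = np.zeros(n_topics)
    z = [rng.integers(n_topics, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional:
                # p(z=k | rest) ∝ (ndk + alpha) * (nkw + beta) / (nk + V*beta)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # Posterior mean estimates of subcommunity (topic) compositions.
    phi = (nkw + beta) / (nkw.sum(1, keepdims=True) + n_vocab * beta)
    theta = (ndk + alpha) / (ndk.sum(1, keepdims=True) + n_topics * alpha)
    return theta, phi
```

In this vanilla sampler every sample draws its subcommunity compositions from the same phi, which is exactly the cross-sample homogeneity assumption the LTN-LDA model relaxes.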
Related papers
- Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold [83.18058549195855]
We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities.
In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depends on the microenvironment of cells specific to each patient.
We propose Meta Flow Matching (MFM), a practical approach to integrating along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations.
arXiv Detail & Related papers (2024-08-26T20:05:31Z)
- Deep asymmetric mixture model for unsupervised cell segmentation [4.211173851121561]
This paper presents a novel asymmetric mixture model for unsupervised cell segmentation.
It is built by aggregating certain multivariate Gaussian mixture models with log-likelihood and self-supervised-based optimization functions.
The proposed asymmetric mixture model outperforms existing state-of-the-art unsupervised models on cell segmentation, including Segment Anything.
arXiv Detail & Related papers (2024-06-03T22:12:22Z)
- sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures [0.9674145073701153]
sc-OTGM is an unsupervised model grounded in the inductive bias that scRNA-seq data can be generated from a mixture of Gaussian distributions.
sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification.
It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
arXiv Detail & Related papers (2024-05-06T06:46:11Z)
- Toward the Identifiability of Comparative Deep Generative Models [7.5479347719819865]
We propose a theory of identifiability for comparative Deep Generative Models (DGMs).
We show that, while these models lack identifiability across a general class of mixing functions, they surprisingly become identifiable when the mixing function is piece-wise affine.
We also investigate the impact of model misspecification, and empirically show that previously proposed regularization techniques for fitting comparative DGMs help with identifiability when the number of latent variables is not known in advance.
arXiv Detail & Related papers (2024-01-29T06:10:54Z)
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z)
- A Novel Unified Conditional Score-based Generative Framework for Multi-modal Medical Image Completion [54.512440195060584]
We propose the Unified Multi-Modal Conditional Score-based Generative Model (UMM-CSGM) to take advantage of the Score-based Generative Model (SGM) framework.
UMM-CSGM employs a novel multi-in multi-out Conditional Score Network (mm-CSN) to learn a comprehensive set of cross-modal conditional distributions.
Experiments on BraTS19 dataset show that the UMM-CSGM can more reliably synthesize the heterogeneous enhancement and irregular area in tumor-induced lesions.
arXiv Detail & Related papers (2022-07-07T16:57:21Z)
- Gaussian Latent Dirichlet Allocation for Discrete Human State Discovery [1.057079240576682]
We propose and validate an unsupervised probabilistic model, Gaussian Latent Dirichlet Allocation (GLDA), for the problem of discrete state discovery.
GLDA borrows the individual-specific mixture structure from a popular topic model Latent Dirichlet Allocation (LDA) in Natural Language Processing.
We found that in both datasets the GLDA-learned class weights achieved significantly higher correlations with clinically assessed depression, anxiety, and stress scores than those produced by the baseline GMM.
arXiv Detail & Related papers (2022-06-28T18:33:46Z)
- A new LDA formulation with covariates [3.1690891866882236]
The Latent Dirichlet Allocation model is a popular method for creating mixed-membership clusters.
We propose a new formulation for the LDA model which incorporates covariates.
We use slice sampling within a Gibbs sampling algorithm to estimate model parameters.
The model is illustrated using real data sets from three different areas: text-mining of Coronavirus articles, analysis of grocery shopping baskets, and ecology of tree species on Barro Colorado Island (Panama).
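Slice sampling, used within the Gibbs sweep above, can be sketched generically. The following univariate slice sampler with stepping-out is a standard textbook construction, not the paper's implementation:

```python
import numpy as np

def slice_sample(log_f, x0, n_samples, w=1.0, rng=None):
    """Univariate slice sampler with stepping-out.

    log_f: log of an unnormalized target density.
    x0: starting point; w: initial bracket width.
    """
    rng = rng or np.random.default_rng(0)
    x = x0
    out = []
    for _ in range(n_samples):
        # Draw the auxiliary "height" uniformly under the density at x.
        log_y = log_f(x) + np.log(rng.uniform())
        # Step out to find an interval containing the slice.
        left = x - w * rng.uniform()
        right = left + w
        while log_f(left) > log_y:
            left -= w
        while log_f(right) > log_y:
            right += w
        # Shrink the interval until a proposal lands on the slice.
        while True:
            x_new = rng.uniform(left, right)
            if log_f(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        out.append(x)
    return np.array(out)
```

Within a Gibbs sweep, one such update would replace a full-conditional draw that has no closed form, e.g. `x = slice_sample(log_conditional, x, 1)[-1]` for each awkward parameter in turn.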
arXiv Detail & Related papers (2022-02-18T19:58:24Z)
- Entropy Minimizing Matrix Factorization [102.26446204624885]
Nonnegative Matrix Factorization (NMF) is a widely-used data analysis technique, and has yielded impressive results in many real-world tasks.
In this study, an Entropy Minimizing Matrix Factorization framework (EMMF) is developed to tackle the above problem.
Considering that the outliers are usually much less than the normal samples, a new entropy loss function is established for matrix factorization.
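EMMF modifies the NMF objective with an entropy loss to downweight outliers; for orientation, here is a minimal plain NMF with Lee-Seung multiplicative updates for the Frobenius loss (the baseline technique, not the EMMF objective itself):

```python
import numpy as np

def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates minimizing the
    Frobenius loss ||X - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        # Multiplicative updates preserve nonnegativity by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

EMMF replaces this squared-error criterion with an entropy-based loss so that a few aberrant samples contribute less to the fit.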
arXiv Detail & Related papers (2021-03-24T21:08:43Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- A Rigorous Link Between Self-Organizing Maps and Gaussian Mixture Models [78.6363825307044]
This work presents a mathematical treatment of the relation between Self-Organizing Maps (SOMs) and Gaussian Mixture Models (GMMs).
We show that energy-based SOM models can be interpreted as performing gradient descent.
This link allows to treat SOMs as generative probabilistic models, giving a formal justification for using SOMs to detect outliers, or for sampling.
arXiv Detail & Related papers (2020-09-24T14:09:04Z)
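The gradient-descent interpretation above applies to energy-based SOM formulations; for context, a minimal classical online SOM update (a generic sketch, not the paper's energy-based model) looks like:

```python
import numpy as np

def train_som(data, grid=(4, 4), n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Classical online SOM: each step moves the best-matching unit and
    its grid neighbours toward the input; the SOM-GMM link interprets such
    updates as gradient-descent steps on an energy function."""
    rng = np.random.default_rng(seed)
    h, w = grid
    dim = data.shape[1]
    weights = rng.random((h * w, dim))
    # Grid coordinates used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        # Learning rate and neighbourhood width decay over training.
        lr = lr0 * (1 - t / n_iter)
        sigma = sigma0 * (1 - t / n_iter) + 0.3
        bmu = np.argmin(((weights - x) ** 2).sum(1))  # best-matching unit
        # Gaussian neighbourhood on the grid, centred at the BMU.
        g = np.exp(-((coords - coords[bmu]) ** 2).sum(1) / (2 * sigma ** 2))
        weights += lr * g[:, None] * (x - weights)
    return weights.reshape(h, w, dim)
```

Viewing each unit as a mixture component with the neighbourhood as a smoothing prior is what allows the trained map to be read as a generative probabilistic model.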
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.