DCMD: Distance-based Classification Using Mixture Distributions on
Microbiome Data
- URL: http://arxiv.org/abs/2003.13161v1
- Date: Sun, 29 Mar 2020 23:30:20 GMT
- Title: DCMD: Distance-based Classification Using Mixture Distributions on
Microbiome Data
- Authors: Konstantin Shestopaloff, Mei Dong, Fan Gao, Wei Xu
- Abstract summary: We present an innovative approach for distance-based classification using mixture distributions (DCMD).
This approach models the inherent uncertainty in sparse counts by estimating a mixture distribution for the sample data.
Results are compared against a number of existing machine learning and distance-based approaches.
- Score: 10.171660468645603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current advances in next generation sequencing techniques have allowed
researchers to conduct comprehensive research on microbiome and human diseases,
with recent studies identifying associations between human microbiome and
health outcomes for a number of chronic conditions. However, microbiome data
structure, characterized by sparsity and skewness, presents challenges to
building effective classifiers. To address this, we present an innovative
approach for distance-based classification using mixture distributions (DCMD).
The method aims to improve classification performance when using microbiome
community data, where the predictors are composed of sparse and heterogeneous
count data. This approach models the inherent uncertainty in sparse counts by
estimating a mixture distribution for the sample data, and representing each
observation as a distribution, conditional on observed counts and the estimated
mixture, which are then used as inputs for distance-based classification. The
method is implemented in a k-means and k-nearest neighbours framework and we
identify two distance metrics that produce optimal results. The performance of
the model is assessed using simulations and applied to a human microbiome
study, with results compared against a number of existing machine learning and
distance-based approaches. The proposed method is competitive when compared to
the machine learning approaches and showed a clear improvement over commonly
used distance-based classifiers. The range of applicability and robustness make
the proposed method a viable alternative for classification using sparse
microbiome count data.
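The pipeline described in the abstract (estimate a mixture for the sample data, represent each observation as a distribution conditional on its count, then classify by distance) can be illustrated with a minimal sketch. This is not the authors' implementation. The Poisson components, the fixed grid of rates, the EM weight update, the Euclidean distance between posterior vectors, and all function names and toy data below are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    # Poisson probability computed in log space for numerical stability.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def fit_mixture_weights(counts, grid, n_iter=200):
    # EM for the mixing weights of a Poisson mixture whose rates are
    # fixed to `grid` (assumption: the paper's mixture may differ).
    m = len(grid)
    w = [1.0 / m] * m
    lik = [[poisson_pmf(c, lam) for lam in grid] for c in counts]
    for _ in range(n_iter):
        new_w = [0.0] * m
        for row in lik:
            joint = [wj * lj for wj, lj in zip(w, row)]
            total = sum(joint)
            for j in range(m):
                new_w[j] += joint[j] / total
        w = [v / len(counts) for v in new_w]
    return w

def posterior(count, grid, w):
    # Distribution over grid rates, conditional on one observed count.
    joint = [wj * poisson_pmf(count, lam) for wj, lam in zip(w, grid)]
    total = sum(joint)
    return [v / total for v in joint]

def represent(sample, grid, weights_per_feature):
    # One posterior vector per feature (taxon) of the sample.
    return [posterior(c, grid, w) for c, w in zip(sample, weights_per_feature)]

def distance(rep_a, rep_b):
    # Sum over features of the Euclidean distance between posteriors
    # (assumption: a stand-in for the paper's distance metrics).
    return sum(
        math.sqrt(sum((p - q) ** 2 for p, q in zip(pa, pb)))
        for pa, pb in zip(rep_a, rep_b)
    )

def knn_predict(train_reps, train_labels, query_rep, k=3):
    # Majority vote among the k training samples nearest to the query.
    order = sorted(range(len(train_reps)),
                   key=lambda i: distance(train_reps[i], query_rep))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): rows are samples, columns are taxa counts.
X = [[0, 1], [1, 0], [0, 0], [9, 12], [11, 8], [10, 10]]
y = ["A", "A", "A", "B", "B", "B"]
grid = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]

# Fit one mixture per taxon, then map every sample to its posteriors.
weights = [fit_mixture_weights(list(col), grid) for col in zip(*X)]
train_reps = [represent(row, grid, weights) for row in X]

pred_low = knn_predict(train_reps, y, represent([0, 2], grid, weights))
pred_high = knn_predict(train_reps, y, represent([12, 9], grid, weights))
print(pred_low, pred_high)
```

In this sketch a zero count is not treated as an exact zero. It becomes a posterior spread over the low rates on the grid, which is one way to read the abstract's "inherent uncertainty in sparse counts".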
Related papers
- Pretrained-Guided Conditional Diffusion Models for Microbiome Data Analysis [1.433758865948252]
We introduce mbVDiT, a novel pre-trained conditional diffusion model for microbiome data imputation and denoising.
It uses the unmasked data and patient metadata as conditional guidance for imputing missing values.
It also uses a VAE to integrate other public microbiome datasets to enhance model performance.
arXiv Detail & Related papers (2024-08-10T01:54:06Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Cross-Validation for Training and Testing Co-occurrence Network
Inference Algorithms [1.8638865257327277]
Co-occurrence network inference algorithms help us understand the complex associations of micro-organisms, especially bacteria.
Previous methods for evaluating the quality of the inferred network include using external data, and network consistency across sub-samples.
We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, and new methods for applying existing algorithms to predict on test data.
arXiv Detail & Related papers (2023-09-26T19:43:15Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous
Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover the unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can be used to also update a given observational Markov equivalence class into the interventional Markov equivalence class.
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Latent Network Estimation and Variable Selection for Compositional Data
via Variational EM [0.0]
We develop a novel method to simultaneously estimate network interactions and associations.
We show the practical utility of our model via an application to microbiome data.
arXiv Detail & Related papers (2020-10-25T21:52:39Z)
- Mycorrhiza: Genotype Assignment using Phylogenetic Networks [2.286041284499166]
We introduce Mycorrhiza, a machine learning approach for the genotype assignment problem.
Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples.
Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium.
arXiv Detail & Related papers (2020-10-14T02:36:27Z)
- Neural Estimators for Conditional Mutual Information Using Nearest
Neighbors Sampling [36.35382677479192]
Estimation of mutual information (MI) or conditional mutual information (CMI) from a set of samples is a long-standing problem.
Recent work has leveraged the approximation power of artificial neural networks and has shown improvements over conventional methods.
We introduce a new technique, based on k nearest neighbors (k-NN), to perform the resampling and derive high-confidence concentration bounds for the sample average.
arXiv Detail & Related papers (2020-06-12T14:30:45Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from
Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling
Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)