Latent Network Estimation and Variable Selection for Compositional Data
via Variational EM
- URL: http://arxiv.org/abs/2010.13229v2
- Date: Fri, 9 Apr 2021 02:44:02 GMT
- Authors: Nathan Osborne, Christine B. Peterson, and Marina Vannucci
- Abstract summary: We develop a novel method to simultaneously estimate network interactions and associations.
We show the practical utility of our model via an application to microbiome data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network estimation and variable selection have been extensively studied in
the statistical literature, but only recently have those two challenges been
addressed simultaneously. In this paper, we seek to develop a novel method to
simultaneously estimate network interactions and associations to relevant
covariates for count data, and specifically for compositional data, which have
a fixed sum constraint. We use a hierarchical Bayesian model with latent layers
and employ spike-and-slab priors for both edge and covariate selection. For
posterior inference, we develop a novel variational inference scheme with an
expectation maximization step, to enable efficient estimation. Through
simulation studies, we demonstrate that the proposed model outperforms existing
methods in its accuracy of network recovery. We show the practical utility of
our model via an application to microbiome data. The human microbiome has been
shown to contribute to many of the functions of the human body, and also to be
linked with a number of diseases. In our application, we seek to better
understand the interaction between microbes and relevant covariates, as well as
the interaction of microbes with each other. We provide a Python implementation
of our algorithm, called SINC (Simultaneous Inference for Networks and
Covariates), available online.
Related papers
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Cross-Validation for Training and Testing Co-occurrence Network Inference Algorithms [1.8638865257327277]
Co-occurrence network inference algorithms help us understand the complex associations of micro-organisms, especially bacteria.
Previous methods for evaluating the quality of the inferred network include using external data, and network consistency across sub-samples.
We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, and new methods for applying existing algorithms to predict on test data.
arXiv Detail & Related papers (2023-09-26T19:43:15Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Bayesian community detection for networks with covariates [16.230648949593153]
"Community detection" has arguably received the most attention in the scientific community.
We propose a block model with a co-dependent random partition prior.
Our model has the ability to learn the number of the communities via posterior inference without having to assume it to be known.
arXiv Detail & Related papers (2022-03-04T01:58:35Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
- DeepCOVIDNet: An Interpretable Deep Learning Model for Predictive Surveillance of COVID-19 Using Heterogeneous Features and their Interactions [2.30238915794052]
We propose a deep learning model to forecast the range of increase in COVID-19 infected cases in future days.
Using data collected from various sources, we estimate the range of increase in infected cases seven days into the future for all U.S. counties.
arXiv Detail & Related papers (2020-07-31T23:37:38Z)
- A Functional Model for Structure Learning and Parameter Estimation in Continuous Time Bayesian Network: An Application in Identifying Patterns of Multiple Chronic Conditions [2.440763941001707]
We propose a continuous time Bayesian network with conditional dependencies, represented as Poisson regression.
We use a dataset of patients with multiple chronic conditions extracted from electronic health records of the Department of Veterans Affairs.
The proposed approach provides a sparse intuitive representation of the complex functional relationships between multiple chronic conditions.
arXiv Detail & Related papers (2020-07-31T05:02:34Z)
- DCMD: Distance-based Classification Using Mixture Distributions on Microbiome Data [10.171660468645603]
We present an innovative approach for distance-based classification using mixture distributions (DCMD).
This approach models the inherent uncertainty in sparse counts by estimating a mixture distribution for the sample data.
Results are compared against a number of existing machine learning and distance-based approaches.
arXiv Detail & Related papers (2020-03-29T23:30:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.