Hierarchical Sparse Bayesian Multitask Model with Scalable Inference for Microbiome Analysis
- URL: http://arxiv.org/abs/2502.02552v1
- Date: Tue, 04 Feb 2025 18:23:22 GMT
- Title: Hierarchical Sparse Bayesian Multitask Model with Scalable Inference for Microbiome Analysis
- Authors: Haonan Zhu, Andre R. Goncalves, Camilo Valdes, Hiranmayi Ranganathan, Boya Zhang, Jose Manuel Martí, Car Reen Kok, Monica K. Borucki, Nisha J. Mulakken, James B. Thissen, Crystal Jaing, Alfred Hero, Nicholas A. Be
- Abstract summary: This paper proposes a hierarchical Bayesian multitask learning model that is applicable to the general multi-task binary classification learning problem.
We derive a computationally efficient inference algorithm based on variational inference to approximate the posterior distribution.
We demonstrate the potential of the new approach on various synthetic datasets and for predicting human health status based on microbiome profile.
- Score: 1.361248247831476
- Abstract: This paper proposes a hierarchical Bayesian multitask learning model that is applicable to the general multi-task binary classification learning problem where the model assumes a shared sparsity structure across different tasks. We derive a computationally efficient inference algorithm based on variational inference to approximate the posterior distribution. We demonstrate the potential of the new approach on various synthetic datasets and for predicting human health status based on microbiome profile. Our analysis incorporates data pooled from multiple microbiome studies, along with a comprehensive comparison with other benchmark methods. Results in synthetic datasets show that the proposed approach has superior support recovery property when the underlying regression coefficients share a common sparsity structure across different tasks. Our experiments on microbiome classification demonstrate the utility of the method in extracting informative taxa while providing well-calibrated predictions with uncertainty quantification and achieving competitive performance in terms of prediction metrics. Notably, despite the heterogeneity of the pooled datasets (e.g., different experimental objectives, laboratory setups, sequencing equipment, patient demographics), our method delivers robust results.
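The abstract's central modeling assumption is that the regression coefficients of all tasks share a common sparsity structure, and its synthetic experiments score how well that shared support is recovered. The Python sketch below illustrates only that setting. It assumes a logistic link, standard Gaussian features, task-specific nonzero coefficients on one shared support, and a precision/recall/F1 definition of support recovery. `simulate_shared_sparsity`, `support_recovery` and `sigmoid` are hypothetical helpers, not the authors' data generator, hierarchical model, or variational inference algorithm.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_shared_sparsity(num_tasks, num_features, num_active,
                             samples_per_task, seed=0):
    """Hypothetical generator for multitask binary classification data.

    All tasks share one active feature set (the common sparsity pattern),
    while the nonzero coefficient values are drawn independently per task.
    """
    rng = random.Random(seed)
    support = sorted(rng.sample(range(num_features), num_active))
    tasks = []
    for _ in range(num_tasks):
        beta = [0.0] * num_features
        for j in support:
            # Task-specific magnitude in [1, 3] with a random sign (never zero).
            beta[j] = rng.choice([-1.0, 1.0]) * rng.uniform(1.0, 3.0)
        X, y = [], []
        for _ in range(samples_per_task):
            x = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
            # Bernoulli label with success probability sigmoid(x . beta).
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            X.append(x)
            y.append(1 if rng.random() < p else 0)
        tasks.append({"X": X, "y": y, "beta": beta})
    return support, tasks


def support_recovery(true_support, estimated_support):
    """Precision, recall and F1 of an estimated feature support."""
    true_set, est_set = set(true_support), set(estimated_support)
    hits = len(true_set & est_set)
    precision = hits / len(est_set) if est_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    support, tasks = simulate_shared_sparsity(
        num_tasks=3, num_features=20, num_active=4, samples_per_task=50)
    for t in tasks:
        nonzero = [j for j, b in enumerate(t["beta"]) if b != 0.0]
        assert nonzero == support  # every task uses the same support
    print(support_recovery([1, 3, 5], [1, 3, 7]))
```

A sparse multitask estimator fit to `tasks` would be scored by passing the feature indices it selects as `estimated_support`. For example, `support_recovery([1, 3, 5], [1, 3, 7])` gives precision and recall of 2/3, since two of the three selected features lie in the true support.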
Related papers
- Generative modeling of density regression through tree flows [3.0262553206264893]
We propose a flow-based generative model tailored for the density regression task on tabular data.
We introduce a training algorithm for fitting the tree-based transforms using a divide-and-conquer strategy.
Our method consistently achieves comparable or superior performance at a fraction of the training and sampling budget.
Date: 2024-06-07T21:07:35Z
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
Date: 2024-06-01T08:01:05Z
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
Date: 2024-03-02T00:56:05Z
- Counterfactual Data Augmentation with Contrastive Learning [27.28511396131235]
We introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals.
We use contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes.
This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group.
Date: 2023-11-07T00:36:51Z
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
Date: 2023-09-02T01:27:53Z
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
Date: 2022-03-29T04:54:06Z
- Optimal Ensemble Construction for Multi-Study Prediction with Applications to COVID-19 Excess Mortality Estimation [7.02598981483736]
Multi-study ensembling uses a two-stage strategy which fits study-specific models and estimates ensemble weights separately.
This approach ignores the ensemble properties at the model-fitting stage, potentially resulting in a loss of efficiency.
We show that when little data is available for a country before the onset of the pandemic, leveraging data from other countries can substantially improve prediction accuracy.
Date: 2021-09-19T16:52:41Z
- Latent Network Estimation and Variable Selection for Compositional Data via Variational EM [0.0]
We develop a novel method to simultaneously estimate network interactions and associations.
We show the practical utility of our model via an application to microbiome data.
Date: 2020-10-25T21:52:39Z
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
Date: 2020-10-12T03:27:07Z
- CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways.
Date: 2020-10-11T02:19:15Z
- DCMD: Distance-based Classification Using Mixture Distributions on Microbiome Data [10.171660468645603]
We present an innovative approach for distance-based classification using mixture distributions (DCMD).
This approach models the inherent uncertainty in sparse counts by estimating a mixture distribution for the sample data.
Results are compared against a number of existing machine learning and distance-based approaches.
Date: 2020-03-29T23:30:20Z