Bayesian predictive modeling of multi-source multi-way data
- URL: http://arxiv.org/abs/2208.03396v1
- Date: Fri, 5 Aug 2022 21:58:23 GMT
- Title: Bayesian predictive modeling of multi-source multi-way data
- Authors: Jonathan Kim, Brian J. Sandri, Raghavendra B. Rao, Eric F. Lock
- Abstract summary: We consider molecular data from multiple 'omics sources as predictors of early-life iron deficiency (ID) in a rhesus monkey model.
We use a linear model with a low-rank structure on the coefficients to capture multi-way dependence.
We show that our model performs as expected in terms of misclassification rates and correlation of estimated coefficients with true coefficients.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a Bayesian approach to predict a continuous or binary outcome from
data that are collected from multiple sources with a multi-way (i.e.,
multidimensional tensor) structure. As a motivating example we consider
molecular data from multiple 'omics sources, each measured over multiple
developmental time points, as predictors of early-life iron deficiency (ID) in
a rhesus monkey model. We use a linear model with a low-rank structure on the
coefficients to capture multi-way dependence and model the variance of the
coefficients separately across each source to infer their relative
contributions. Conjugate priors facilitate an efficient Gibbs sampling
algorithm for posterior inference, assuming a continuous outcome with normal
errors or a binary outcome with a probit link. Simulations demonstrate that our
model performs as expected in terms of misclassification rates and correlation
of estimated coefficients with true coefficients, with large gains in
performance by incorporating multi-way structure and modest gains when
accounting for differing signal sizes across the different sources. Moreover,
it provides robust classification of ID monkeys for our motivating application.
Software in the form of R code is available at
https://github.com/BiostatsKim/BayesMSMW .
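The low-rank coefficient structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' BayesMSMW code (which is in R), and it performs no Gibbs sampling. It assumes a single source with a features-by-time-points coefficient matrix built as a sum of rank-1 outer products, and it uses the standard normal CDF as the probit link. The sizes `p`, `d`, `R` and all function names are hypothetical, chosen only for illustration.

```python
import math
import random


def low_rank_coefficients(U, V):
    """Build a p x d coefficient matrix B = sum_r outer(U[r], V[r])."""
    p, d = len(U[0]), len(V[0])
    B = [[0.0] * d for _ in range(p)]
    for u, v in zip(U, V):
        for j in range(p):
            for t in range(d):
                B[j][t] += u[j] * v[t]
    return B


def linear_predictor(X, B):
    """Inner product <X, B> of one subject's p x d data with the coefficients."""
    return sum(x * b for x_row, b_row in zip(X, B) for x, b in zip(x_row, b_row))


def probit(eta):
    """Standard normal CDF, mapping a linear predictor to a probability."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Hypothetical sizes (assumptions): features, time points, rank.
p, d, R = 6, 4, 2
rng = random.Random(0)

# Rank-R loadings: U[r] has one weight per feature, V[r] one per time point.
U = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(R)]
V = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(R)]
B = low_rank_coefficients(U, V)

# One simulated subject: p features measured at d time points.
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(p)]
eta = linear_predictor(X, B)
prob = probit(eta)

# Free parameters: unconstrained matrix versus the rank-R factorization.
n_full = p * d
n_low_rank = R * (p + d)
print(f"full: {n_full} parameters, rank-{R}: {n_low_rank} parameters")
print(f"predicted probability for the binary outcome: {prob:.3f}")
```

The parameter saving grows with the dimensions: the unconstrained matrix needs `p * d` coefficients, while the factorized form needs only `R * (p + d)`. That reduction is one plausible reason multi-way structure can help when features far outnumber samples.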
Related papers
- Bayesian Joint Additive Factor Models for Multiview Learning [7.254731344123118]
A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes.
We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components.
Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors.
arXiv Detail & Related papers (2024-06-02T15:35:45Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials [0.0]
We propose a Bayesian tensor regression model to accommodate the effect of multiple factors on phenotype prediction.
We adopt a set of prior distributions that resolve identifiability issues that may arise between the parameters in the model.
We explore the applicability of our model by analysing real-world data related to wheat production across Ireland from 2010 to 2019.
arXiv Detail & Related papers (2023-01-09T19:54:50Z)
- Learning Multivariate CDFs and Copulas using Tensor Factorization [39.24470798045442]
Learning the multivariate distribution of data is a core challenge in statistics and machine learning.
In this work, we aim to learn multivariate cumulative distribution functions (CDFs), as they can handle mixed random variables.
We show that any grid sampled version of a joint CDF of mixed random variables admits a universal representation as a naive Bayes model.
We demonstrate the superior performance of the proposed model in several synthetic and real datasets and applications including regression, sampling and data imputation.
arXiv Detail & Related papers (2022-10-13T16:18:46Z)
- A Graphical Model for Fusing Diverse Microbiome Data [2.385985842958366]
We introduce a flexible multinomial-Gaussian generative model for jointly modeling such count data.
We present a computationally scalable variational Expectation-Maximization (EM) algorithm for inferring the latent variables and the parameters of the model.
arXiv Detail & Related papers (2022-08-21T17:54:39Z)
- Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of L2 and L1 regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z)
- Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
arXiv Detail & Related papers (2021-02-26T23:21:16Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
- On the Discrepancy between Density Estimation and Sequence Generation [92.70116082182076]
Log-likelihood is highly correlated with BLEU when we consider models within the same family.
We observe no correlation between rankings of models across different families.
arXiv Detail & Related papers (2020-02-17T20:13:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.