Multi-modality fusion using canonical correlation analysis methods:
Application in breast cancer survival prediction from histology and genomics
- URL: http://arxiv.org/abs/2111.13987v1
- Date: Sat, 27 Nov 2021 21:18:01 GMT
- Authors: Vaishnavi Subramanian, Tanveer Syeda-Mahmood, and Minh N. Do
- Abstract summary: We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single-modality posterior estimators in latent variable prediction.
- Score: 16.537929113715432
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The availability of multi-modality datasets provides a unique opportunity to
characterize the same object of interest using multiple viewpoints more
comprehensively. In this work, we investigate the use of canonical correlation
analysis (CCA) and penalized variants of CCA (pCCA) for the fusion of two
modalities. We study a simple graphical model for the generation of
two-modality data. We analytically show that, with known model parameters,
posterior mean estimators that jointly use both modalities outperform arbitrary
linear mixing of single-modality posterior estimators in latent variable
prediction. Penalized extensions of CCA (pCCA) that incorporate domain
knowledge can discover correlations with high-dimensional, low-sample data,
whereas traditional CCA is inapplicable. To facilitate the generation of
multi-dimensional embeddings with pCCA, we propose two matrix deflation schemes
that enforce desirable properties exhibited by CCA. Combining all the above, we
propose a two-stage prediction pipeline that uses pCCA embeddings generated with
deflation for latent variable prediction. On simulated data, our proposed
model drastically reduces the mean-squared error in latent variable prediction.
When applied to publicly available histopathology data and RNA-sequencing data
from The Cancer Genome Atlas (TCGA) breast cancer patients, our model can
outperform principal components analysis (PCA) embeddings of the same dimension
in survival prediction.
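As a concrete reference point for the CCA step, here is a minimal NumPy sketch of classical CCA using the textbook whitening-plus-SVD formulation. The helper names (`inv_sqrt`, `cca`) and the synthetic two-modality data are hypothetical illustrations, not the paper's code. The whitening step needs the inverse square root of each sample covariance matrix, which does not exist when a modality has more features than samples; this is one concrete reason traditional CCA cannot be applied directly to high-dimensional, low-sample data.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    # When n < p the sample covariance is singular (some eigenvalues ~ 0),
    # so this step breaks down; that is the regime penalized CCA targets.
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca(X, Y, n_components=2):
    """Classical CCA of X (n x p) and Y (n x q).

    Returns projection matrices U (p x k), V (q x k) and the canonical
    correlations rho (length k), in decreasing order.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular vectors of the whitened cross-covariance give the directions;
    # the singular values are the canonical correlations.
    A, rho, Bt = np.linalg.svd(Kx @ Sxy @ Ky)
    U = Kx @ A[:, :n_components]
    V = Ky @ Bt.T[:, :n_components]
    return U, V, rho[:n_components]

# Two synthetic "modalities" sharing a one-dimensional latent signal z.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
X = z + rng.normal(size=(n, 5))   # 5 features, each z plus unit noise
Y = z + rng.normal(size=(n, 4))   # 4 features, each z plus unit noise

U, V, rho = cca(X, Y)
print("canonical correlations:", np.round(rho, 3))
```

Projecting each centered modality onto its columns of `U` or `V` gives the canonical variates; within a modality these variates are uncorrelated, which is one of the properties of CCA that a multi-component penalized variant would need to approximate.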
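The claim about posterior mean estimators can be checked numerically under an assumed model. The sketch below takes the linear-Gaussian latent variable model familiar from probabilistic CCA (z ~ N(0, I), x_i = W_i z + e_i with Gaussian noise e_i ~ N(0, Psi_i)), which may differ from the paper's graphical model; all parameter values are arbitrary. Because every estimator involved is linear in the data, expected squared errors are computed in closed form rather than by simulation. "Linear mixing" is read here as w1 * m1 + w2 * m2 with scalar weights, where m_i = E[z | x_i]; that reading is our interpretation. A two-dimensional latent variable is used because with a one-dimensional latent variable the joint posterior mean is itself such a scalar mix.

```python
import numpy as np

# Assumed model (a probabilistic-CCA-style sketch; parameters are arbitrary):
#   z ~ N(0, I_k),  x_i = W_i z + e_i,  e_i ~ N(0, Psi_i),  i = 1, 2.
rng = np.random.default_rng(1)
k, p1, p2 = 2, 6, 4
W1, W2 = rng.normal(size=(p1, k)), rng.normal(size=(p2, k))
Psi1 = np.diag(rng.uniform(1.0, 3.0, size=p1))
Psi2 = np.diag(rng.uniform(1.0, 3.0, size=p2))
I = np.eye(k)

# Information each modality carries about z: M_i = W_i^T Psi_i^{-1} W_i.
M1 = W1.T @ np.linalg.solve(Psi1, W1)
M2 = W2.T @ np.linalg.solve(Psi2, W2)

def gain(precision, W, Psi):
    """G = precision^{-1} W^T Psi^{-1}, so that a posterior mean is G @ x."""
    return np.linalg.solve(precision, W.T @ np.linalg.inv(Psi))

# Joint posterior mean E[z | x1, x2] = A1 x1 + A2 x2 (posterior precision I + M1 + M2).
A1, A2 = gain(I + M1 + M2, W1, Psi1), gain(I + M1 + M2, W2, Psi2)
# Single-modality posterior means m_i = E[z | x_i] = B_i x_i.
B1, B2 = gain(I + M1, W1, Psi1), gain(I + M2, W2, Psi2)

def mse(G1, G2):
    """Closed-form E||z - G1 x1 - G2 x2||^2 under the assumed model."""
    R = I - G1 @ W1 - G2 @ W2
    return (np.sum(R ** 2)
            + np.trace(G1 @ Psi1 @ G1.T)
            + np.trace(G2 @ Psi2 @ G2.T))

# Best scalar weights (w1, w2) for w1*m1 + w2*m2: the MSE is quadratic in
# (w1, w2), so the optimum solves a 2x2 linear system.
C1, C2 = B1 @ W1, B2 @ W2
t1 = np.trace(B1 @ Psi1 @ B1.T)
t2 = np.trace(B2 @ Psi2 @ B2.T)
H = np.array([[np.sum(C1 * C1) + t1, np.sum(C1 * C2)],
              [np.sum(C1 * C2), np.sum(C2 * C2) + t2]])
w = np.linalg.solve(H, np.array([np.trace(C1), np.trace(C2)]))

joint_mse = mse(A1, A2)
mix_mse = mse(w[0] * B1, w[1] * B2)
print(f"joint posterior mean MSE: {joint_mse:.4f}")
print(f"best scalar mix MSE:      {mix_mse:.4f}")
```

As a sanity check, the joint MSE equals the trace of the posterior covariance, tr((I + M1 + M2)^{-1}), and each single-modality MSE equals tr((I + M_i)^{-1}).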
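For the high-dimensional, low-sample setting, the sketch below uses one well-known penalized variant, the sparse CCA of Witten, Tibshirani and Hastie (2009), as a stand-in: it treats the within-modality covariances as identity and applies L1 soft-thresholding inside alternating power iterations, so no covariance inverse is needed. It is not necessarily the pCCA used in the paper. The thresholds are set as a fixed fraction of the largest entry (a simplification of that method's tuning), and the deflation step is the plain cross-product deflation C <- C - d * u v^T, not either of the paper's two proposed schemes. Function names, parameters, and data are hypothetical.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding (the proximal map of the L1 penalty)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit(a):
    """Scale a to unit L2 norm (left unchanged if it is all zeros)."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a

def sparse_cca(X, Y, n_components=2, frac_u=0.3, frac_v=0.3, n_iter=50):
    """Sparse CCA via alternating soft-thresholded power iterations.

    Within-modality covariances are treated as identity, so this runs even
    when n is much smaller than the number of features. frac_u and frac_v
    set each threshold as a fraction (< 1) of the largest entry.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    C = X.T @ Y                                         # p x q cross-product
    U, V = [], []
    for _ in range(n_components):
        v = np.linalg.svd(C, full_matrices=False)[2][0]  # warm start
        for _ in range(n_iter):
            cu = C @ v
            u = unit(soft_threshold(cu, frac_u * np.abs(cu).max()))
            cv = C.T @ u
            v = unit(soft_threshold(cv, frac_v * np.abs(cv).max()))
        d = u @ C @ v
        C = C - d * np.outer(u, v)                      # deflate before next pair
        U.append(u)
        V.append(v)
    return np.column_stack(U), np.column_stack(V)

# 40 samples, 200 + 150 features; only the first 10 of each share a signal.
rng = np.random.default_rng(2)
n, p, q = 40, 200, 150
z = rng.normal(size=(n, 1))
X = rng.normal(size=(n, p))
X[:, :10] += 2 * z
Y = rng.normal(size=(n, q))
Y[:, :10] += 2 * z

U, V = sparse_cca(X, Y)
print("nonzero loadings in first pair:",
      np.count_nonzero(U[:, 0]), np.count_nonzero(V[:, 0]))
```

The embeddings `X @ U` and `Y @ V` (after centering) are what a second-stage predictor would consume in a two-stage pipeline. Note that this plain deflation does not, in general, make successive sparse canonical variates uncorrelated, a property classical CCA has by construction.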
Related papers
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
(arXiv, 2024-10-02T04:01:38Z)
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time-series forecasting.
We validate our theory across a variety of high-dimensional datasets.
(arXiv, 2024-08-08T17:27:29Z)
- A Bayesian Methodology for Estimation for Sparse Canonical Correlation [0.0]
Canonical Correlation Analysis (CCA) is a statistical procedure for identifying relationships between data sets.
Structured Sparse CCA (ScSCCA) is a rapidly emerging methodological area that aims for robust modeling of the interrelations between different data modalities.
We propose a novel ScSCCA approach where we employ a Bayesian infinite factor model and aim to achieve robust estimation.
(arXiv, 2023-10-30T15:14:25Z)
- Gaussian Latent Dirichlet Allocation for Discrete Human State Discovery [1.057079240576682]
We propose and validate an unsupervised probabilistic model, Gaussian Latent Dirichlet Allocation (GLDA), for the problem of discrete state discovery.
GLDA borrows the individual-specific mixture structure from Latent Dirichlet Allocation (LDA), a popular topic model in natural language processing.
We found that in both datasets the GLDA-learned class weights achieved significantly higher correlations with clinically assessed depression, anxiety, and stress scores than those produced by the baseline GMM.
(arXiv, 2022-06-28T18:33:46Z)
- Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
(arXiv, 2022-03-23T12:52:49Z)
- Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of L2 and L1 regularization in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
(arXiv, 2021-12-02T14:45:16Z)
- Lung Cancer Risk Estimation with Incomplete Data: A Joint Missing Imputation Perspective [5.64530854079352]
We address imputation of missing data by modeling the joint distribution of multi-modal data.
Motivated by the partial bidirectional generative adversarial net (PBiGAN), we propose a new Conditional PBiGAN (C-PBiGAN) method.
C-PBiGAN achieves significant improvements in lung cancer risk estimation compared with representative imputation methods.
(arXiv, 2021-07-25T20:15:16Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
(arXiv, 2020-10-25T18:51:15Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores).
For AR-CSM models, the proposed training divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
(arXiv, 2020-10-24T07:01:24Z)
- Probabilistic Canonical Correlation Analysis for Sparse Count Data [3.1753001245931323]
Canonical correlation analysis is an important technique for exploring the relationship between two sets of continuous variables.
We propose a model-based probabilistic approach for correlation and canonical correlation estimation for two sparse count data sets.
(arXiv, 2020-05-11T02:19:57Z)
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
(arXiv, 2020-04-07T04:07:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.