Probabilistic Canonical Correlation Analysis for Sparse Count Data
- URL: http://arxiv.org/abs/2005.04837v1
- Date: Mon, 11 May 2020 02:19:57 GMT
- Title: Probabilistic Canonical Correlation Analysis for Sparse Count Data
- Authors: Lin Qiu and Vernon M. Chinchilli
- Abstract summary: Canonical correlation analysis is an important technique for exploring the relationship between two sets of continuous variables.
We propose a model-based probabilistic approach for correlation and canonical correlation estimation for two sparse count data sets.
- Score: 3.1753001245931323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Canonical correlation analysis (CCA) is a classical and important
multivariate technique for exploring the relationship between two sets of
continuous variables. CCA has applications in many fields, such as genomics and
neuroimaging. It can extract meaningful features as well as use these features
for subsequent analysis. Although some sparse CCA methods have been developed
to deal with high-dimensional problems, they are designed specifically for
continuous data and do not consider the integer-valued data from
next-generation sequencing platforms that exhibit very low counts for some
important features. We propose a model-based probabilistic approach for
correlation and canonical correlation estimation for two sparse count data sets
(PSCCA). PSCCA demonstrates that correlations and canonical correlations
estimated at the natural parameter level are more appropriate than traditional
estimation methods applied to the raw data. We demonstrate through simulation
studies that PSCCA outperforms other standard correlation approaches and sparse
CCA approaches in estimating the true correlations and canonical correlations
at the natural parameter level. We further apply the PSCCA method to study the
association of miRNA and mRNA expression data sets from a squamous cell lung
cancer study, finding that PSCCA can uncover a larger number of strongly
correlated pairs than standard correlation and other sparse CCA approaches.
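A minimal sketch of why correlation at the natural parameter level can differ from correlation on raw counts. This is not the authors' PSCCA model: it assumes a simple Poisson model whose log-rates (the natural parameters) are bivariate normal, and the function name `simulate_counts` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_counts(n=5000, rho=0.8, mean_log_rate=-1.0, seed=0):
    """Draw correlated log-rates, then sparse Poisson counts from them.

    Assumed toy model (not taken from the paper):
      theta ~ N([m, m], [[1, rho], [rho, 1]])   natural parameters
      count ~ Poisson(exp(theta))               observed sparse counts
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal([mean_log_rate, mean_log_rate], cov, size=n)
    counts = rng.poisson(np.exp(theta))
    return theta, counts


theta, counts = simulate_counts()

# Correlation of the latent natural parameters (close to the chosen rho).
latent_corr = np.corrcoef(theta[:, 0], theta[:, 1])[0, 1]

# Pearson correlation computed directly on the raw counts.
raw_corr = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]

# Share of zero entries, as a rough measure of sparsity.
zero_fraction = np.mean(counts == 0)

print(f"latent correlation: {latent_corr:.3f}")
print(f"raw-count correlation: {raw_corr:.3f}")
print(f"fraction of zeros: {zero_fraction:.3f}")
```

Under these assumptions the Poisson sampling noise inflates the variance of each count variable without adding covariance, so the raw-count correlation is pulled toward zero relative to the latent one. A model-based estimator would aim to recover the latent value instead.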
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - A Bayesian Methodology for Estimation for Sparse Canonical Correlation [0.0]
Canonical Correlation Analysis (CCA) is a statistical procedure for identifying relationships between data sets.
ScSCCA is a rapidly emerging methodological area that aims for robust modeling of the interrelations between the different data modalities.
We propose a novel ScSCCA approach where we employ a Bayesian infinite factor model and aim to achieve robust estimation.
arXiv Detail & Related papers (2023-10-30T15:14:25Z) - Tensor Generalized Canonical Correlation Analysis [0.0]
Regularized Generalized Canonical Correlation Analysis (RGCCA) is a general statistical framework for multi-block data analysis.
This paper presents TGCCA, a new method for analyzing higher-order tensors admitting a canonical rank-R decomposition.
The efficiency and usefulness of TGCCA are evaluated on simulated and real data and compared favorably to state-of-the-art approaches.
arXiv Detail & Related papers (2023-02-10T14:41:12Z) - Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z) - Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z) - Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
arXiv Detail & Related papers (2021-11-27T21:18:01Z) - Conditional canonical correlation estimation based on covariates with random forests [0.0]
We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables.
The proposed method and the global significance test are evaluated through simulation studies, which show that they provide accurate canonical correlation estimates and well-controlled Type-1 error.
arXiv Detail & Related papers (2020-11-23T17:09:46Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Minimax Quasi-Bayesian estimation in sparse canonical correlation analysis via a Rayleigh quotient function [1.0878040851638]
Existing rate-optimal estimators for sparse canonical vectors have high computational cost.
We propose a quasi-Bayesian estimation procedure that achieves the minimax estimation rate.
We use the proposed methodology to maximally correlate clinical variables and proteomic data for better understanding the Covid-19 disease.
arXiv Detail & Related papers (2020-10-16T21:00:57Z) - Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be faced with latent variable models.
High-dimensionality and non-linear issues are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)