D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis
for Multi-view High-dimensional Data
- URL: http://arxiv.org/abs/2001.02856v3
- Date: Fri, 16 Sep 2022 14:43:33 GMT
- Title: D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis
for Multi-view High-dimensional Data
- Authors: Hai Shu, Zhe Qu, Hongtu Zhu
- Abstract summary: A popular model in high-dimensional multi-view data analysis decomposes each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a view-specific low-rank distinctive-source matrix, and an additive noise matrix.
We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA)
Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables.
- Score: 11.184915338554422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern biomedical studies often collect multi-view data, that is, multiple
types of data measured on the same set of objects. A popular model in
high-dimensional multi-view data analysis is to decompose each view's data
matrix into a low-rank common-source matrix generated by latent factors common
across all data views, a low-rank distinctive-source matrix corresponding to
each view, and an additive noise matrix. We propose a novel decomposition
method for this model, called decomposition-based generalized canonical
correlation analysis (D-GCCA). D-GCCA rigorously defines the decomposition on
the L2 space of random variables, in contrast to the Euclidean dot-product
space used by most existing methods, and thereby achieves estimation
consistency for low-rank matrix recovery. Moreover, to properly calibrate the
common latent factors, we impose a desirable orthogonality constraint on the
distinctive latent factors. Existing methods inadequately consider such
orthogonality and may thus leave substantial common-source variation
undetected. Our D-GCCA takes one step further than generalized
canonical correlation analysis by separating common and distinctive components
among canonical variables, while enjoying an appealing interpretation from the
perspective of principal component analysis. Furthermore, we propose to use the
variable-level proportion of signal variance explained by common or distinctive
latent factors to select the variables they most influence. Consistent
estimators of our D-GCCA method are established with good finite-sample
numerical performance, and have closed-form expressions leading to efficient
computation especially for large-scale data. The superiority of D-GCCA over
state-of-the-art methods is also corroborated in simulations and real-world
data examples.
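The common-source/distinctive-source/noise model described above can be simulated in a short numpy sketch. All dimensions, ranks, and loadings below are hypothetical; this illustrates the data model only, not the D-GCCA estimator itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 200, 50, 40          # samples; variables in views 1 and 2
r_c, r_d = 2, 3                  # ranks of common and distinctive parts

# Common latent factors shared by both views
f_common = rng.standard_normal((n, r_c))
# Distinctive latent factors, one set per view
f_dist1 = rng.standard_normal((n, r_d))
f_dist2 = rng.standard_normal((n, r_d))

# Loadings mapping latent factors to observed variables (hypothetical)
B1c, B2c = rng.standard_normal((r_c, p1)), rng.standard_normal((r_c, p2))
B1d, B2d = rng.standard_normal((r_d, p1)), rng.standard_normal((r_d, p2))

# Each view = low-rank common part + low-rank distinctive part + noise
Y1 = f_common @ B1c + f_dist1 @ B1d + 0.1 * rng.standard_normal((n, p1))
Y2 = f_common @ B2c + f_dist2 @ B2d + 0.1 * rng.standard_normal((n, p2))

# The common-source matrix is low-rank by construction
assert np.linalg.matrix_rank(f_common @ B1c) == r_c
```

The orthogonality constraint discussed in the abstract concerns the population covariance between common and distinctive factors; in this finite sample the factor draws are only approximately uncorrelated.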
Related papers
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- D-CDLF: Decomposition of Common and Distinctive Latent Factors for Multi-view High-dimensional Data [2.2481284426718533]
A typical approach to the joint analysis of multiple high-dimensional data views is to decompose each view's data matrix into three parts.
We propose a novel decomposition method, called Decomposition of Common and Distinctive Latent Factors (D-CDLF), to effectively achieve both types of uncorrelatedness for two-view data.
arXiv Detail & Related papers (2024-06-30T15:38:38Z)
- Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z)
- Simultaneous Dimensionality Reduction: A Data Efficient Approach for Multimodal Representations Learning [0.0]
We explore two primary classes of approaches to dimensionality reduction (DR): Independent Dimensionality Reduction (IDR) and Simultaneous Dimensionality Reduction (SDR)
In IDR, each modality is compressed independently, striving to retain as much variation within each modality as possible.
In SDR, one simultaneously compresses the modalities to maximize the covariation between the reduced descriptions while paying less attention to how much individual variation is preserved.
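The IDR/SDR contrast can be sketched with plain numpy; the data-generating model and dimensions below are invented for illustration. IDR applies PCA per modality, while SDR is here represented by ordinary two-view CCA, computed by whitening each modality and taking an SVD of the cross-product:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 2
shared = rng.standard_normal((n, k))  # latent signal present in both modalities
X = shared @ rng.standard_normal((k, 10)) + 0.5 * rng.standard_normal((n, 10))
Y = shared @ rng.standard_normal((k, 8)) + 0.5 * rng.standard_normal((n, 8))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# IDR: per-modality PCA -- keep directions of maximal within-modality variance
def pca_scores(A, k):
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return A @ Vt[:k].T

x_idr, y_idr = pca_scores(X, k), pca_scores(Y, k)

# SDR (two-view CCA): whiten each modality, then SVD of the cross-product
def whiten(A):
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U  # orthonormal columns: a whitened representation of A

Ux, Uy = whiten(X), whiten(Y)
U, rho, Vt = np.linalg.svd(Ux.T @ Uy)
x_sdr, y_sdr = Ux @ U[:, :k], Uy @ Vt[:k].T
print(np.round(rho[:k], 2))  # leading entries are the canonical correlations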
arXiv Detail & Related papers (2023-10-05T04:26:24Z)
- Multiple Augmented Reduced Rank Regression for Pan-Cancer Analysis [0.0]
We propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method.
We consider a structured nuclear norm objective that is motivated by random matrix theory.
We apply maRRR to gene expression data from multiple cancer types (i.e., pan-cancer) from TCGA.
arXiv Detail & Related papers (2023-08-30T21:40:58Z)
- Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
arXiv Detail & Related papers (2021-11-27T21:18:01Z)
- Entropy Minimizing Matrix Factorization [102.26446204624885]
Nonnegative Matrix Factorization (NMF) is a widely-used data analysis technique, and has yielded impressive results in many real-world tasks.
In this study, an Entropy Minimizing Matrix Factorization framework (EMMF) is developed to tackle the above problem.
Considering that the outliers are usually much less than the normal samples, a new entropy loss function is established for matrix factorization.
arXiv Detail & Related papers (2021-03-24T21:08:43Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are empirically central to preventing overfitting.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
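The setting this entry studies can be sketched directly: tail-averaged constant-stepsize SGD for linear regression compared against the ordinary least squares solution. All hyperparameters here are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 5
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Ordinary least squares, closed form
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Constant-stepsize SGD, averaging the tail of the iterate sequence
w, lr, iterates = np.zeros(d), 0.01, []
for epoch in range(20):
    for i in rng.permutation(n):
        w -= lr * (X[i] @ w - y[i]) * X[i]  # single-sample gradient step
        iterates.append(w.copy())
w_sgd = np.mean(iterates[len(iterates) // 2 :], axis=0)  # average second half

print(np.linalg.norm(w_sgd - w_ols))  # distance from the OLS solution
```

With a step size well below the stability threshold and averaging to suppress the stationary noise of the constant-stepsize iterates, the averaged SGD solution lands close to OLS in this well-conditioned example.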
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)
- Multilinear Common Component Analysis via Kronecker Product Representation [0.0]
We consider the problem of extracting a common structure from multiple tensor datasets.
We propose multilinear common component analysis (MCCA) based on Kronecker products of mode-wise covariance matrices.
We develop an estimation algorithm for MCCA that guarantees mode-wise global convergence.
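The Kronecker-product representation of mode-wise covariances can be sketched as follows. This illustrates only the separable covariance structure, not the MCCA estimation algorithm itself, and all dimensions are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
# A stack of 100 samples of 4x6 matrix-valued (two-mode) data
T = rng.standard_normal((100, 4, 6))

# Mode-wise covariance estimates (up to a scaling convention)
S_rows = np.mean([t @ t.T for t in T], axis=0) / 6  # 4x4 row-mode covariance
S_cols = np.mean([t.T @ t for t in T], axis=0) / 4  # 6x6 column-mode covariance

# The separable model approximates the full 24x24 covariance of vec(T)
# as a Kronecker product of the two mode-wise covariances
Sigma = np.kron(S_rows, S_cols)
assert Sigma.shape == (24, 24)
```

The appeal of the separable form is parsimony: a 4x4 and a 6x6 matrix replace a full 24x24 covariance, which is what makes mode-wise estimation tractable for higher-order tensors.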
arXiv Detail & Related papers (2020-09-06T10:03:17Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.