Graph Canonical Correlation Analysis
- URL: http://arxiv.org/abs/2502.01780v1
- Date: Mon, 03 Feb 2025 19:41:06 GMT
- Title: Graph Canonical Correlation Analysis
- Authors: Hongju Park, Shuyang Bai, Zhenyao Ye, Hwiyoung Lee, Tianzhou Ma, Shuo Chen
- Abstract summary: Canonical correlation analysis (CCA) is a widely used technique for estimating associations between two sets of variables.
Recent advancements in CCA methods have expanded their application to decipher the interactions of multiomics datasets.
We propose the graph Canonical Correlation Analysis (gCCA) approach, which calculates canonical correlations based on the graph structure of the cross-correlation matrix.
- Score: 2.588462392029118
- Abstract: Canonical correlation analysis (CCA) is a widely used technique for estimating associations between two sets of multi-dimensional variables. Recent advancements in CCA methods have expanded their application to decipher the interactions of multiomics datasets, imaging-omics datasets, and more. However, conventional CCA methods are limited in their ability to incorporate structured patterns in the cross-correlation matrix, potentially leading to suboptimal estimations. To address this limitation, we propose the graph Canonical Correlation Analysis (gCCA) approach, which calculates canonical correlations based on the graph structure of the cross-correlation matrix between the two sets of variables. We develop computationally efficient algorithms for gCCA, and provide theoretical results for finite sample analysis of best subset selection and canonical correlation estimation by introducing concentration inequalities and stopping time rule based on martingale theories. Extensive simulations demonstrate that gCCA outperforms competing CCA methods. Additionally, we apply gCCA to a multiomics dataset of DNA methylation and RNA-seq transcriptomics, identifying both positively and negatively regulated gene expression pathways by DNA methylation pathways.
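For orientation, the classical CCA baseline that the abstract contrasts with gCCA can be sketched as follows. This is a minimal illustration of ordinary CCA, not the authors' gCCA algorithm: it computes the canonical correlations as the singular values of the whitened cross-covariance matrix. The function name `canonical_correlations`, the ridge term `reg`, and the synthetic data are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def canonical_correlations(X, Y, reg=1e-8):
    """Classical CCA (hypothetical helper, not the paper's gCCA).

    Returns the canonical correlations between the columns of X and Y,
    i.e. the singular values of Sxx^{-1/2} Sxy Syy^{-1/2}.
    """
    # Center both data blocks column-wise.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance and cross-covariance matrices.
    # A small ridge term (an assumption) keeps the inverses stable.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via the symmetric eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Singular values of the whitened cross-covariance, in descending order.
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    s = np.linalg.svd(K, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Synthetic example: one shared latent signal links the first column
# of X to the first column of Y; all other columns are pure noise.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
X = np.hstack([z + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
Y = np.hstack([z + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 3))])

rho = canonical_correlations(X, Y)
print(rho)
```

In this toy setup the population value of the leading canonical correlation is 0.8, and the remaining ones are zero up to sampling noise. Classical CCA treats the cross-correlation matrix as unstructured, which is the limitation the abstract says gCCA addresses by exploiting its graph structure.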
Related papers
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Nearest Neighbor CCP-Based Molecular Sequence Analysis [4.199844472131922]
Correlated Clustering and Projection (CCP) has been proposed as an effective method for biological sequencing data.
We present a Nearest Neighbor Correlated Clustering and Projection (CCP-NN)-based technique for efficiently preprocessing molecular sequence data.
Our findings show that CCP-NN considerably improves classification accuracy and significantly outperforms CCP in computational runtime.
arXiv Detail & Related papers (2024-09-07T22:06:00Z)
- Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z)
- Applications of flow models to the generation of correlated lattice QCD ensembles [69.18453821764075]
Machine-learned normalizing flows can be used in the context of lattice quantum field theory to generate statistically correlated ensembles of lattice gauge fields at different action parameters.
This work demonstrates how these correlations can be exploited for variance reduction in the computation of observables.
arXiv Detail & Related papers (2024-01-19T18:33:52Z)
- K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis [0.3683202928838613]
We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with L$_{2,1}$ norm regularization.
We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method.
We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse scRNA-seq datasets.
arXiv Detail & Related papers (2023-10-23T03:07:50Z)
- Tensor Generalized Canonical Correlation Analysis [0.0]
Regularized Generalized Canonical Correlation Analysis (RGCCA) is a general statistical framework for multi-block data analysis.
This paper presents TGCCA, a new method for analyzing higher-order tensors admitting a canonical rank-R decomposition.
The efficiency and usefulness of TGCCA are evaluated on simulated and real data and compared favorably to state-of-the-art approaches.
arXiv Detail & Related papers (2023-02-10T14:41:12Z)
- Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
arXiv Detail & Related papers (2021-11-27T21:18:01Z)
- DAGs with No Curl: An Efficient DAG Structure Learning Approach [62.885572432958504]
Recently, directed acyclic graph (DAG) structure learning has been formulated as a constrained continuous optimization problem with continuous acyclicity constraints.
We propose a novel learning framework to model and learn the weighted adjacency matrices in the DAG space directly.
We show that our method provides comparable accuracy but better efficiency than baseline DAG structure learning methods on both linear and generalized structural equation models.
arXiv Detail & Related papers (2021-06-14T07:11:36Z)
- Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantees.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- Probabilistic Canonical Correlation Analysis for Sparse Count Data [3.1753001245931323]
Canonical correlation analysis is an important technique for exploring the relationship between two sets of continuous variables.
We propose a model-based probabilistic approach for correlation and canonical correlation estimation for two sparse count data sets.
arXiv Detail & Related papers (2020-05-11T02:19:57Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.