Bidimensional linked matrix factorization for pan-omics pan-cancer
analysis
- URL: http://arxiv.org/abs/2002.02601v2
- Date: Thu, 7 Apr 2022 15:52:24 GMT
- Title: Bidimensional linked matrix factorization for pan-omics pan-cancer
analysis
- Authors: Eric F. Lock, Jun Young Park, and Katherine A. Hoadley
- Abstract summary: We propose a flexible approach to the simultaneous factorization and decomposition of variation across bidimensionally linked matrices, BIDIFAC+.
This decomposes variation into a series of low-rank components that may be shared across any number of row sets or column sets.
We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.
- Score: 0.802904964931021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several modern applications require the integration of multiple large data
matrices that have shared rows and/or columns. For example, cancer studies that
integrate multiple omics platforms across multiple types of cancer, pan-omics
pan-cancer analysis, have extended our knowledge of molecular heterogeneity
beyond what was observed in single tumor and single platform studies. However,
these studies have been limited by available statistical methodology. We
propose a flexible approach to the simultaneous factorization and decomposition
of variation across such bidimensionally linked matrices, BIDIFAC+. This
decomposes variation into a series of low-rank components that may be shared
across any number of row sets (e.g., omics platforms) or column sets (e.g.,
cancer types). This builds on a growing literature for the factorization and
decomposition of linked matrices, which has primarily focused on multiple
matrices that are linked in one dimension (rows or columns) only. Our objective
function extends nuclear norm penalization, is motivated by random matrix
theory, gives an identifiable decomposition under relatively mild conditions,
and can be shown to give the mode of a Bayesian posterior distribution. We
apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and
specific modes of variability across 4 different omics platforms and 29
different cancer types.
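The nuclear norm penalization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' BIDIFAC+ algorithm: it only shows the standard building block, singular value soft-thresholding, applied to one component that is confined to a chosen set of row sets and column sets. The function names, the toy block layout, and the default penalty `sqrt(m) + sqrt(n)` (a random-matrix-style choice for unit-variance noise) are assumptions made for illustration.

```python
import numpy as np

def soft_threshold_svd(X, lam):
    """Minimize 0.5*||X - S||_F^2 + lam*||S||_* over S by shrinking singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)  # singular values below lam become exactly 0
    return (U * s_shrunk) @ Vt

def fit_module(X, row_idx, col_idx, lam=None):
    """Estimate one low-rank component living only on the given rows and columns.

    Hypothetical helper: entries outside the selected block are left at zero.
    """
    sub = X[np.ix_(row_idx, col_idx)]
    if lam is None:
        # Assumed default penalty for unit-variance noise (illustrative choice).
        lam = np.sqrt(sub.shape[0]) + np.sqrt(sub.shape[1])
    S = np.zeros_like(X)
    S[np.ix_(row_idx, col_idx)] = soft_threshold_svd(sub, lam)
    return S

rng = np.random.default_rng(0)

# Toy layout: 2 row sets (think omics platforms) by 2 column sets (think cancer types).
rows = {"platform_a": np.arange(0, 40), "platform_b": np.arange(40, 100)}
cols = {"cancer_1": np.arange(0, 50), "cancer_2": np.arange(50, 120)}

# A rank-2 signal shared by both platforms but present in cancer_1 only.
row_idx = np.concatenate([rows["platform_a"], rows["platform_b"]])
col_idx = cols["cancer_1"]
signal = np.zeros((100, 120))
signal[np.ix_(row_idx, col_idx)] = 3 * rng.normal(size=(100, 2)) @ rng.normal(size=(2, 50))

X = signal + rng.normal(size=signal.shape)  # add unit-variance noise everywhere

S_hat = fit_module(X, row_idx, col_idx)
est_rank = np.linalg.matrix_rank(S_hat)
print("estimated rank of the shared component:", est_rank)
```

A full decomposition would estimate many such components jointly, one per combination of row sets and column sets, rather than a single block in isolation as done here.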
Related papers
- Empirical Bayes Linked Matrix Decomposition [0.0]
We propose an empirical variational Bayesian approach to this problem.
We describe an associated iterative imputation approach that is novel for the single-matrix context.
We show that the method performs very well under different scenarios with respect to recovering underlying low-rank signal.
arXiv Detail & Related papers (2024-08-01T02:13:11Z)
- Multiple Augmented Reduced Rank Regression for Pan-Cancer Analysis [0.0]
We propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method.
We consider a structured nuclear norm objective that is motivated by random matrix theory.
We apply maRRR to gene expression data from multiple cancer types (i.e., pan-cancer) from TCGA.
arXiv Detail & Related papers (2023-08-30T21:40:58Z)
- DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data [7.049723871585993]
We propose a model, named DEDUCE, for unsupervised contrastive learning to analyze multi-omics cancer data.
This model adopts an unsupervised SMAE that can deeply extract contextual features and long-range dependencies from multi-omics data.
Subtypes are clustered by calculating the similarity between samples in both the feature space and sample space of multi-omics data.
arXiv Detail & Related papers (2023-07-09T00:53:23Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- A Graphical Model for Fusing Diverse Microbiome Data [2.385985842958366]
We introduce a flexible multinomial-Gaussian generative model for jointly modeling such count data.
We present a computationally scalable variational Expectation-Maximization (EM) algorithm for inferring the latent variables and the parameters of the model.
arXiv Detail & Related papers (2022-08-21T17:54:39Z)
- Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance $\epsilon$ requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the typically low rank of the matrices.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be faced with latent variable models.
High-dimensionality and non-linear issues are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
- Spectral Learning on Matrices and Tensors [74.88243719463053]
We show that tensor decomposition can pick up latent effects that are missed by matrix methods.
We also outline computational techniques to design efficient tensor decomposition methods.
arXiv Detail & Related papers (2020-04-16T22:53:00Z)
- Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)
- D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data [11.184915338554422]
A popular model in high-dimensional multi-view data analysis decomposes each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views.
We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA).
Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables.
arXiv Detail & Related papers (2020-01-09T06:35:40Z)