Multiple Augmented Reduced Rank Regression for Pan-Cancer Analysis
- URL: http://arxiv.org/abs/2308.16333v1
- Date: Wed, 30 Aug 2023 21:40:58 GMT
- Title: Multiple Augmented Reduced Rank Regression for Pan-Cancer Analysis
- Authors: Jiuzhou Wang and Eric F. Lock
- Abstract summary: We propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method.
We consider a structured nuclear norm objective that is motivated by random matrix theory.
We apply maRRR to gene expression data from multiple cancer types (i.e., pan-cancer) from TCGA.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Statistical approaches that successfully combine multiple datasets are more
powerful, efficient, and scientifically informative than separate analyses. To
address variation architectures correctly and comprehensively for
high-dimensional data across multiple sample sets (i.e., cohorts), we propose
multiple augmented reduced rank regression (maRRR), a flexible matrix
regression and factorization method to concurrently learn both covariate-driven
and auxiliary structured variation. We consider a structured nuclear norm
objective that is motivated by random matrix theory, in which the regression or
factorization terms may be shared or specific to any number of cohorts. Our
framework subsumes several existing methods, such as reduced rank regression
and unsupervised multi-matrix factorization approaches, and includes a
promising novel approach to regression and factorization of a single dataset
(aRRR) as a special case. Simulations demonstrate substantial gains in power
from combining multiple datasets, and from parsimoniously accounting for all
structured variation. We apply maRRR to gene expression data from multiple
cancer types (i.e., pan-cancer) from TCGA, with somatic mutations as
covariates. The method performs well with respect to prediction and imputation
of held-out data, and provides new insights into mutation-driven and auxiliary
variation that is shared or specific to certain cancer types.
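The abstract notes that the framework subsumes classical reduced rank regression. As a point of reference only, the following is a minimal NumPy sketch of that classical special case (identity weighting, fixed rank), not the maRRR estimator or its structured nuclear norm objective. The function name `reduced_rank_regression`, the samples-by-variables orientation of `X` and `Y`, and the simulated dimensions are all assumptions made for illustration.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Classical reduced rank regression with identity weighting.

    Minimizes ||Y - X B||_F^2 subject to rank(B) <= rank.
    Illustrative helper; not taken from the paper.
    """
    # Unconstrained least-squares coefficients (pinv tolerates rank-deficient X).
    B_ols = np.linalg.pinv(X) @ Y
    # Right singular vectors of the fitted values span the best rank-r outcome subspace.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T
    # Project the least-squares coefficients onto that subspace.
    return B_ols @ V_r @ V_r.T


# Simulated data (assumed sizes): n samples, p covariates, q outcomes, true rank r.
rng = np.random.default_rng(0)
n, p, q, r = 200, 10, 30, 2
X = rng.standard_normal((n, p))
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

B_hat = reduced_rank_regression(X, Y, rank=r)
rel_err = np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true)
print("estimated rank:", np.linalg.matrix_rank(B_hat, tol=1e-8))
print("relative error:", rel_err)
```

In this sketch the rank is fixed in advance. The abstract instead describes a nuclear norm objective, which penalizes singular values rather than imposing a hard rank constraint.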
Related papers
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Empirical Bayes Linked Matrix Decomposition [0.0]
We propose an empirical variational Bayesian approach to this problem.
We describe an associated iterative imputation approach that is novel for the single-matrix context.
We show that the method performs very well under different scenarios with respect to recovering underlying low-rank signal.
arXiv Detail & Related papers (2024-08-01T02:13:11Z)
- Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z)
- Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones? [62.997667081978825]
Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods.
We compare these methods in extensive simulations to help answer the primary question of when to use multivariate ensemble techniques.
arXiv Detail & Related papers (2022-01-14T08:44:25Z)
- High-dimensional multi-trait GWAS by reverse prediction of genotypes [3.441021278275805]
Reverse regression is a promising approach to perform multi-trait GWAS in high-dimensional settings.
We analyzed different machine learning methods for reverse regression in multi-trait GWAS.
Model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-eQTL target genes.
arXiv Detail & Related papers (2021-10-29T22:34:35Z)
- Nonparametric Trace Regression in High Dimensions via Sign Series Representation [13.37650464374017]
We develop a framework for nonparametric trace regression models via structured sign series representations of high dimensional functions.
In the context of matrix completion, our framework leads to a substantially richer model based on what we coin as the "sign rank" of a matrix.
arXiv Detail & Related papers (2021-05-04T22:20:00Z)
- Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays [62.997667081978825]
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
- Fast cross-validation for multi-penalty ridge regression [0.0]
Ridge regression is a simple model for high-dimensional data.
Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix.
Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems.
arXiv Detail & Related papers (2020-05-19T09:13:43Z)
- Towards Multimodal Response Generation with Exemplar Augmentation and Curriculum Optimization [73.45742420178196]
We propose a novel multimodal response generation framework with exemplar augmentation and curriculum optimization.
Our model achieves significant improvements compared to strong baselines in terms of diversity and relevance.
arXiv Detail & Related papers (2020-04-26T16:29:06Z)
- D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data [11.184915338554422]
A popular model in high-dimensional multi-view data analysis decomposes each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views.
We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA).
Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables.
arXiv Detail & Related papers (2020-01-09T06:35:40Z)