Flexible co-data learning for high-dimensional prediction
- URL: http://arxiv.org/abs/2005.04010v1
- Date: Fri, 8 May 2020 13:04:31 GMT
- Title: Flexible co-data learning for high-dimensional prediction
- Authors: Mirrelijn M. van Nee, Lodewyk F.A. Wessels and Mark A. van de Wiel
- Abstract summary: Clinical prediction is hard when data is high-dimensional, but additional information, like domain knowledge, may be helpful to improve predictions.
Our method enables exploiting multiple and various co-data sources to improve predictions.
We demonstrate it on two cancer genomics applications and show that it may improve the performance of other dense and parsimonious prognostic models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical research often focuses on complex traits in which many variables
play a role in mechanisms driving, or curing, diseases. Clinical prediction is
hard when data is high-dimensional, but additional information, like domain
knowledge and previously published studies, may be helpful to improve
predictions. Such complementary data, or co-data, provide information on the
covariates, such as genomic location or p-values from external studies. Our
method enables exploiting multiple and various co-data sources to improve
predictions. We use discrete or continuous co-data to define possibly
overlapping or hierarchically structured groups of covariates. These are then
used to estimate adaptive multi-group ridge penalties for generalised linear
and Cox models. We combine empirical Bayes estimation of group penalty
hyperparameters with an extra level of shrinkage. This renders a uniquely
flexible framework as any type of shrinkage can be used on the group level. The
hyperparameter shrinkage learns how relevant a specific co-data source is,
counters overfitting of hyperparameters for many groups, and accounts for
structured co-data. We describe various types of co-data and propose suitable
forms of hypershrinkage. The method is very versatile, as it allows for
integration and weighting of multiple co-data sets, inclusion of unpenalised
covariates and posterior variable selection. We demonstrate it on two cancer
genomics applications and show that it may improve the performance of other
dense and parsimonious prognostic models substantially, and stabilises variable
selection.
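The abstract describes a pipeline: co-data define groups of covariates, each group gets its own ridge penalty, and the group penalty hyperparameters are estimated by empirical Bayes with an extra level of shrinkage. The sketch below illustrates that idea under simplifying assumptions. It uses a linear model with Gaussian noise and a single continuous co-data source (simulated external p-values) binned into quantile groups. The group prior variances are estimated by maximising the marginal likelihood. This is not the procedure implemented in the R-package ecpc, which per the abstract also covers generalised linear and Cox models and supports overlapping or hierarchical groups. The function names `groups_from_codata`, `neg_log_marginal` and `fit_group_ridge` are hypothetical. The quadratic penalty on the log group variances is only an illustrative stand-in for hypershrinkage.

```python
import numpy as np
from scipy.optimize import minimize


def groups_from_codata(pvalues, n_groups):
    """Bin a continuous co-data source (e.g. external p-values) into quantile groups."""
    # Interior quantile cut points, so each group holds roughly the same number of covariates
    edges = np.quantile(pvalues, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.searchsorted(edges, pvalues, side="right")  # group index 0..n_groups-1


def neg_log_marginal(params, X, y, groups, hyper_penalty):
    """Negative log marginal likelihood of y under beta_j ~ N(0, sigma^2 * v_g(j))."""
    log_sigma2, log_v = params[0], params[1:]
    v = np.exp(log_v)[groups]  # relative prior variance for each covariate
    n = X.shape[0]
    # Marginal covariance of y: sigma^2 * (I + X diag(v) X^T)
    C = np.exp(log_sigma2) * (np.eye(n) + (X * v) @ X.T)
    _, logdet = np.linalg.slogdet(C)
    nll = 0.5 * (logdet + y @ np.linalg.solve(C, y))
    # Illustrative hypershrinkage: pull the log group variances toward their mean
    nll += hyper_penalty * np.sum((log_v - log_v.mean()) ** 2)
    return nll


def fit_group_ridge(X, y, groups, n_groups, hyper_penalty=0.0):
    """Empirical Bayes group penalties, then the multi-group ridge estimate."""
    x0 = np.zeros(n_groups + 1)
    res = minimize(neg_log_marginal, x0, args=(X, y, groups, hyper_penalty),
                   method="L-BFGS-B", bounds=[(-10.0, 10.0)] * (n_groups + 1))
    sigma2 = np.exp(res.x[0])
    lambdas = 1.0 / np.exp(res.x[1:])  # group ridge penalty lambda_g = 1 / v_g
    # Posterior mean: (X^T X + diag(lambda_g(j)))^{-1} X^T y
    beta = np.linalg.solve(X.T @ X + np.diag(lambdas[groups]), X.T @ y)
    return beta, lambdas, sigma2


# Hypothetical demo: the first 50 of 200 covariates carry signal
rng = np.random.default_rng(0)
n, p, n_groups = 100, 200, 4
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:50] = rng.standard_normal(50)
y = X @ beta_true + rng.standard_normal(n)
# Simulated co-data: external p-values that tend to be small for the signal covariates
pvalues = np.where(np.arange(p) < 50, rng.uniform(0, 0.1, p), rng.uniform(0, 1, p))
groups = groups_from_codata(pvalues, n_groups)
beta, lambdas, sigma2 = fit_group_ridge(X, y, groups, n_groups, hyper_penalty=0.1)
print(np.round(lambdas, 3))
```

A group whose co-data points to relevant covariates should receive a smaller penalty than a group of mostly null covariates. Increasing `hyper_penalty` pushes all group penalties toward a common value, which amounts to ordinary ridge regression. Setting it to 0 leaves them free. This mirrors, only loosely, the role the abstract assigns to hypershrinkage in countering overfitting when there are many groups.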
Related papers
- Guiding adaptive shrinkage by co-data to improve regression-based prediction and feature selection [0.3867363075280544]
It is widely recognized that complementary data on the features, 'co-data', may improve results.
Such co-data are ubiquitous in genomics settings due to the availability of public repositories.
We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters.
Date: 2024-05-08T09:38:11Z
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
Date: 2024-03-02T00:56:05Z
- Composite Feature Selection using Deep Ensembles [130.72015919510605]
We investigate the problem of discovering groups of predictive features without predefined grouping.
We introduce a novel deep learning architecture that uses an ensemble of feature selection models to find predictive groups.
We propose a new metric to measure similarity between discovered groups and the ground truth.
Date: 2022-11-01T17:49:40Z
- ecpc: An R-package for generic co-data models for high-dimensional prediction [0.0]
The R-package ecpc originally accommodated various, and possibly multiple, co-data sources.
We present an extension to the method and software for generic co-data models.
We show how ridge penalties may be transformed to elastic net penalties with the R-package squeezy.
Date: 2022-05-16T12:55:19Z
- Scalable Regularised Joint Mixture Models [2.0686407686198263]
In many applications, data can be heterogeneous in the sense of spanning latent groups with different underlying distributions.
We propose an approach for heterogeneous data that allows joint learning of (i) explicit multivariate feature distributions, (ii) high-dimensional regression models and (iii) latent group labels.
The approach is demonstrably effective in high dimensions, combining data reduction for computational efficiency with a re-weighting scheme that retains key signals even when the number of features is large.
Date: 2022-05-03T13:38:58Z
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
Date: 2022-03-29T04:54:06Z
- Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics [16.537929113715432]
We study the use of canonical correlation analysis (CCA) and penalized variants of CCA for the fusion of two modalities.
We analytically show that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single modality posterior estimators in latent variable prediction.
Date: 2021-11-27T21:18:01Z
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
Date: 2020-09-02T02:50:30Z
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
Date: 2020-06-11T17:29:53Z
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
Date: 2020-04-30T20:42:17Z
- Meta-analysis of heterogeneous data: integrative sparse regression in high-dimensions [21.162280861396205]
We consider the task of meta-analysis in high-dimensional settings in which the data sources are similar but non-identical.
We introduce a global parameter that emphasizes interpretability and statistical efficiency in the presence of heterogeneity.
We demonstrate the benefits of our approach on a large-scale drug treatment dataset involving several different cancer cell-lines.
Date: 2019-12-26T20:30:57Z