ecpc: An R-package for generic co-data models for high-dimensional
prediction
- URL: http://arxiv.org/abs/2205.07640v1
- Date: Mon, 16 May 2022 12:55:19 GMT
- Authors: Mirrelijn M. van Nee, Lodewyk F.A. Wessels and Mark A. van de Wiel
- Abstract summary: R-package ecpc originally accommodated various and possibly multiple co-data sources.
We present an extension to the method and software for generic co-data models.
We show how ridge penalties may be transformed to elastic net penalties with the R-package squeezy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional prediction considers data with more variables than samples.
Generic research goals are to find the best predictor or to select variables.
Results may be improved by exploiting prior information in the form of co-data,
providing complementary data not on the samples, but on the variables. We
consider adaptive ridge penalised generalised linear and Cox models, in which
the variable specific ridge penalties are adapted to the co-data to give a
priori more weight to more important variables. The R-package ecpc originally
accommodated various and possibly multiple co-data sources, including
categorical co-data, i.e. groups of variables, and continuous co-data.
Continuous co-data, however, was handled by adaptive discretisation, which may
model the co-data inefficiently and lose information. Here, we present an
extension to the method and software for generic co-data models, particularly
for continuous co-data. At the basis lies a classical linear regression model,
regressing prior variance weights on the co-data. The coefficients of this
co-data model are then estimated with empirical Bayes moment estimation. After placing the estimation
procedure in the classical regression framework, extension to generalised
additive and shape constrained co-data models is straightforward. In addition,
we show how ridge penalties may be transformed to elastic net penalties with
the R-package squeezy. In simulation studies, we first compare the various
continuous co-data models of the extension with the original method.
Secondly, we compare variable selection performance to that of other variable
selection methods. Finally, we demonstrate the use of the package in several
examples throughout the paper.
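The core idea of the abstract can be illustrated numerically: variable-specific ridge penalties are derived from a linear co-data model for the prior variances, and more important variables receive smaller penalties. The sketch below is a minimal illustration in Python, not the ecpc implementation: the co-data variable `z`, the coefficient vector `gamma`, and the simulated data are all hypothetical, and `gamma` is fixed by hand rather than estimated by empirical Bayes moment estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated high-dimensional data: n samples, p variables (p > n).
n, p = 50, 200
X = rng.standard_normal((n, p))

# Hypothetical continuous co-data: one co-data value per variable,
# acting as a proxy for variable importance.
z = np.linspace(0.1, 2.0, p)

# True coefficients: prior variance increases with the co-data value.
beta = rng.normal(scale=np.sqrt(0.01 * z))
y = X @ beta + rng.standard_normal(n)

def codata_ridge(X, y, z, gamma):
    """Ridge fit with variable-specific penalties from a linear co-data
    model: prior variance v_j = gamma0 + gamma1 * z_j, penalty
    lambda_j = 1 / v_j (larger prior variance -> smaller penalty)."""
    v = np.maximum(gamma[0] + gamma[1] * z, 1e-8)  # keep v_j > 0
    lam = 1.0 / v
    # Closed-form weighted ridge: (X'X + diag(lambda))^{-1} X'y
    A = X.T @ X + np.diag(lam)
    return np.linalg.solve(A, X.T @ y)

# Flat penalty (ordinary ridge) vs. a co-data-adapted penalty.
beta_flat = codata_ridge(X, y, z, gamma=np.array([0.01, 0.0]))
beta_codata = codata_ridge(X, y, z, gamma=np.array([0.0, 0.01]))

mse_flat = np.mean((beta_flat - beta) ** 2)
mse_codata = np.mean((beta_codata - beta) ** 2)
print(mse_flat, mse_codata)
```

When the co-data model matches the true prior variances, as constructed here, the adapted penalties typically recover the coefficients better than a single flat penalty; in ecpc the coefficients of the co-data model are estimated from the data rather than supplied.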
Related papers
- Co-data Learning for Bayesian Additive Regression Trees (2023-11-16)
  We propose to incorporate co-data into a sum-of-trees prediction model. The proposed method can handle multiple types of co-data simultaneously. Co-data enhances prediction in an application to diffuse large B-cell lymphoma prognosis.
- Conformal inference for regression on Riemannian Manifolds (2023-10-12)
  We investigate prediction sets for regression scenarios in which the response variable, denoted by $Y$, resides in a manifold and the covariate, denoted by $X$, lies in Euclidean space. We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
- Learning from aggregated data with a maximum entropy model (2022-10-05)
  We show how a new model, similar to a logistic regression, may be learned from aggregated data only, by approximating the unobserved feature distribution with a maximum entropy hypothesis. We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained on the full unaggregated data.
- A Graphical Model for Fusing Diverse Microbiome Data (2022-08-21)
  We introduce a flexible multinomial-Gaussian generative model for jointly modelling such count data. We present a computationally scalable variational Expectation-Maximization (EM) algorithm for inferring the latent variables and the parameters of the model.
- On the Strong Correlation Between Model Invariance and Generalization (2022-07-14)
  Generalization captures a model's ability to classify unseen data, while invariance measures the consistency of model predictions on transformations of the data. From a dataset-centric view, we find that a model's accuracy and invariance are linearly correlated across different test sets.
- Conjugate priors for count and rounded data regression (2021-10-23)
  We introduce conjugate priors that enable closed-form posterior inference. Key posterior and predictive functionals are computable analytically or via direct Monte Carlo simulation. These tools are broadly useful for linear regression, nonlinear models via basis expansions, and model and variable selection.
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model (2021-10-09)
  We aim at improving data efficiency for both classification and regression setups in deep learning. To take the best of both worlds, we propose a novel X-model, which plays a minimax game between the feature extractor and the task-specific heads.
- High-dimensional regression with potential prior information on variable importance (2021-09-23)
  We propose a simple scheme involving fitting a sequence of models indicated by the ordering. We show that the computational cost of fitting all models, when ridge regression is used, is no more than that of a single ridge regression fit. We describe a strategy for Lasso regression that reuses previous fits to greatly speed up fitting the entire sequence of models.
- Flexible Model Aggregation for Quantile Regression (2021-02-26)
  Quantile regression is a fundamental problem in statistical learning, motivated by the need to quantify uncertainty in predictions. We investigate methods for aggregating any number of conditional quantile models. All of the models we consider in this paper can be fit using modern deep learning toolkits.
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift (2020-06-19)
  We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness. The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
- Flexible co-data learning for high-dimensional prediction (2020-05-08)
  Clinical prediction is hard when data is high-dimensional, but additional information, like domain knowledge, may help improve predictions. Our method enables exploiting multiple and various co-data sources to improve predictions. We demonstrate it on two cancer genomics applications and show that it may improve the performance of other dense and parsimonious prognostic models.
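The claim in the related paper on prior variable-importance information, that a whole sequence of ridge fits costs no more than a single fit, rests on a standard linear-algebra fact: after one SVD of the design matrix, the ridge solution for any penalty is cheap to evaluate. The sketch below illustrates this fact on simulated data; it is not that paper's code, and the data and penalty grid are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 100
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# One thin SVD of X -- the expensive step, done once.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uty = U.T @ y

def ridge_from_svd(lam):
    # beta_hat = V diag(s / (s^2 + lam)) U'y -- cheap per extra lambda.
    return Vt.T @ (s / (s**2 + lam) * Uty)

# An entire sequence of ridge fits reuses the same SVD.
lambdas = [0.1, 1.0, 10.0]
betas = [ridge_from_svd(lam) for lam in lambdas]

# Check against the direct closed form for one penalty value.
direct = np.linalg.solve(X.T @ X + 1.0 * np.eye(p), X.T @ y)
print(np.allclose(ridge_from_svd(1.0), direct))  # → True
```

Each additional penalty value costs only a few matrix-vector products, which is why sweeping a penalty sequence is essentially free once the decomposition is in hand.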
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.