Time-to-event prediction for grouped variables using Exclusive Lasso
- URL: http://arxiv.org/abs/2504.01520v1
- Date: Wed, 02 Apr 2025 09:07:05 GMT
- Title: Time-to-event prediction for grouped variables using Exclusive Lasso
- Authors: Dayasri Ravi, Andreas Groll
- Abstract summary: We propose utilizing Exclusive Lasso regularization in place of standard Lasso penalization. We apply our methodology to a real-life cancer dataset, demonstrating enhanced survival prediction performance compared to the conventional Cox regression model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of high-dimensional genomic data and clinical data into time-to-event prediction models has gained significant attention due to the growing availability of these datasets. Traditionally, a Cox regression model is employed, concatenating various covariate types linearly. Given that much of the data may be redundant or irrelevant, feature selection through penalization is often desirable. A notable characteristic of these datasets is their organization into blocks of distinct data types, such as methylation and clinical predictors, which requires selecting a subset of covariates from each group due to high intra-group correlations. For this reason, we propose utilizing Exclusive Lasso regularization in place of standard Lasso penalization. We apply our methodology to a real-life cancer dataset, demonstrating enhanced survival prediction performance compared to the conventional Cox regression model.
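The abstract names the penalty but does not state it. As a rough sketch only, the block below uses the commonly cited form of the Exclusive Lasso penalty (half the sum over groups of the squared L1 norm of each group's coefficients) and adds it to a Cox negative log partial likelihood. The function names, the factor 0.5, the tuning parameter `lam`, and the no-ties assumption are illustrative choices, not taken from the paper.

```python
import numpy as np


def exclusive_lasso_penalty(beta, groups):
    """0.5 * sum over groups of (L1 norm of that group's coefficients)^2."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        total += np.sum(np.abs(beta[groups == g])) ** 2
    return 0.5 * total


def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Cox negative log partial likelihood, assuming no tied event times."""
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    eta = X @ beta
    # Sort by descending time so a running log-sum-exp covers each risk set.
    order = np.argsort(-time)
    eta_sorted = eta[order]
    log_risk = np.logaddexp.accumulate(eta_sorted)
    return -np.sum((eta_sorted - log_risk) * event[order])


def penalized_objective(beta, X, time, event, groups, lam):
    """Hypothetical objective: partial likelihood loss plus lam * penalty."""
    nll = cox_neg_log_partial_likelihood(beta, X, time, event)
    return nll + lam * exclusive_lasso_penalty(beta, groups)


# Two groups of two coefficients each: 0.5 * (3^2 + 3^2) = 9.0
print(exclusive_lasso_penalty([1.0, -2.0, 0.0, 3.0], [0, 0, 1, 1]))
```

Because the L1 norm is squared within each group, the penalty grows faster when many coefficients in the same group are nonzero, which tends to keep a few covariates per group rather than dropping whole groups. That matches the abstract's motivation of selecting a subset of covariates from each block.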
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Robust self-healing prediction model for high dimensional data [0.685316573653194]
This work proposes a robust self healing (RSH) hybrid prediction model.
It uses the data in its entirety, removing errors and inconsistencies rather than discarding any data.
The proposed method is compared with some of the existing high performing models and the results are analyzed.
arXiv Detail & Related papers (2022-10-04T17:55:50Z)
- ecpc: An R-package for generic co-data models for high-dimensional prediction [0.0]
R-package ecpc originally accommodated various and possibly multiple co-data sources.
We present an extension to the method and software for generic co-data models.
We show how ridge penalties may be transformed to elastic net penalties with the R-package squeezy.
arXiv Detail & Related papers (2022-05-16T12:55:19Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Fast marginal likelihood estimation of penalties for group-adaptive elastic net [0.0]
Group-adaptive elastic net penalisation learns from co-data to improve prediction.
We present a fast method for marginal likelihood estimation of group-adaptive elastic net penalties for generalised linear models.
We demonstrate the method in a model-based simulation study and an application to cancer genomics.
arXiv Detail & Related papers (2021-01-11T13:30:24Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
- Temporal Phenotyping using Deep Predictive Clustering of Disease Progression [97.88605060346455]
We develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks.
arXiv Detail & Related papers (2020-06-15T20:48:43Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Flexible co-data learning for high-dimensional prediction [0.0]
Clinical prediction is hard when data is high-dimensional, but additional information, like domain knowledge, may be helpful to improve predictions.
Our method enables exploiting multiple and various co-data sources to improve predictions.
We demonstrate it on two cancer genomics applications and show that it may improve the performance of other dense and parsimonious prognostic models.
arXiv Detail & Related papers (2020-05-08T13:04:31Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
- Large-scale benchmark study of survival prediction methods using multi-omics data [2.204918347869259]
Questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time.
We aim to give some answers by means of a large-scale benchmark study using real data.
arXiv Detail & Related papers (2020-03-07T18:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.