A comparative study on feature selection for a risk prediction model for
colorectal cancer
- URL: http://arxiv.org/abs/2402.05293v1
- Date: Wed, 7 Feb 2024 22:14:14 GMT
- Title: A comparative study on feature selection for a risk prediction model for
colorectal cancer
- Authors: N. Cueto-López, M. T. García-Ordás, V. Dávila-Batista, V. Moreno,
N. Aragonés, and R. Alaiz-Rodríguez
- Abstract summary: This work focuses on colorectal cancer, assessing several feature ranking algorithms in terms of performance for a set of risk prediction models.
A visual approach proposed in this work shows that the Neural Network-based wrapper ranking is the most unstable while the Random Forest ranking is the most stable.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background and objective
Risk prediction models aim to identify people at higher risk of developing a
target disease. Feature selection is particularly important to improve
prediction model performance, avoid overfitting, and identify the leading
cancer risk (and protective) factors. Assessing the stability of feature
selection/ranking algorithms becomes an important issue when the aim is to
analyze the features with the most predictive power.
Methods
This work focuses on colorectal cancer, assessing several feature ranking
algorithms in terms of performance for a set of risk prediction models (Neural
Networks, Support Vector Machines (SVM), Logistic Regression, k-Nearest
Neighbors and Boosted Trees). Additionally, their robustness is evaluated
following a conventional approach with scalar stability metrics and a visual
approach proposed in this work to study both the similarity among feature
ranking techniques and their individual stability. A comparative analysis is
carried out between the most relevant features identified in this study and
features provided by experts according to state-of-the-art knowledge.
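The stability assessment described above can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's own implementation or cohort: it ranks features with a Random Forest on bootstrap resamples and reports the mean pairwise Spearman correlation between the resulting rankings as a scalar stability metric (values near 1 indicate a stable ranker).

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def rf_ranking(X, y, seed=0):
    """Rank features by Random Forest importance (rank 0 = most important)."""
    rf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
    return np.argsort(-rf.feature_importances_).argsort()

def ranking_stability(X, y, n_resamples=4, seed=0):
    """Mean pairwise Spearman correlation between rankings obtained
    on bootstrap resamples of the data."""
    rng = np.random.default_rng(seed)
    rankings = []
    for _ in range(n_resamples):
        idx = rng.choice(len(X), size=len(X), replace=True)
        rankings.append(rf_ranking(X[idx], y[idx]))
    sims = [spearmanr(a, b)[0] for a, b in combinations(rankings, 2)]
    return float(np.mean(sims))

# Synthetic stand-in for the cohort data used in the paper.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
stability = ranking_stability(X, y)
print(f"Random Forest ranking stability: {stability:.3f}")
```

Swapping `rf_ranking` for another ranker (e.g. a Pearson filter or a wrapper approach) and comparing the resulting stability scores mirrors the conventional scalar-metric comparison described above.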
Results
The two best results in terms of Area Under the ROC Curve (AUC) are achieved
with an SVM classifier using the top-41 features selected by the SVM wrapper
approach (AUC=0.693) and with Logistic Regression using the top-40 features
selected by Pearson correlation (AUC=0.689). Experiments showed that
performing feature selection contributes to classification performance, with a
3.9% and 1.9% improvement in AUC for the SVM and Logistic Regression
classifiers, respectively, with respect to the results using the full feature
set. The visual approach proposed in this work shows that the Neural
Network-based wrapper ranking is the most unstable while the Random Forest
ranking is the most stable.
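The kind of comparison reported above (AUC with the full feature set vs. a top-k subset from a filter ranking) can be sketched as follows. This is a hedged example on synthetic data with an arbitrary k=10, not the paper's dataset or exact protocol: it ranks features by absolute Pearson correlation with the label on the training split, then compares Logistic Regression AUC on a held-out split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort data (not the paper's dataset).
X, y = make_classification(n_samples=600, n_features=50,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Pearson filter: rank features by |correlation with the label| on training data.
corr = np.array([abs(np.corrcoef(X_tr[:, j], y_tr)[0, 1])
                 for j in range(X_tr.shape[1])])
top_k = np.argsort(-corr)[:10]  # keep the 10 highest-ranked features

def auc_with(features):
    """Train Logistic Regression on a feature subset and return test AUC."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, features], y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te[:, features])[:, 1])

auc_full = auc_with(np.arange(X.shape[1]))
auc_topk = auc_with(top_k)
print(f"full feature set AUC: {auc_full:.3f}")
print(f"top-10 filter AUC:    {auc_topk:.3f}")
```

Ranking on the training split only, as above, avoids the selection bias that leaks held-out information into the feature choice.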
Related papers
- Electroencephalogram Emotion Recognition via AUC Maximization [0.0]
Imbalanced datasets pose significant challenges in areas including neuroscience, cognitive science, and medical diagnostics.
This study addresses the issue of class imbalance, using the 'Liking' label in the DEAP dataset as an example.
arXiv Detail & Related papers (2024-08-16T19:08:27Z)
- Optimizing Disease Prediction with Artificial Intelligence Driven Feature Selection and Attention Networks [0.0]
This article introduces a pioneering ensemble feature selection model.
At the heart of the proposed model lies the SEV-EB algorithm, a novel approach to optimal feature selection.
An HSC-AttentionNet is introduced, allowing the model to capture both short-term patterns and long-term dependencies in health data.
arXiv Detail & Related papers (2024-07-31T14:12:27Z)
- Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: A comprehensive analysis [6.796017024594715]
We suggest two novel feature selection (FS) methods based upon an imperialist competitive algorithm (ICA) and a bat algorithm (BA).
This study aims to enhance diagnostic models' efficiency and present a comprehensive analysis to help clinical physicians make much more precise and reliable decisions than before.
arXiv Detail & Related papers (2024-07-19T19:07:53Z)
- Confidence-aware Contrastive Learning for Selective Classification [20.573658672018066]
This work provides a generalization bound for selective classification, disclosing that optimizing feature layers helps improve the performance of selective classification.
Inspired by this theory, we propose to explicitly improve the selective classification model at the feature level for the first time, leading to a novel Confidence-aware Contrastive Learning method for Selective Classification, CCL-SC.
arXiv Detail & Related papers (2024-06-07T08:43:53Z)
- Uncertainty Quantification on Clinical Trial Outcome Prediction [37.238845949535616]
We propose incorporating uncertainty quantification into clinical trial outcome predictions.
Our main goal is to enhance the model's ability to discern nuanced differences.
We have adopted a selective classification approach to fulfill our objective.
arXiv Detail & Related papers (2024-01-07T13:48:05Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z)
- Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
Area under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems.
We propose a technical method to optimize AUPRC for deep learning.
arXiv Detail & Related papers (2021-04-18T06:22:21Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.