A comparative study on feature selection for a risk prediction model for
colorectal cancer
- URL: http://arxiv.org/abs/2402.05293v1
- Date: Wed, 7 Feb 2024 22:14:14 GMT
- Title: A comparative study on feature selection for a risk prediction model for
colorectal cancer
- Authors: N. Cueto-López, M. T. García-Ordás, V. Dávila-Batista, V. Moreno, N. Aragonés, and R. Alaiz-Rodríguez
- Abstract summary: This work focuses on colorectal cancer, assessing several feature ranking algorithms in terms of performance for a set of risk prediction models.
A visual approach proposed in this work shows that the Neural Network-based wrapper ranking is the most unstable, while the Random Forest ranking is the most stable.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background and objective
Risk prediction models aim to identify people at higher risk of developing
a target disease. Feature selection is particularly important both to improve
prediction model performance by avoiding overfitting and to identify the leading
cancer risk (and protective) factors. Assessing the stability of feature
selection/ranking algorithms becomes an important issue when the aim is to
analyze the features with the most predictive power. Methods
This work focuses on colorectal cancer, assessing several feature ranking
algorithms in terms of performance for a set of risk prediction models (Neural
Networks, Support Vector Machines (SVM), Logistic Regression, k-Nearest
Neighbors and Boosted Trees). Additionally, their robustness is evaluated
following a conventional approach with scalar stability metrics and a visual
approach proposed in this work to study both the similarity among feature
ranking techniques and their individual stability. A comparative analysis is
carried out between the most relevant features identified in this study and
the features provided by experts according to state-of-the-art knowledge.
Results
The two best results in terms of Area Under the ROC Curve (AUC)
are achieved with an SVM classifier using the top-41 features selected by the
SVM wrapper approach (AUC=0.693) and with Logistic Regression using the top-40
features selected by the Pearson correlation ranking (AUC=0.689). Experiments
showed that performing feature selection contributes to classification
performance, with a 3.9% and 1.9% improvement in AUC for the SVM and Logistic
Regression classifiers, respectively, with respect to the results using the
full feature set. The visual approach proposed in this work shows that the
Neural Network-based wrapper ranking is the most unstable, while the Random
Forest ranking is the most stable.
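The abstract mentions scalar stability metrics for comparing feature rankings but does not name a specific one. A minimal sketch, assuming Kendall's tau rank correlation as the metric (a common choice for ranking stability; the feature names below are hypothetical, not taken from the paper):

```python
# Illustrative stability comparison between two feature rankings using
# Kendall's tau rank correlation. This is an assumption for demonstration:
# the paper does not specify which scalar stability metric it uses.
# Rankings are lists of feature names ordered from most to least important.

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two strict rankings of the same feature set.

    Returns 1.0 for identical rankings, -1.0 for fully reversed ones.
    Ties are not handled (rankings are assumed to be strict orderings).
    """
    assert set(rank_a) == set(rank_b), "rankings must cover the same features"
    pos_b = {feat: i for i, feat in enumerate(rank_b)}
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The pair (i, j) is concordant if rank_b orders it the same way.
            if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical rankings from two selectors (feature names are made up):
svm_wrapper = ["bmi", "age", "alcohol", "smoking", "fiber"]
pearson = ["age", "bmi", "alcohol", "fiber", "smoking"]
print(kendall_tau(svm_wrapper, svm_wrapper))  # identical rankings -> 1.0
print(kendall_tau(svm_wrapper, pearson))      # 8 concordant, 2 discordant -> 0.6
```

In a stability study such as the one described, the same metric would typically be computed between rankings obtained from different resamples of the data, and the values averaged per ranking algorithm.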
Related papers
- Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: A comprehensive analysis [6.796017024594715]
We suggest two novel feature selection (FS) methods based upon an imperialist competitive algorithm (ICA) and a bat algorithm (BA).
This study aims to enhance diagnostic models' efficiency and present a comprehensive analysis to help clinical physicians make much more precise and reliable decisions than before.
arXiv Detail & Related papers (2024-07-19T19:07:53Z)
- Confidence-aware Contrastive Learning for Selective Classification [20.573658672018066]
This work provides a generalization bound for selective classification, disclosing that optimizing feature layers helps improve the performance of selective classification.
Inspired by this theory, we propose to explicitly improve the selective classification model at the feature level for the first time, leading to a novel Confidence-aware Contrastive Learning method for Selective Classification, CCL-SC.
arXiv Detail & Related papers (2024-06-07T08:43:53Z)
- Ransomware detection using stacked autoencoder for feature selection [0.0]
The study meticulously analyzes the autoencoder's learned weights and activations to identify essential features for distinguishing ransomware families from other malware.
The proposed model achieves an exceptional 99% accuracy in ransomware classification, surpassing the Extreme Gradient Boosting (XGBoost) algorithm.
arXiv Detail & Related papers (2024-02-17T17:31:48Z)
- Uncertainty Quantification on Clinical Trial Outcome Prediction [37.238845949535616]
We propose incorporating uncertainty quantification into clinical trial outcome predictions.
Our main goal is to enhance the model's ability to discern nuanced differences.
We have adopted a selective classification approach to fulfill our objective.
arXiv Detail & Related papers (2024-01-07T13:48:05Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Value-Distributional Model-Based Reinforcement Learning [63.32053223422317]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z)
- Cervical Cytology Classification Using PCA & GWO Enhanced Deep Features Selection [1.990876596716716]
Cervical cancer is one of the most deadly and common diseases among women worldwide.
We propose a fully automated framework that utilizes Deep Learning and feature selection.
The framework is evaluated on three publicly available benchmark datasets.
arXiv Detail & Related papers (2021-06-09T08:57:22Z)
- Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
Area under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems.
We propose a technical method to optimize AUPRC for deep learning.
arXiv Detail & Related papers (2021-04-18T06:22:21Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and surpasses various state-of-the-art baselines, outperforming the best of them by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.