Related papers: Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI

Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI

URL: http://arxiv.org/abs/2512.01333v1
Date: Mon, 01 Dec 2025 06:53:00 GMT
Title: Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI
Authors: A S M Ahsanul Sarkar Akib, Raduana Khawla, Abdul Hasib,
Abstract summary: Stroke is a major cause of death and permanent impairment, making it a major worldwide health concern.<n>To address this challenge, we used ensemble modeling and explainable AI (XAI) techniques to create an interpretable machine learning framework for stroke risk prediction.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stroke is a major cause of death and permanent impairment, making it a major worldwide health concern. For prompt intervention and successful preventative tactics, early risk assessment is essential. To address this challenge, we used ensemble modeling and explainable AI (XAI) techniques to create an interpretable machine learning framework for stroke risk prediction. A thorough evaluation of 10 different machine learning models using 5-fold cross-validation across several datasets was part of our all-inclusive strategy, which also included feature engineering and data pretreatment (using Random Over-Sampling (ROS) to solve class imbalance). Our optimized ensemble model (Random Forest + ExtraTrees + XGBoost) performed exceptionally well, obtaining a strong 99.09% accuracy on the Stroke Prediction Dataset (SPD). We improved the model's transparency and clinical applicability by identifying three important clinical variables using LIME-based interpretability analysis: age, hypertension, and glucose levels. Through early prediction, this study highlights how combining ensemble learning with explainable AI (XAI) can deliver highly accurate and interpretable stroke risk assessment. By enabling data-driven prevention and personalized clinical decisions, our framework has the potential to transform stroke prediction and cardiovascular risk management.

Related papers

Deep Survival Analysis in Multimodal Medical Data: A Parametric and Probabilistic Approach with Competing Risks [47.19194118883552]
We introduce a multimodal deep learning framework for survival analysis capable of modeling both single and competing risks scenarios.<n>We propose SAMVAE (Survival Analysis Multimodal Variational Autoencoder), a novel deep learning architecture designed for survival prediction.
arXiv Detail & Related papers (2025-07-10T14:29:48Z)
Benchmarking Foundation Models and Parameter-Efficient Fine-Tuning for Prognosis Prediction in Medical Imaging [40.35825564674249]
This study introduces the first structured benchmark to assess the robustness and efficiency of transfer learning strategies for Foundation Models.<n>Four publicly available COVID-19 chest X-ray datasets were used, covering mortality, severity, and admission.<n>CNNs pretrained on ImageNet and FMs pretrained on general or biomedical datasets were adapted using full finetuning, linear probing, and parameter-efficient methods.
arXiv Detail & Related papers (2025-06-23T09:16:04Z)
Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank.<n>It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z)
Comparative Analysis of Stroke Prediction Models Using Machine Learning [0.0]
Stroke remains one of the most critical global health challenges, ranking as the second leading cause of death and the third leading cause of disability worldwide.<n>This study explores the effectiveness of machine learning algorithms in predicting stroke risk using demographic, clinical, and lifestyle data from the Stroke Prediction dataset.
arXiv Detail & Related papers (2025-05-14T21:27:19Z)
Machine Learning Meets Transparency in Osteoporosis Risk Assessment: A Comparative Study of ML and Explainability Analysis [0.0]
The present research tackles the difficulty of predicting osteoporosis risk via machine learning (ML) approaches.<n>XGBoost had the greatest accuracy (91%) among the evaluated models, surpassing others in precision (0.92), recall (0.91), and F1-score (0.90)<n>The study indicates that age is the primary determinant in forecasting osteoporosis risk, followed by hormonal alterations and familial history.
arXiv Detail & Related papers (2025-05-01T09:05:02Z)
Deciphering Cardiac Destiny: Unveiling Future Risks Through Cutting-Edge Machine Learning Approaches [0.0]
This project aims to develop and assess predictive models for the timely identification of cardiac arrest incidents. We employ machine learning algorithms like XGBoost, Gradient Boosting, and Naive Bayes, alongside a deep learning (DL) approach with Recurrent Neural Networks (RNNs) Rigorous experimentation and validation revealed the superior performance of the RNN model.
arXiv Detail & Related papers (2024-09-03T19:18:16Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
Analysis and Evaluation of Explainable Artificial Intelligence on Suicide Risk Assessment [32.04382293817763]
This study investigates the effectiveness of Explainable Artificial Intelligence (XAI) techniques in predicting suicide risks. Data augmentation techniques and ML models are utilized to predict the associated risk. Patients with good incomes, respected occupations, and university education have the least risk.
arXiv Detail & Related papers (2023-03-09T05:11:46Z)
A New Approach for Interpretability and Reliability in Clinical Risk Prediction: Acute Coronary Syndrome Scenario [0.33927193323747895]
We intend to create a new risk assessment methodology that combines the best characteristics of both risk score and machine learning models. The proposed approach achieved testing results identical to the standard LR, but offers superior interpretability and personalization. The reliability estimation of individual predictions presented a great correlation with the misclassifications rate.
arXiv Detail & Related papers (2021-10-15T19:33:46Z)
Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model. We introduce two unique positive sampling strategies specifically tailored for EHR data. Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks. Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets. We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD) UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.