Interpretable Heart Disease Prediction via a Weighted Ensemble Model: A Large-Scale Study with SHAP and Surrogate Decision Trees
- URL: http://arxiv.org/abs/2511.01947v1
- Date: Mon, 03 Nov 2025 10:24:09 GMT
- Title: Interpretable Heart Disease Prediction via a Weighted Ensemble Model: A Large-Scale Study with SHAP and Surrogate Decision Trees
- Authors: Md Abrar Hasnat, Md Jobayer, Md. Mehedi Hasan Shawon, Md. Golam Rabiul Alam,
- Abstract summary: Cardiovascular disease (CVD) remains a critical global health concern, demanding reliable and interpretable predictive models for early risk assessment.<n>This study presents a strategically weighted ensemble model that combines tree-based methods (LightGBM, XGBoost) with a Convolutional Neural Network (CNN) to predict CVD risk.<n>The proposed model delivers a combination of robust predictive performance and clinical transparency by blending diverse learning architectures and explainability through SHAP and surrogate decision trees.
- Score: 1.7616014313664448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cardiovascular disease (CVD) remains a critical global health concern, demanding reliable and interpretable predictive models for early risk assessment. This study presents a large-scale analysis using the Heart Disease Health Indicators Dataset, developing a strategically weighted ensemble model that combines tree-based methods (LightGBM, XGBoost) with a Convolutional Neural Network (CNN) to predict CVD risk. The model was trained on a preprocessed dataset of 229,781 patients where the inherent class imbalance was managed through strategic weighting and feature engineering enhanced the original 22 features to 25. The final ensemble achieves a statistically significant improvement over the best individual model, with a Test AUC of 0.8371 (p=0.003) and is particularly suited for screening with a high recall of 80.0%. To provide transparency and clinical interpretability, surrogate decision trees and SHapley Additive exPlanations (SHAP) are used. The proposed model delivers a combination of robust predictive performance and clinical transparency by blending diverse learning architectures and incorporating explainability through SHAP and surrogate decision trees, making it a strong candidate for real-world deployment in public health screening.
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks.<n>We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models.<n>Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z) - Interpretable Fine-Gray Deep Survival Model for Competing Risks: Predicting Post-Discharge Foot Complications for Diabetic Patients in Ontario [0.4001676635647516]
We propose an intrinsically interpretable survival model called CRISPNAM-FG.<n>Our approach predicts the Cumulative Incidence Function using the Fine-Gray formulation, achieving high predictive power.<n>We validated the model on several benchmark datasets and applied our model to predict future foot complications in diabetic patients.
arXiv Detail & Related papers (2025-11-16T01:07:57Z) - Optimizing Clinical Fall Risk Prediction: A Data-Driven Integration of EHR Variables with the Johns Hopkins Fall Risk Assessment Tool [0.059313275041517295]
We aim to better align fall risk prediction from the Johns Hopkins Fall Risk Assessment Tool with clinically meaningful measures.<n>We conducted a retrospective analysis of 54,209 inpatient admissions from three Johns Hopkins Health System hospitals.
arXiv Detail & Related papers (2025-10-23T16:31:09Z) - A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer [54.58205672910646]
RenalCLIP is a visual-language foundation model for characterization, diagnosis and prognosis of renal mass.<n>It achieved better performance and superior generalizability across 10 core tasks spanning the full clinical workflow of kidney cancer.
arXiv Detail & Related papers (2025-08-22T17:48:19Z) - Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank.<n>It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z) - Machine Learning Solutions Integrated in an IoT Healthcare Platform for Heart Failure Risk Stratification [0.16863755729554883]
The management of chronic Heart Failure (HF) presents significant challenges in modern healthcare.<n>We present a predictive model founded on Machine Learning (ML) techniques to identify patients at HF risk.
arXiv Detail & Related papers (2025-04-07T14:07:05Z) - Optimizing Disease Prediction with Artificial Intelligence Driven Feature Selection and Attention Networks [0.0]
This article introduces a pioneering ensemble feature selection model.
At the heart of the proposed model lies the SEV-EB algorithm, a novel approach to optimal feature selection.
An HSC-AttentionNet is introduced, allowing the model to capture both short-term patterns and long-term dependencies in health data.
arXiv Detail & Related papers (2024-07-31T14:12:27Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.