Cardiovascular Disease Prediction using Machine Learning: A Comparative Analysis
- URL: http://arxiv.org/abs/2507.21898v1
- Date: Tue, 29 Jul 2025 15:07:32 GMT
- Title: Cardiovascular Disease Prediction using Machine Learning: A Comparative Analysis
- Authors: Risshab Srinivas Ramesh, Roshani T S Udupa, Monisha J, Kushi K K S,
- Abstract summary: This study involves a cardiovascular disease (CVD) dataset comprising 68,119 records. We have performed statistical analyses, including t-tests, Chi-square tests, and ANOVA, to identify strong associations between CVD and elderly people. A logistic regression model highlights age, blood pressure, and cholesterol as primary risk factors, with unexpected negative associations for smoking and alcohol.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cardiovascular diseases (CVDs) are a leading cause of mortality globally, accounting for 31% of all deaths. This study uses a cardiovascular disease (CVD) dataset comprising 68,119 records to explore the influence of numerical (age, height, weight, blood pressure, BMI) and categorical (gender, cholesterol, glucose, smoking, alcohol, activity) factors on CVD occurrence. We have performed statistical analyses, including t-tests, Chi-square tests, and ANOVA, which identify strong associations between CVD and older age, hypertension, higher weight, and abnormal cholesterol levels, while physical activity appears as a protective factor. A logistic regression model highlights age, blood pressure, and cholesterol as primary risk factors, with unexpected negative associations for smoking and alcohol, suggesting potential data issues. Model performance comparisons reveal CatBoost as the top performer, with an accuracy of 0.734 and an ECE of 0.0064; it also excels in probabilistic prediction (Brier score = 0.1824). Data challenges, including outliers and skewed distributions, indicate a need for improved preprocessing to enhance predictive reliability.
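The abstract reports two calibration-oriented metrics, the Brier score and the expected calibration error (ECE). The following is a minimal sketch of how such metrics are commonly computed for a binary classifier. It is not the authors' code: the function names, the equal-width binning, and the choice of 10 bins are assumptions made for illustration, and the paper's exact ECE variant is not stated in the abstract.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """Binary ECE with equal-width bins over the predicted positive-class probability.

    Assumption: each bin compares the mean predicted probability with the
    observed fraction of positives; other ECE variants exist.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - predicted)
    return ece


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely for illustration.
    labels = [0, 0, 1, 1]
    probs = [0.1, 0.4, 0.6, 0.9]
    print("Brier score:", brier_score(labels, probs))
    print("ECE:", expected_calibration_error(labels, probs))
```

Lower values are better for both metrics. A Brier score of 0.25 corresponds to always predicting 0.5 on a binary outcome, which gives some context for the reported 0.1824.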
Related papers
- Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank. It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z) - Machine Learning-Based Model for Postoperative Stroke Prediction in Coronary Artery Disease [0.0]
This study aims to develop and evaluate a sophisticated machine learning prediction model to assess postoperative stroke risk. The dataset was split into 70% training and 30% test data. Numerical values were normalized, whereas categorical variables were one-hot encoded. Logistic Regression, XGBoost, SVM, and CatBoost were employed for predictive modeling, and SHAP analysis assessed stroke risk for each variable.
arXiv Detail & Related papers (2025-03-15T02:50:32Z) - Fuzzy Rule based Intelligent Cardiovascular Disease Prediction using Complex Event Processing [0.8668211481067458]
Cardiovascular diseases (CVDs) are a rapidly rising global concern due to unhealthy diets, lack of physical activity, and other factors.
Recent research has focused on accurate and timely disease prediction to reduce risk and fatalities.
We propose a fuzzy rule-based system for monitoring clinical data to provide real-time decision support.
arXiv Detail & Related papers (2024-09-19T16:36:24Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - Improving Cardiovascular Disease Prediction Through Comparative Analysis of Machine Learning Models: A Case Study on Myocardial Infarction [0.0]
Cardiovascular disease remains a leading cause of mortality in the contemporary world.
Accurate predictions are pivotal for refining healthcare strategies.
XGBoost emerges as the top-performing model.
arXiv Detail & Related papers (2023-11-01T13:41:44Z) - Interpretable Survival Analysis for Heart Failure Risk Prediction [50.64739292687567]
We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models.
Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.
arXiv Detail & Related papers (2023-10-24T02:56:05Z) - Predicting blood pressure under circumstances of missing data: An analysis of missing data patterns and imputation methods using NHANES [0.0]
CVD is affected by raised blood pressure, raised blood glucose, raised blood lipids, and obesity.
Genetics and social/environmental factors such as poverty, stress, and racism also play an important role.
arXiv Detail & Related papers (2023-05-01T18:15:44Z) - Predicting adverse outcomes following catheter ablation treatment for atrial fibrillation [2.202746751854349]
We developed prognostic survival models for predicting adverse outcomes after catheter ablation treatment for AF.
Traditional and deep survival models were trained to predict major bleeding events and a composite of heart failure, stroke, cardiac arrest, and death.
arXiv Detail & Related papers (2022-11-22T02:55:51Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Robustness to Spurious Correlations via Human Annotations [100.63051542531171]
We present a framework for making models robust to spurious correlations by leveraging humans' common sense knowledge of causality.
Specifically, we use human annotation to augment each training example with a potential unmeasured variable.
We then introduce a new distributionally robust optimization objective over unmeasured variables (UV-DRO) to control the worst-case loss over possible test-time shifts.
arXiv Detail & Related papers (2020-07-13T20:05:19Z)