An Improved Ensemble-Based Machine Learning Model with Feature Optimization for Early Diabetes Prediction
- URL: http://arxiv.org/abs/2512.02023v1
- Date: Sat, 15 Nov 2025 07:42:31 GMT
- Title: An Improved Ensemble-Based Machine Learning Model with Feature Optimization for Early Diabetes Prediction
- Authors: Md. Najmul Islam, Md. Miner Hossain Rimon, Shah Sadek-E-Akbor Shamim, Zarif Mohaimen Fahad, Md. Jehadul Islam Mony, Md. Jalal Uddin Chowdhury
- Abstract summary: Diabetes is a serious worldwide health issue, and successful intervention depends on early detection. This study uses extensive health survey data to create a machine learning framework for diabetes classification that is both accurate and comprehensible. We also proposed and developed a React Native-based application with a Python Flask backend to support early diabetes prediction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diabetes is a serious worldwide health issue, and successful intervention depends on early detection. However, overlapping risk factors and data asymmetry make prediction difficult. This study uses extensive health survey data to create a machine learning framework for diabetes classification that is both accurate and comprehensible, producing results that can aid clinical decision-making. Using the BRFSS dataset, we assessed a number of supervised learning techniques. SMOTE and Tomek Links were used to correct class imbalance. To improve prediction performance, both individual models and ensemble techniques such as stacking were investigated. The 2015 BRFSS dataset, which includes roughly 253,680 records with 22 numerical features, is used in this study. The individual models Random Forest, XGBoost, CatBoost, and LightGBM attained strong ROC-AUC performance of approximately 0.96. The stacking ensemble with XGBoost and KNN yielded the best overall results with 94.82% accuracy, ROC-AUC of 0.989, and PR-AUC of 0.991, indicating a favourable balance between recall and precision. We also proposed and developed a React Native-based application with a Python Flask backend to support early diabetes prediction, providing users with an accessible and efficient health monitoring tool.
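The abstract names SMOTE and Tomek Links as the class-imbalance correction but gives no implementation details. The following is a minimal, standard-library-only sketch of the two ideas on toy data; it is not the authors' pipeline. The function names, the toy points, `k=3`, and the choice to drop both members of each Tomek link are illustrative assumptions. A real run on a dataset the size of BRFSS would likely rely on an optimized library rather than this quadratic-time version.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to `point`, skipping the point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: euclidean(point, c))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (needs at least 2 minority points).

    Each synthetic sample lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(base, minority, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop samples that form Tomek links.

    A Tomek link is a pair of samples from different classes that are each
    other's nearest neighbor. Assumption: both members are removed; removing
    only the majority-class member is another common convention.
    """
    def nn_index(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: euclidean(X[i], X[j]))

    nn = [nn_index(i) for i in range(len(X))]
    linked = {i for i in range(len(X)) if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(len(X)) if i not in linked]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): class 0 is the majority, class 1 the minority.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.4, 0.2),
            (0.3, 0.5), (0.5, 0.4), (0.6, 0.1), (0.2, 0.6)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]

# Oversample the minority class until both classes have the same size.
synthetic = smote(minority, n_new=len(majority) - len(minority))
features = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

# Clean up borderline pairs after oversampling.
features_clean, labels_clean = remove_tomek_links(features, labels)
print(len(features_clean), labels_clean.count(0), labels_clean.count(1))
```

Oversampling first and cleaning afterwards mirrors the usual way the two techniques are combined: SMOTE balances the class counts, and the Tomek step then removes opposite-class pairs that sit on the decision boundary. Resampling of this kind is normally applied to the training split only, so that evaluation metrics such as ROC-AUC and PR-AUC are computed on untouched data.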
Related papers
- Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z)
- Validating Vision Transformers for Otoscopy: Performance and Data-Leakage Effects [42.465094107111646]
This study evaluates the efficacy of vision transformer models, specifically Swin transformers, in enhancing the diagnostic accuracy of ear diseases. The research utilised a real-world dataset from the Department of Otolaryngology at the Clinical Hospital of the Universidad de Chile.
arXiv Detail & Related papers (2025-11-06T23:20:37Z)
- Predicting Diabetic Retinopathy Using a Two-Level Ensemble Model [0.6445605125467574]
Diabetic retinopathy is a leading cause of blindness in working-age adults. Image-based AI tools have shown limitations in early-stage detection. We propose a non-image-based, two-level ensemble model for DR prediction using routine laboratory test results.
arXiv Detail & Related papers (2025-10-01T16:19:57Z)
- Enhancing Bagging Ensemble Regression with Data Integration for Time Series-Based Diabetes Prediction [0.5399800035598186]
This study begins with a data engineering process to integrate diabetes-related datasets from 2011 to 2021. We then introduce an enhanced bagging ensemble regression model (EBMBag+) for time series forecasting to predict diabetes prevalence across U.S. cities. The experimental results demonstrate that EBMBag+ achieved the best performance, with an MAE of 0.41, RMSE of 0.53, MAPE of 4.01, and an R2 of 0.9.
arXiv Detail & Related papers (2025-06-11T04:21:50Z)
- A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning [0.0]
We propose a computationally efficient supervised filter that ranks features using the Gumbel copula upper tail dependence coefficient ($\lambda_U$). We benchmarked against Mutual Information, mRMR, ReliefF, and $L_1$ Elastic Net across four classifiers on two diabetes datasets. We conclude that copula based feature selection via upper tail dependence is a powerful, efficient, and interpretable approach for building risk models in public health and clinical medicine.
arXiv Detail & Related papers (2025-05-28T16:34:58Z)
- Predicting Diabetes Using Machine Learning: A Comparative Study of Classifiers [0.0]
Diabetes remains a significant health challenge globally, contributing to severe complications like kidney disease, vision loss, and heart issues. Our study introduces an innovative diabetes prediction framework, leveraging both traditional ML techniques and advanced ensemble methods. Central to our approach is the development of a novel model, DNet, a hybrid architecture combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) layers.
arXiv Detail & Related papers (2025-05-11T16:14:31Z)
- A Comparative Study of Diabetes Prediction Based on Lifestyle Factors Using Machine Learning [2.767257448554864]
This study explores the use of machine learning models to predict diabetes based on lifestyle factors using data from the Behavioral Risk Factor Surveillance System (BRFSS) 2015 survey. Three classification models, Decision Tree, K-Nearest Neighbors (KNN), and Logistic Regression, are implemented and evaluated to determine their predictive performance. The results indicate that the Decision Tree, KNN, and Logistic Regression achieve an accuracy of 0.74, 0.72, and 0.75, respectively, with varying strengths in precision and recall.
arXiv Detail & Related papers (2025-03-06T06:31:40Z)
- Towards Transparent and Accurate Diabetes Prediction Using Machine Learning and Explainable Artificial Intelligence [8.224338294959699]
This study presents a framework for diabetes prediction using Machine Learning (ML) models and XAI tools. The ensemble model provided high accuracy, with a test accuracy of 92.50% and an ROC-AUC of 0.975. The results suggest that ML combined with XAI is a promising means of developing accurate and computationally transparent tools for use in healthcare systems.
arXiv Detail & Related papers (2025-01-30T00:42:43Z)
- Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- On the explainability of hospitalization prediction on a large COVID-19 patient dataset [45.82374977939355]
We develop various AI models to predict hospitalization on a large (over 110k) cohort of COVID-19 positive-tested US patients.
Despite high data unbalance, the models reach average precision 0.96-0.98 (0.75-0.85), recall 0.96-0.98 (0.74-0.85), and $F_1$ score 0.97-0.98 (0.79-0.83) on the non-hospitalized (or hospitalized) class.
arXiv Detail & Related papers (2021-10-28T10:23:38Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.