Enhancing Bagging Ensemble Regression with Data Integration for Time Series-Based Diabetes Prediction
- URL: http://arxiv.org/abs/2506.13786v1
- Date: Wed, 11 Jun 2025 04:21:50 GMT
- Title: Enhancing Bagging Ensemble Regression with Data Integration for Time Series-Based Diabetes Prediction
- Authors: Vuong M. Ngo, Tran Quang Vinh, Patricia Kearney, Mark Roantree
- Abstract summary: This study begins with a data engineering process to integrate diabetes-related datasets from 2011 to 2021. We then introduce an enhanced bagging ensemble regression model (EBMBag+) for time series forecasting to predict diabetes prevalence across U.S. cities. The experimental results demonstrate that EBMBag+ achieved the best performance, with an MAE of 0.41, RMSE of 0.53, MAPE of 4.01, and an R2 of 0.9.
- Score: 0.5399800035598186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels, leading to complications like heart disease, kidney failure, and nerve damage. Accurate state-level predictions are vital for effective healthcare planning and targeted interventions, but in many cases, data for necessary analyses are incomplete. This study begins with a data engineering process to integrate diabetes-related datasets from 2011 to 2021 to create a comprehensive feature set. We then introduce an enhanced bagging ensemble regression model (EBMBag+) for time series forecasting to predict diabetes prevalence across U.S. cities. Several baseline models, including SVMReg, BDTree, LSBoost, NN, LSTM, and ERMBag, were evaluated for comparison with our EBMBag+ algorithm. The experimental results demonstrate that EBMBag+ achieved the best performance, with an MAE of 0.41, RMSE of 0.53, MAPE of 4.01, and an R2 of 0.9.
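The abstract reports four regression metrics (MAE, RMSE, MAPE, and R2). As a minimal sketch, assuming the standard textbook definitions (this listing does not give the paper's exact formulas) and assuming MAPE is expressed in percent, they could be computed as follows. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent), and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error and root mean squared error.
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # MAPE divides by the true values, so this raises ZeroDivisionError
    # if any true value is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # R^2 = 1 - SS_res / SS_tot; raises ZeroDivisionError if all true
    # values are identical.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}


# Made-up prevalence values (percent), for illustration only.
observed = [10.0, 12.0, 8.0, 10.0]
predicted = [9.0, 12.0, 9.0, 10.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Lower MAE, RMSE, and MAPE indicate smaller forecast errors, while an R2 closer to 1 indicates that the predictions explain more of the variance in the observed values.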
Related papers
- Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting length of stay (LOS) in the ICU, specifically for patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost and CatBoost) and Neural Networks (LSTM, BERT and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z) - Predicting Diabetes Using Machine Learning: A Comparative Study of Classifiers [0.0]
Diabetes remains a significant health challenge globally, contributing to severe complications like kidney disease, vision loss, and heart issues. Our study introduces an innovative diabetes prediction framework, leveraging both traditional ML techniques and advanced ensemble methods. Central to our approach is the development of a novel model, DNet, a hybrid architecture combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) layers.
arXiv Detail & Related papers (2025-05-11T16:14:31Z) - AttenGluco: Multimodal Transformer-Based Blood Glucose Forecasting on AI-READI Dataset [8.063401183752347]
Diabetes is a chronic metabolic disorder characterized by persistently high blood glucose levels (BGLs). Recent deep learning models show promise in improving BGL prediction. We propose AttenGluco, a multimodal Transformer-based framework for long-term blood glucose prediction.
arXiv Detail & Related papers (2025-02-14T05:07:38Z) - Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation [53.2981100111204]
Photoplethysmography and electrocardiography can potentially enable continuous blood pressure (BP) monitoring. Yet building accurate and robust machine learning (ML) models remains challenging due to variability in data quality and patient-specific factors. In this work, we investigate whether a model pre-trained on one modality can effectively be exploited to improve the accuracy of a different signal type. Our approach achieves near state-of-the-art accuracy for diastolic BP and surpasses by 1.5x the accuracy of prior works for systolic BP.
arXiv Detail & Related papers (2025-02-10T13:33:12Z) - From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis [47.23780364438969]
We present GluFormer, a generative foundation model for CGM data that learns nuanced glycemic patterns and translates them into predictive representations of metabolic health. GluFormer generalizes to 19 external cohorts spanning different ethnicities and ages, 5 countries, 8 CGM devices, and diverse pathophysiological states. In a longitudinal study of 580 adults with CGM data and 12-year follow-up, GluFormer identifies individuals at elevated risk of developing diabetes more effectively than blood HbA1C%.
arXiv Detail & Related papers (2024-08-20T13:19:06Z) - Comparative Analysis of LSTM Neural Networks and Traditional Machine Learning Models for Predicting Diabetes Patient Readmission [0.0]
This study uses the Diabetes 130-US Hospitals dataset to analyze and predict patient readmission with various machine learning models.
LightGBM turned out to be the best traditional model, while XGBoost was the runner-up.
This study demonstrates that model selection, validation, and interpretability are key steps in predictive healthcare modeling.
arXiv Detail & Related papers (2024-06-28T15:06:22Z) - Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z) - Machine Learning based prediction of Glucose Levels in Type 1 Diabetes Patients with the use of Continuous Glucose Monitoring Data [0.0]
Continuous Glucose Monitoring (CGM) devices offer detailed, non-intrusive and real time insights into a patient's blood glucose concentrations.
Leveraging advanced Machine Learning (ML) models to predict future glucose levels gives rise to substantial quality-of-life improvements.
arXiv Detail & Related papers (2023-02-24T19:10:40Z) - HealthEdge: A Machine Learning-Based Smart Healthcare Framework for Prediction of Type 2 Diabetes in an Integrated IoT, Edge, and Cloud Computing System [0.0]
The alarming increase in diabetes calls for precautionary measures to predict and avoid its occurrence.
This paper proposes HealthEdge, a machine learning-based smart healthcare framework for type 2 diabetes prediction in an integrated IoT-edge-cloud computing system.
arXiv Detail & Related papers (2023-01-25T07:57:18Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Short Term Blood Glucose Prediction based on Continuous Glucose Monitoring Data [53.01543207478818]
This study explores the use of Continuous Glucose Monitoring (CGM) data as input for digital decision support tools.
We investigate how Recurrent Neural Networks (RNNs) can be used for Short Term Blood Glucose (STBG) prediction.
arXiv Detail & Related papers (2020-02-06T16:39:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.