A Comparative Study of Diabetes Prediction Based on Lifestyle Factors Using Machine Learning
- URL: http://arxiv.org/abs/2503.04137v1
- Date: Thu, 06 Mar 2025 06:31:40 GMT
- Title: A Comparative Study of Diabetes Prediction Based on Lifestyle Factors Using Machine Learning
- Authors: Bruce Nguyen, Yan Zhang
- Abstract summary: This study explores the use of machine learning models to predict diabetes based on lifestyle factors using data from the Behavioral Risk Factor Surveillance System (BRFSS) 2015 survey. Three classification models, Decision Tree, K-Nearest Neighbors (KNN), and Logistic Regression, are implemented and evaluated to determine their predictive performance. The results indicate that the Decision Tree, KNN, and Logistic Regression achieve an accuracy of 0.74, 0.72, and 0.75, respectively, with varying strengths in precision and recall.
- Score: 2.767257448554864
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diabetes is a prevalent chronic disease with significant health and economic burdens worldwide. Early prediction and diagnosis can aid in effective management and prevention of complications. This study explores the use of machine learning models to predict diabetes based on lifestyle factors using data from the Behavioral Risk Factor Surveillance System (BRFSS) 2015 survey. The dataset consists of 21 lifestyle and health-related features, capturing aspects such as physical activity, diet, mental health, and socioeconomic status. Three classification models, Decision Tree, K-Nearest Neighbors (KNN), and Logistic Regression, are implemented and evaluated to determine their predictive performance. The models are trained and tested using a balanced dataset, and their performances are assessed based on accuracy, precision, recall, and F1-score. The results indicate that the Decision Tree, KNN, and Logistic Regression achieve an accuracy of 0.74, 0.72, and 0.75, respectively, with varying strengths in precision and recall. The findings highlight the potential of machine learning in diabetes prediction and suggest future improvements through feature selection and ensemble learning techniques.
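The abstract names the evaluation setup (a balanced dataset; accuracy, precision, recall, and F1-score) without showing how these pieces fit together. The Python sketch below is illustrative only and does not reproduce the authors' pipeline or results. It assumes binary labels with 1 = diabetes. It also assumes the data were balanced by random undersampling of the majority class, which the abstract does not state. The helper names `undersample` and `binary_metrics` are hypothetical.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has the
    same count as the smallest one (assumed balancing method)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    n_min = min(counts.values())
    # Indices of the rows belonging to each class.
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in counts}
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data: 6 negatives and 2 positives, balanced down to 2 + 2.
    X = [[i] for i in range(8)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    Xb, yb = undersample(X, y, seed=1)
    print(Counter(yb))  # two rows of each class

    m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1])
    print({k: round(v, 3) for k, v in m.items()})
    # → {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75, 'f1': 0.667}
```

In the toy example, precision (0.6) and recall (0.75) diverge even though only one accuracy figure (0.625) is reported. This is the kind of difference the abstract refers to with "varying strengths in precision and recall."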
Related papers
- Chronic Diseases Prediction using Machine Learning and Deep Learning Methods [0.0]
This study explores the application of machine learning (ML) and deep learning (DL) techniques to predict chronic disease and thyroid disorders.
We used a variety of models, including Logistic Regression (LR), Random Forest (RF), Gradient Boosted Trees (GBT), Neural Networks (NN), Decision Trees (DT) and Naive Bayes (NB).
The results demonstrated that ensemble methods like Random Forest and Gradient Boosted Trees consistently outperformed the other models.
arXiv (2025-04-30T21:08:16Z)
- Feature-Enhanced Machine Learning for All-Cause Mortality Prediction in Healthcare Data [0.0]
This study evaluates machine learning models for all-cause in-hospital mortality prediction using the MIMIC-III database.
We extracted key features such as vital signs (e.g., heart rate, blood pressure), laboratory results and demographic information.
The Random Forest model achieved the highest performance with an AUC of 0.94, significantly outperforming other machine learning and deep learning approaches.
arXiv (2025-03-27T08:04:42Z)
- Towards Transparent and Accurate Diabetes Prediction Using Machine Learning and Explainable Artificial Intelligence [8.224338294959699]
This study presents a framework for diabetes prediction using Machine Learning (ML) models and XAI tools.
The ensemble model provided high accuracy, with a test accuracy of 92.50% and an ROC-AUC of 0.975.
The results suggest that ML combined with XAI is a promising means of developing accurate and computationally transparent tools for use in healthcare systems.
arXiv (2025-01-30T00:42:43Z)
- From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis [47.23780364438969]
We present GluFormer, a generative foundation model for CGM data that learns nuanced glycemic patterns and translates them into predictive representations of metabolic health.
GluFormer generalizes to 19 external cohorts spanning different ethnicities and ages, 5 countries, 8 CGM devices, and diverse pathophysiological states.
In a longitudinal study of 580 adults with CGM data and 12-year follow-up, GluFormer identifies individuals at elevated risk of developing diabetes more effectively than blood HbA1C%.
arXiv (2024-08-20T13:19:06Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of individuals of non-European descent.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv (2024-04-26T16:39:50Z)
- Foresight -- Deep Generative Modelling of Patient Timelines using Electronic Health Records [46.024501445093755]
Temporal modelling of medical history can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications.
We present Foresight, a novel GPT-3-based pipeline that uses NER+L tools (i.e., MedCAT) to convert document text into structured, coded concepts.
arXiv (2022-12-13T19:06:00Z)
- Secure and Privacy-Preserving Automated Machine Learning Operations into End-to-End Integrated IoT-Edge-Artificial Intelligence-Blockchain Monitoring System for Diabetes Mellitus Prediction [0.5825410941577593]
This paper proposes an IoT-edge-Artificial Intelligence (AI)-blockchain system for diabetes prediction based on risk factors.
The proposed system is underpinned by the blockchain to obtain a cohesive view of the risk-factor data from patients across different hospitals.
Numerical experiments and a comparative analysis were carried out on the proposed system, using the most accurate model, random forest (RF).
arXiv (2022-11-13T13:57:14Z)
- Improvement of a Prediction Model for Heart Failure Survival through Explainable Artificial Intelligence [0.0]
This work presents an explainability analysis and evaluation of a prediction model for heart failure survival.
The model employs a data workflow pipeline able to select the best ensemble tree algorithm as well as the best feature selection technique.
The paper's main contribution is an explainability-driven approach to select the best prediction model for HF survival based on an accuracy-explainability balance.
arXiv (2021-08-20T09:03:26Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv (2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming various state-of-the-art baselines by up to 19% over the best baseline.
arXiv (2020-10-22T02:28:11Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv (2020-05-10T01:45:03Z)
- Short Term Blood Glucose Prediction based on Continuous Glucose Monitoring Data [53.01543207478818]
This study explores the use of Continuous Glucose Monitoring (CGM) data as input for digital decision support tools.
We investigate how Recurrent Neural Networks (RNNs) can be used for Short Term Blood Glucose (STBG) prediction.
arXiv (2020-02-06T16:39:44Z)