Presenting DiaData for Research on Type 1 Diabetes
- URL: http://arxiv.org/abs/2508.09160v1
- Date: Tue, 05 Aug 2025 14:29:43 GMT
- Title: Presenting DiaData for Research on Type 1 Diabetes
- Authors: Beyza Cinar, Maria Maleshkova
- Abstract summary: This work systematically integrates 15 datasets to provide a large database of 2510 subjects with glucose measurements recorded every 5 minutes. In total, 149 million measurements are included, of which 4% represent values in the hypoglycemic range. The integrated dataset provides an equal distribution of sex and different age levels.
- Score: 0.5250756556290761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Type 1 diabetes (T1D) is an autoimmune disorder that leads to the destruction of insulin-producing cells, resulting in insulin deficiency, which is why affected individuals depend on external insulin injections. However, insulin can decrease blood glucose levels and can cause hypoglycemia. Hypoglycemia is a severe event of low blood glucose levels ($\le$70 mg/dL) with dangerous side effects of dizziness, coma, or death. Data analysis can significantly enhance diabetes care by identifying personal patterns and trends leading to adverse events. In particular, machine learning (ML) models can predict glucose levels and provide early alarms. However, diabetes and hypoglycemia research is limited by the unavailability of large datasets. Thus, this work systematically integrates 15 datasets to provide a large database of 2510 subjects with glucose measurements recorded every 5 minutes. In total, 149 million measurements are included, of which 4% represent values in the hypoglycemic range. Moreover, two sub-databases are extracted. Sub-database I includes demographics, and sub-database II includes heart rate data. The integrated dataset provides an equal distribution of sex and different age levels. As a further contribution, data quality is assessed, revealing that data imbalance and missing values present a significant challenge. Moreover, a correlation study on glucose levels and heart rate data is conducted, showing a relation between 15 and 55 minutes before hypoglycemia.
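The abstract defines hypoglycemia as glucose at or below 70 mg/dL and reports the share of measurements in that range. A minimal sketch of how such a share could be computed from a glucose trace follows; the function name, the sample values, and the use of None for missing samples are illustrative assumptions, not the authors' code or the DiaData file format.

```python
from typing import Iterable, Optional

# Threshold stated in the abstract: hypoglycemia is <= 70 mg/dL.
HYPO_THRESHOLD_MG_DL = 70.0


def hypoglycemic_fraction(readings_mg_dl: Iterable[Optional[float]],
                          threshold: float = HYPO_THRESHOLD_MG_DL) -> float:
    """Return the share of non-missing readings at or below the threshold."""
    # Drop missing samples (assumed here to be encoded as None).
    valid = [r for r in readings_mg_dl if r is not None]
    if not valid:
        raise ValueError("no valid glucose readings")
    return sum(1 for r in valid if r <= threshold) / len(valid)


# Hypothetical 5-minute trace in mg/dL; None marks a missing sample.
sample = [110.0, 95.0, 72.0, 70.0, None, 64.0, 81.0, 120.0, 130.0]
print(hypoglycemic_fraction(sample))  # 2 of 8 valid readings -> 0.25
```

Missing samples are excluded from the denominator here; counting them instead would lower the reported share, so the choice should be stated whenever such a figure is quoted.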
Related papers
- A Review on Machine Learning Approaches for the Prediction of Glucose Levels and Hypogylcemia [0.23031174164121124]
Machine learning (ML) models can improve diabetes management by predicting hypoglycemia and providing optimal prevention methods. This review investigates state-of-the-art models trained on data of continuous glucose monitoring (CGM) devices from patients with Type 1 Diabetes (T1D). We compare the models' performance across short-term (15 to 120 min) and long-term (3 to more than 24 hours) prediction horizons (PHs).
arXiv Detail & Related papers (2026-01-09T23:06:36Z)
- Integrating Neural Differential Forecasting with Safe Reinforcement Learning for Blood Glucose Regulation [51.12307713554633]
TSODE is a safety-aware controller that integrates Thompson RL Sampling with a Neural Ordinary Differential Equation forecaster. In the FDA-approved UVa/Padova simulator (adult cohort), TSODE achieved 87.9% time-in-range with less than 10% time below 70 mg/dL.
arXiv Detail & Related papers (2025-11-16T02:11:33Z)
- Benchmarking ResNet for Short-Term Hypoglycemia Classification with DiaData [0.25884731694338026]
This study improves the data quality of DiaData, an integration of 15 separate datasets containing glucose values from 2510 subjects with T1D. Training with more data improves performance by 7% while using quality-refined data yields a 2-3% gain compared to raw data.
arXiv Detail & Related papers (2025-10-26T18:29:16Z)
- Insulin Resistance Prediction From Wearables and Routine Blood Biomarkers [44.735350554750916]
We remotely recruited the largest dataset to date across the US to study insulin resistance. Deep neural network models were developed to predict insulin resistance based on readily available digital and blood biomarkers. Model showed 93% sensitivity and 95% adjusted specificity in obese and sedentary participants.
arXiv Detail & Related papers (2025-04-30T16:10:20Z)
- LLM-Powered Prediction of Hyperglycemia and Discovery of Behavioral Treatment Pathways from Wearables and Diet [6.292642131180376]
Postprandial hyperglycemia is marked by the blood glucose level exceeding the normal range after consuming a meal. A key metric for understanding blood glucose dynamics after eating is the postprandial area under the curve (AUC). We developed an explainable machine learning solution, GlucoLens, that takes sensor-driven inputs and uses advanced data processing, large language models, and trainable machine learning models.
arXiv Detail & Related papers (2025-03-05T22:10:14Z)
- Blood Glucose Level Prediction in Type 1 Diabetes Using Machine Learning [0.0]
People with Type 1 Diabetes need synthetic insulin every day. In diabetes management, continuous glucose monitoring is an important feature that provides near real-time blood glucose data. In this research work, we used machine learning tools, deep neural networks, deep reinforcement learning, and voting and stacking regressors to predict blood glucose levels.
arXiv Detail & Related papers (2025-01-30T21:00:38Z)
- Chronic Disease Diagnoses Using Behavioral Data [42.96592744768303]
We aim to diagnose hyperglycemia (diabetes), hyperlipidemia, and hypertension (collectively known as 3H) using self-collected behavioral data.
arXiv Detail & Related papers (2024-10-04T12:52:49Z)
- From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis [47.23780364438969]
We present GluFormer, a generative foundation model for CGM data that learns nuanced glycemic patterns and translates them into predictive representations of metabolic health. GluFormer generalizes to 19 external cohorts spanning different ethnicities and ages, 5 countries, 8 CGM devices, and diverse pathophysiological states. In a longitudinal study of 580 adults with CGM data and 12-year follow-up, GluFormer identifies individuals at elevated risk of developing diabetes more effectively than blood HbA1C%.
arXiv Detail & Related papers (2024-08-20T13:19:06Z)
- Temporal patterns in insulin needs for Type 1 diabetes [0.0]
Type 1 Diabetes (T1D) is a chronic condition where the body produces little or no insulin.
Finding the right insulin dose and time remains a complex, challenging and as yet unsolved control task.
In this study, we use the OpenAPS Data Commons dataset to discover temporal patterns in insulin need driven by well-known factors.
arXiv Detail & Related papers (2022-11-14T14:19:50Z)
- Task-wise Split Gradient Boosting Trees for Multi-center Diabetes Prediction [37.846368153741395]
Task-wise Split Gradient Boosting Trees (TSGB) is proposed for the multi-center diabetes prediction task.
TSGB achieves superior performance against several state-of-the-art methods.
The proposed TSGB method has been deployed as an online diabetes risk assessment software for early diagnosis.
arXiv Detail & Related papers (2021-08-16T14:22:44Z)
- A Benchmark for Studying Diabetic Retinopathy: Segmentation, Grading, and Transferability [76.64661091980531]
People with diabetes are at risk of developing diabetic retinopathy (DR).
Computer-aided DR diagnosis is a promising tool for early detection of DR and severity grading.
This dataset has 1,842 images with pixel-level DR-related lesion annotations, and 1,000 images with image-level labels graded by six board-certified ophthalmologists.
arXiv Detail & Related papers (2020-08-22T07:48:04Z)
- Machine learning for the diagnosis of early stage diabetes using temporal glucose profiles [0.20072624123275526]
Diabetes is a chronic disease that has a long latent period that complicates detection of the disease at an early stage.
We propose to use machine learning to detect the subtle change in the temporal pattern of glucose concentration.
Multi-layered perceptrons, convolutional neural networks, and recurrent neural networks all identified the degree of insulin resistance with high accuracy above 85%.
arXiv Detail & Related papers (2020-05-18T13:31:12Z)
- Short Term Blood Glucose Prediction based on Continuous Glucose Monitoring Data [53.01543207478818]
This study explores the use of Continuous Glucose Monitoring (CGM) data as input for digital decision support tools.
We investigate how Recurrent Neural Networks (RNNs) can be used for Short Term Blood Glucose (STBG) prediction.
arXiv Detail & Related papers (2020-02-06T16:39:44Z)