Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions
- URL: http://arxiv.org/abs/2507.14077v1
- Date: Fri, 18 Jul 2025 16:53:05 GMT
- Title: Glucose-ML: A collection of longitudinal diabetes datasets for development of robust AI solutions
- Authors: Temiloluwa Prioleau, Baiying Lu, Yanjun Cui
- Abstract summary: Glucose-ML is a collection of 10 publicly available diabetes datasets. The Glucose-ML collection comprises over 300,000 days of continuous glucose monitor (CGM) data. We conduct a case study for the task of blood glucose prediction.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) algorithms are a critical part of state-of-the-art digital health technology for diabetes management. Yet, limited access to large, high-quality datasets creates barriers that impede the development of robust AI solutions. To accelerate development of transparent, reproducible, and robust AI solutions, we present Glucose-ML, a collection of 10 publicly available diabetes datasets released within the last 7 years (i.e., 2018 to 2025). The Glucose-ML collection comprises over 300,000 days of continuous glucose monitor (CGM) data, with a total of 38 million glucose samples collected from 2500+ people across 4 countries. Participants include persons living with type 1 diabetes, type 2 diabetes, prediabetes, and no diabetes. To support researchers and innovators in using this rich collection of diabetes datasets, we present a comparative analysis to guide algorithm developers with data selection. Additionally, we conduct a case study for the task of blood glucose prediction, one of the most common AI tasks within the field. Through this case study, we provide a benchmark for short-term blood glucose prediction across all 10 publicly available diabetes datasets within the Glucose-ML collection. We show that the same algorithm can have significantly different prediction results when developed and evaluated with different datasets. Findings from this study are then used to inform recommendations for developing robust AI solutions within the diabetes or broader health domain. We provide direct links to each longitudinal diabetes dataset in the Glucose-ML collection and openly provide our code.
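The abstract's central finding, that one algorithm can score very differently depending on the dataset it is evaluated on, can be illustrated with a minimal sketch. It assumes a naive persistence forecaster (predict that glucose a fixed horizon ahead equals the current reading), which is a common reference baseline and not necessarily a model used in the Glucose-ML benchmark. It also assumes gap-free series at a fixed 5-minute sampling interval, so a 30-minute horizon is 6 steps; real CGM devices differ in sampling interval. The function names `persistence_errors`, `rmse`, and `benchmark` and the toy readings are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def persistence_errors(series: Sequence[float], horizon_steps: int) -> List[float]:
    """Errors of the naive 'last value carried forward' forecast.

    The forecast for time t + horizon_steps is the reading at time t.
    Assumes a gap-free series sampled at a fixed interval (hypothetical setup).
    """
    return [series[t + horizon_steps] - series[t]
            for t in range(len(series) - horizon_steps)]


def rmse(errors: Sequence[float]) -> float:
    """Root-mean-square error; fails loudly if there are no forecast windows."""
    if not errors:
        raise ValueError("no forecast windows: series shorter than the horizon")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def benchmark(datasets: Dict[str, List[List[float]]], horizon_steps: int) -> Dict[str, float]:
    """Same algorithm, scored separately per dataset (errors pooled over participants)."""
    return {
        name: rmse([e for series in participants
                    for e in persistence_errors(series, horizon_steps)])
        for name, participants in datasets.items()
    }


# Toy, made-up readings in mg/dL (not Glucose-ML data), one participant each.
toy = {
    "dataset_A": [[100, 105, 110, 115, 120, 125, 130, 135]],  # steadily rising
    "dataset_B": [[100, 100, 100, 100, 100, 100, 100, 100]],  # flat
}

# 6 steps = 30 minutes, assuming 5-minute sampling.
print(benchmark(toy, horizon_steps=6))  # {'dataset_A': 30.0, 'dataset_B': 0.0}
```

The identical forecaster is perfect on the flat series and off by 30 mg/dL on the rising one, so a single benchmark number says as much about the data's glycemic variability as about the algorithm.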
Related papers
- Insulin Resistance Prediction From Wearables and Routine Blood Biomarkers (arXiv, 2025-04-30)
  We remotely recruited the largest dataset to date across the US to study insulin resistance. Deep neural network models were developed to predict insulin resistance based on readily available digital and blood biomarkers. The model showed 93% sensitivity and 95% adjusted specificity in obese and sedentary participants.
- A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI (arXiv, 2025-03-26)
  We introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset. Biomedica contains over 6 million scientific articles and 24 million image-text pairs. We provide scalable streaming and search APIs through a web server, facilitating seamless integration with AI systems.
- Let Curves Speak: A Continuous Glucose Monitor based Large Sensor Foundation Model for Diabetes Management (arXiv, 2024-12-12)
  Integrating AI with continuous glucose monitoring (CGM) holds promise for near-future glucose prediction. CGM-LSM is pretrained on 15.96 million glucose records from 592 diabetes patients for near-future glucose prediction. CGM-LSM achieved an RMSE of 29.81 mg/dL for patients with type 1 diabetes and 23.49 mg/dL for patients with type 2 diabetes at a two-hour prediction horizon.
- Diabetica: Adapting Large Language Model to Enhance Multiple Medical Tasks in Diabetes Care and Management (arXiv, 2024-09-20)
  Large language models (LLMs) have shown promise in various healthcare scenarios, but their effectiveness across diverse diabetes tasks remains unproven. Our study introduced a framework to train and validate diabetes-specific LLMs.
- From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis (arXiv, 2024-08-20)
  We present GluFormer, a generative foundation model for CGM data that learns nuanced glycemic patterns and translates them into predictive representations of metabolic health. GluFormer generalizes to 19 external cohorts spanning different ethnicities and ages, 5 countries, 8 CGM devices, and diverse pathophysiological states. In a longitudinal study of 580 adults with CGM data and 12-year follow-up, GluFormer identifies individuals at elevated risk of developing diabetes more effectively than blood HbA1c.
- Toward Short-Term Glucose Prediction Solely Based on CGM Time Series (arXiv, 2024-04-18)
  TimeGlu is an end-to-end pipeline for short-term glucose prediction based on CGM time series data. It achieves state-of-the-art performance without the need for additional personal data from patients.
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction (arXiv, 2023-05-25)
  We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks. We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging-genetic data from the Alzheimer's Disease Neuroimaging Initiative cohort.
- DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images (arXiv, 2023-04-05)
  We organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). The challenge consists of three tasks: segmentation of DR lesions, image quality assessment, and DR grading. This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge.
- DiaTrend: A dataset from advanced diabetes technology to enable development of novel analytic solutions (arXiv, 2023-04-04)
  This dataset is composed of intensive longitudinal data from wearable medical devices, including a total of 27,561 days of continuous glucose monitor data and 8,220 days of insulin pump data from 54 patients with diabetes. This dataset is useful for developing novel analytic solutions that can reduce the disease burden for people living with diabetes and increase knowledge on chronic condition management in outpatient settings.
- Personalized Understanding of Blood Glucose Dynamics via Mobile Sensor Data (arXiv, 2023-02-02)
  We augment continuous glucose monitor (CGM) data with sensor input collected by a smartphone. This dataset is novel in terms of its size, the inclusion of GPS data, and the fact that it was collected non-intrusively from a free-living patient.
- Continuous Glucose Monitoring Prediction (arXiv, 2021-01-04)
  Diabetes is one of the deadliest diseases in the world and affects nearly 10 percent of the global adult population. One major development is a system called continuous glucose monitoring (CGM).
- Short Term Blood Glucose Prediction based on Continuous Glucose Monitoring Data (arXiv, 2020-02-06)
  This study explores the use of continuous glucose monitoring (CGM) data as input for digital decision support tools. We investigate how recurrent neural networks (RNNs) can be used for short-term blood glucose (STBG) prediction.
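Short-term prediction from CGM data alone, as in the RNN-based STBG study and TimeGlu listed above, is commonly framed as supervised learning on sliding windows: a fixed-length history of readings is the input and the reading a fixed number of steps later is the target. The sketch below shows only that framing under stated assumptions (a gap-free, evenly sampled series); it is not the preprocessing of any specific paper, and the function name `make_windows` and the toy readings are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], history: int, horizon: int
                 ) -> Tuple[List[List[float]], List[float]]:
    """Turn one CGM series into supervised (input window, target) pairs.

    Each input holds `history` consecutive readings; the target is the reading
    `horizon` steps after the last reading in that window. Assumes a gap-free,
    evenly sampled series (hypothetical simplification).
    """
    X: List[List[float]] = []
    y: List[float] = []
    # `end` is the exclusive end index of the input window.
    for end in range(history, len(series) - horizon + 1):
        X.append(list(series[end - history:end]))
        y.append(series[end - 1 + horizon])
    return X, y


# Toy, made-up readings in mg/dL.
series = [100, 104, 109, 115, 118, 120, 119, 117]
X, y = make_windows(series, history=3, horizon=2)
print(len(X))       # 4
print(X[0], y[0])   # [100, 104, 109] 118
```

Any model (a linear regressor, an RNN, a transformer) can then be trained on `X` and `y`; keeping the windowing separate from the model makes it easier to compare different models on exactly the same targets.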