Early Prediction of Type 2 Diabetes Using Multimodal data and Tabular Transformers
- URL: http://arxiv.org/abs/2601.12981v1
- Date: Mon, 19 Jan 2026 11:55:41 GMT
- Title: Early Prediction of Type 2 Diabetes Using Multimodal data and Tabular Transformers
- Authors: Sulaiman Khan, Md. Rafiul Biswas, Zubair Shah
- Abstract summary: We validated our TabTrans model on a retrospective Qatar BioBank cohort of 1,382 subjects. The proposed model's performance is evaluated against conventional machine learning (ML) and generative AI models. Our TabTrans model demonstrated superior predictive performance, achieving ROC AUC $\geq$ 79.7% for T2DM prediction.
- Score: 1.2744523252873352
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study introduces a novel approach for early Type 2 Diabetes Mellitus (T2DM) risk prediction using a tabular transformer (TabTrans) architecture to analyze longitudinal patient data. By processing patients' longitudinal health records and bone-related tabular data, our model captures complex, long-range dependencies in disease progression that conventional methods often overlook. We validated our TabTrans model on a retrospective Qatar BioBank (QBB) cohort of 1,382 subjects, comprising 725 men (146 diabetic, 579 healthy) and 657 women (133 diabetic, 524 healthy). The study integrated electronic health records (EHR) with dual-energy X-ray absorptiometry (DXA) data. To address class imbalance, we employed SMOTE and SMOTE-ENN resampling techniques. The proposed model's performance is evaluated against conventional machine learning (ML) and generative AI models, including Claude 3.5 Sonnet (Anthropic's constitutional AI), GPT-4 (OpenAI's generative pre-trained transformer), and Gemini Pro (Google's multimodal language model). Our TabTrans model demonstrated superior predictive performance, achieving ROC AUC $\geq$ 79.7% for T2DM prediction compared to both generative AI models and conventional ML approaches. Feature interpretation analysis identified key risk indicators, with visceral adipose tissue (VAT) mass and volume, ward bone mineral density (BMD) and bone mineral content (BMC), T- and Z-scores, and L1-L4 scores emerging as the most important predictors associated with diabetes development in Qatari adults. These findings demonstrate the significant potential of TabTrans for analyzing complex tabular healthcare data, providing a powerful tool for proactive T2DM management and personalized clinical interventions in the Qatari population. Index Terms: tabular transformers, multimodal data, DXA data, diabetes, T2DM, feature interpretation, tabular data
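The abstract names SMOTE as one of the resampling techniques used against class imbalance. The following is a minimal sketch of the SMOTE idea only (interpolating between a minority row and one of its nearest minority neighbours), not the authors' implementation. The function name `smote`, its parameters, and the toy feature values are hypothetical; only the class counts come from the abstract. The ENN cleaning step of SMOTE-ENN is omitted.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows, each interpolated between a random
    minority row and one of its k nearest minority neighbours.
    Simplified illustration; assumes at least two minority rows."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Class counts reported in the abstract (men + women).
diabetic = 146 + 133
healthy = 579 + 524
# Synthetic diabetic rows needed to fully balance the two classes.
n_needed = healthy - diabetic

# Toy two-feature minority set (hypothetical values, not QBB data).
toy_rng = random.Random(1)
toy_minority = [[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(20)]
new_rows = smote(toy_minority, n_new=10, k=3)
```

With the reported counts, the cohort has 279 diabetic and 1,103 healthy subjects, so fully balancing the classes would take 824 synthetic diabetic rows. In practice, resampling is normally applied to the training split only, so that evaluation data contains no synthetic rows.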
Related papers
- impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy. It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches. Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z) - Enhancing Bagging Ensemble Regression with Data Integration for Time Series-Based Diabetes Prediction [0.5399800035598186]
This study begins with a data engineering process to integrate diabetes-related datasets from 2011 to 2021. We then introduce an enhanced bagging ensemble regression model (EBMBag+) for time series forecasting to predict diabetes prevalence across U.S. cities. The experimental results demonstrate that EBMBag+ achieved the best performance, with an MAE of 0.41, RMSE of 0.53, MAPE of 4.01, and an R2 of 0.9.
arXiv Detail & Related papers (2025-06-11T04:21:50Z) - Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation [46.36100528165335]
Photoplethysmography and electrocardiography can potentially enable continuous blood pressure (BP) monitoring. Yet building accurate and robust machine learning (ML) models remains challenging due to variability in data quality and patient-specific factors. In this work, we investigate whether a model pre-trained on one modality can effectively be exploited to improve the accuracy of a different signal type. Our approach achieves near state-of-the-art accuracy for diastolic BP and surpasses by 1.5x the accuracy of prior works for systolic BP.
arXiv Detail & Related papers (2025-02-10T13:33:12Z) - Towards Transparent and Accurate Diabetes Prediction Using Machine Learning and Explainable Artificial Intelligence [8.224338294959699]
This study presents a framework for diabetes prediction using Machine Learning (ML) models and XAI tools. The ensemble model provided high accuracy, with a test accuracy of 92.50% and an ROC-AUC of 0.975. The results suggest that ML combined with XAI is a promising means of developing accurate and computationally transparent tools for use in healthcare systems.
arXiv Detail & Related papers (2025-01-30T00:42:43Z) - ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models [46.80682547774335]
We propose ScaleMAI, an agent of AI-integrated data curation and annotation. First, ScaleMAI creates a dataset of 25,362 CT scans, including per-voxel annotations for benign/malignant tumors and 24 anatomical structures. Second, through progressive human-in-the-loop iterations, ScaleMAI provides a Flagship AI Model that can approach the proficiency of expert annotators in detecting pancreatic tumors.
arXiv Detail & Related papers (2025-01-06T22:12:00Z) - T2-Only Prostate Cancer Prediction by Meta-Learning from Bi-Parametric MR Imaging [39.64252838533947]
Current imaging-based prostate cancer diagnosis requires both MR T2-weighted (T2w) and diffusion-weighted imaging (DWI) sequences.
Measuring diffusion patterns in DWI sequences can be time-consuming, prone to artifacts and sensitive to imaging parameters.
This study investigates the potential of machine learning (ML) methods using only the T2w sequence as input during inference time.
arXiv Detail & Related papers (2024-11-11T22:38:45Z) - Phikon-v2, A large and public feature extractor for biomarker prediction [42.52549987351643]
We train a vision transformer using DINOv2 and publicly release one iteration of this model for further experimentation, coined Phikon-v2.
While trained on publicly available histology slides, Phikon-v2 surpasses our previously released model (Phikon) and performs on par with other histopathology foundation models (FM) trained on proprietary data.
arXiv Detail & Related papers (2024-09-13T20:12:29Z) - First Experiences with the Identification of People at Risk for Diabetes in Argentina using Machine Learning Techniques [0.27488316163114823]
This article describes the development and assessment of predictive models to identify people at risk for T2D and PD specifically in Argentina.
The results obtained show that a very good performance was observed for two datasets with some of these models.
arXiv Detail & Related papers (2024-03-27T14:38:02Z) - SACDNet: Towards Early Type 2 Diabetes Prediction with Uncertainty for Electronic Health Records [0.951828574518325]
This study proposes a novel neural network architecture for early T2DM prediction using multi-headed self-attention and dense layers.
The proposed technique is called the Self-Attention for Comorbid Disease Net (SACDNet), achieving an accuracy of 89.3% and an F1-Score of 89.1%.
A T2DM prediction dataset is also built as part of this study which is based on real-world routine Electronic Health Record (EHR) data comprising 4,124 diabetic and 181,767 non-diabetic examples.
arXiv Detail & Related papers (2023-01-12T07:14:47Z) - SynthA1c: Towards Clinically Interpretable Patient Representations for Diabetes Risk Stratification [0.5551483435671848]
Early diagnosis of Type 2 Diabetes Mellitus (T2DM) is crucial to enable timely therapeutic interventions and lifestyle modifications.
We show that image-derived phenotypes and physical examination data together can accurately predict diabetes risk.
arXiv Detail & Related papers (2022-09-20T23:39:52Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)