Related papers: From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction?

URL: http://arxiv.org/abs/2506.14949v1
Date: Tue, 17 Jun 2025 20:00:16 GMT
Title: From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction?
Authors: Shadman Sakib, Oishy Fatema Akhand, Ajwad Abrar
Abstract summary: We test the effectiveness of Large Language Models (LLMs) in predicting diabetes using zero-shot, one-shot, and three-shot prompting methods. We evaluate six LLMs, including four open-source models: Gemma-2-27B, Mistral-7B, Llama-3.1-8B, and Llama-3.2-2B. Our results show that proprietary LLMs perform better than open-source ones, with GPT-4o and Gemma-2-27B achieving the highest accuracy in few-shot settings.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: While Machine Learning (ML) and Deep Learning (DL) models have been widely used for diabetes prediction, the use of Large Language Models (LLMs) for structured numerical data is still not well explored. In this study, we test the effectiveness of LLMs in predicting diabetes using zero-shot, one-shot, and three-shot prompting methods. We conduct an empirical analysis using the Pima Indian Diabetes Database (PIDD). We evaluate six LLMs, including four open-source models: Gemma-2-27B, Mistral-7B, Llama-3.1-8B, and Llama-3.2-2B. We also test two proprietary models: GPT-4o and Gemini Flash 2.0. In addition, we compare their performance with three traditional machine learning models: Random Forest, Logistic Regression, and Support Vector Machine (SVM). We use accuracy, precision, recall, and F1-score as evaluation metrics. Our results show that proprietary LLMs perform better than open-source ones, with GPT-4o and Gemma-2-27B achieving the highest accuracy in few-shot settings. Notably, Gemma-2-27B also outperforms the traditional ML models in terms of F1-score. However, there are still issues such as performance variation across prompting strategies and the need for domain-specific fine-tuning. This study shows that LLMs can be useful for medical prediction tasks and encourages future work on prompt engineering and hybrid approaches to improve healthcare predictions.
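As an illustration of the setup the abstract describes, the following is a minimal Python sketch of how a tabular patient record might be turned into a zero-shot, one-shot, or three-shot prompt, and how accuracy, precision, recall, and F1-score can be computed from the resulting predictions. It is not the authors' code: the prompt wording, the helper names (`format_record`, `build_prompt`, `classification_metrics`), and the example values are all assumptions made for illustration, and the feature names follow the commonly distributed version of the Pima Indian Diabetes Database.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def format_record(record: Record) -> str:
    """Serialize one tabular row as 'name: value' pairs."""
    return ", ".join(f"{name}: {value}" for name, value in record.items())


def build_prompt(examples: List[Tuple[Record, int]], query: Record) -> str:
    """Build a k-shot prompt: k = len(examples), so an empty list is zero-shot.

    The instruction wording is hypothetical, not taken from the paper.
    """
    lines = ["Decide whether the patient has diabetes. Answer Yes or No."]
    for record, label in examples:
        lines.append(f"Patient: {format_record(record)}")
        lines.append(f"Answer: {'Yes' if label == 1 else 'No'}")
    lines.append(f"Patient: {format_record(query)}")
    lines.append("Answer:")
    return "\n".join(lines)


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up values for illustration only; not rows from the real dataset.
    shot = ({"Glucose": 150, "BMI": 33.0, "Age": 50}, 1)
    query = {"Glucose": 90, "BMI": 24.0, "Age": 30}
    print(build_prompt([shot], query))  # one-shot prompt
    print(classification_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```

Passing an empty example list gives the zero-shot prompt and passing three labeled records gives the three-shot variant. The model's reply would still need to be parsed into a 0/1 label before scoring, a step this sketch leaves out.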

Related papers

Machine Learning for Everyone: Simplifying Healthcare Analytics with BigQuery ML [0.0]
Machine learning (ML) transforms healthcare by enabling predictive analytics, personalized treatments, and improved patient outcomes. Traditional ML often requires specialized skills, infrastructure, and resources, limiting accessibility for many healthcare professionals. This paper explores how the BigQuery ML cloud service helps healthcare researchers and data analysts build and deploy models using SQL, without the need for advanced ML knowledge.
arXiv Detail & Related papers (2025-02-10T20:38:53Z)
Large Language Models for Medical Forecasting -- Foresight 2 [0.573038865401108]
Foresight 2 (FS2) is a large language model fine-tuned on hospital data for modelling patient timelines. It can understand patients' clinical notes and predict SNOMED codes for a wide range of biomedical use cases.
arXiv Detail & Related papers (2024-12-14T14:45:28Z)
Predicting Emergent Capabilities by Finetuning [98.9684114851891]
We find that finetuning language models can shift the point in scaling at which emergence occurs towards less capable models. We validate this approach using four standard NLP benchmarks. We find that, in some cases, we can accurately predict whether models trained with up to 4x more compute have emerged.
arXiv Detail & Related papers (2024-11-25T01:48:09Z)
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models [0.06555599394344236]
This study evaluates the medical reasoning performance of large language models (LLMs) and vision language models (VLMs) in gastroenterology. We used 300 gastroenterology board exam-style multiple-choice questions, 138 of which contain images.
arXiv Detail & Related papers (2024-08-25T14:50:47Z)
Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources [13.750202656564907]
Adverse event (AE) extraction is crucial for monitoring and analyzing the safety profiles of immunizations. This study aims to evaluate the effectiveness of large language models (LLMs) and traditional deep learning models in AE extraction.
arXiv Detail & Related papers (2024-06-26T03:56:21Z)
Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training. We propose three effective strategies to enhance LLM performance within a fixed compute budget. Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z)
Large Language Model Distilling Medication Recommendation Model [58.94186280631342]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs). Our research aims to transform existing medication recommendation methodologies using LLMs. To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM. For learning methods, we explore the claim of a "free lunch" hypothesis. For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models. We present three findings across 4 NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z)
CancerGPT: Few-shot Drug Pair Synergy Prediction using Large Pre-trained Language Models [3.682742580232362]
Large pre-trained language models (LLMs) have been shown to have significant potential in few-shot learning across various fields. Our research is the first to tackle drug pair synergy prediction in rare tissues with limited data.
arXiv Detail & Related papers (2023-04-18T02:49:53Z)
Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims [7.219529711278771]
We generated a large dataset using historical administrative claims including demographic information and flags for disease codes. We trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high.
arXiv Detail & Related papers (2021-07-22T07:34:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.