Related papers: A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis

URL: http://arxiv.org/abs/2501.17190v1
Date: Mon, 27 Jan 2025 03:31:02 GMT
Title: A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis
Authors: Aysegul Ucar, Soumik Nayak, Anunak Roy, Burak Taşcı, Gülay Taşcı
Abstract summary: We aim to improve the accuracy and efficiency of providing reliable answers to medical questions. Various models, such as RoBERTa and BERT, were examined and evaluated on their ability to classify medical questions.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents an overview of the development and fine-tuning of large language models (LLMs) designed specifically for answering medical questions. Our main goal is to improve the accuracy and efficiency of providing reliable answers to medical queries. Our approach has two stages: first, predicting a specific label for the incoming medical question, and then returning a predefined answer for that label. Various models, such as RoBERTa and BERT, were examined and evaluated on their ability to classify medical questions. The models are trained on datasets derived from 6,800 samples scraped from Healthline.com, supplemented with synthetic data. For evaluation, we conducted a comparative study using 5-fold cross-validation. To assess performance, we used accuracy, precision, recall, and F1 score, and also recorded training time. The LoRA RoBERTa-large model achieved an accuracy of 78.47%, precision of 72.91%, recall of 76.95%, and an F1 score of 73.56%. The RoBERTa-base model performed strongly, with an accuracy of 99.87%, precision of 99.81%, recall of 99.86%, and an F1 score of 99.82%. The BERT Uncased model also showed strong results, with an accuracy of 95.85%, precision of 94.42%, recall of 95.58%, and an F1 score of 94.72%. Lastly, the BERT Large Uncased model achieved the highest performance, with an accuracy, precision, recall, and F1 score of 100%. These results indicate the models' ability to classify medical questions and return accurate answers, supporting the development of improved health-related AI solutions.
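The two-stage approach (predict a label, then return the predefined answer for that label) and the 5-fold evaluation can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the labels, answers, and questions are hypothetical; a keyword matcher (`predict_label`) stands in for the fine-tuned RoBERTa/BERT classifier; folds are contiguous rather than shuffled or stratified; and precision, recall, and F1 are macro-averaged over classes, which the abstract does not specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Stage 2 lookup table. Hypothetical labels and answers for illustration only;
# they are not taken from the paper's Healthline-derived dataset.
ANSWERS: Dict[str, str] = {
    "diabetes": "Predefined answer about diabetes.",
    "hypertension": "Predefined answer about high blood pressure.",
    "migraine": "Predefined answer about migraines.",
}

# Hypothetical keywords: a stand-in for a fine-tuned BERT/RoBERTa classifier.
KEYWORDS: Dict[str, List[str]] = {
    "diabetes": ["insulin", "blood sugar", "glucose"],
    "hypertension": ["blood pressure", "hypertension"],
    "migraine": ["migraine", "headache"],
}


def predict_label(question: str) -> str:
    """Stage 1 stand-in: pick the label with the most keyword hits (first label wins ties)."""
    q = question.lower()
    hits = {label: sum(kw in q for kw in kws) for label, kws in KEYWORDS.items()}
    return max(hits, key=lambda label: hits[label])


def answer_question(question: str, classify: Callable[[str], str] = predict_label) -> str:
    """Two-stage pipeline: classify the question, then return that label's predefined answer."""
    return ANSWERS[classify(question)]


def kfold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds (assumes n >= k); return (train, test) index lists."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        splits.append((train, test))
        start += size
    return splits


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (mean of per-class values)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        totals["precision"] += prec
        totals["recall"] += rec
        totals["f1"] += f1
    scores = {m: v / len(labels) for m, v in totals.items()}
    scores["accuracy"] = sum(t == p for t, p in pairs) / len(pairs)
    return scores


def cross_validate(questions: Sequence[str], labels: Sequence[str],
                   train_fn: Callable[[List[str], List[str]], Callable[[str], str]],
                   k: int = 5) -> Dict[str, float]:
    """Train on k-1 folds, score the held-out fold, and average each metric over the k folds."""
    fold_scores = []
    for train_idx, test_idx in kfold_indices(len(questions), k):
        classify = train_fn([questions[i] for i in train_idx], [labels[i] for i in train_idx])
        preds = [classify(questions[i]) for i in test_idx]
        fold_scores.append(macro_scores([labels[i] for i in test_idx], preds))
    return {m: sum(s[m] for s in fold_scores) / k for m in fold_scores[0]}
```

For example, `answer_question("What is a normal blood sugar level?")` returns the hypothetical diabetes entry. To exercise `cross_validate`, pass a `train_fn` that fits a model on the training folds and returns a classify function; `lambda qs, ys: predict_label` runs the loop with the keyword stand-in, which ignores its training data. Keeping the answer table separate from the classifier mirrors the abstract's design, where the model only predicts a label and the answer text is fixed per label.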

Related papers

Enhancing Clinical Text Classification via Fine-Tuned DRAGON Longformer Models
This study explores the optimization of the DRAGON Longformer base model for clinical text classification. A dataset of 500 clinical cases containing structured medical observations was used. The optimized model achieved notable performance gains.
arXiv (2025-07-13T03:10:19Z)
Advanced Health Misinformation Detection Through Hybrid CNN-LSTM Models Informed by the Elaboration Likelihood Model (ELM)
This study applies the Elaboration Likelihood Model (ELM) to enhance misinformation detection on social media. The model aims to improve the accuracy and reliability of misinformation classification by integrating ELM-based features. The enhanced model achieved an accuracy of 97.37%, precision of 96.88%, recall of 98.50%, F1-score of 97.41%, and ROC-AUC of 99.50%.
arXiv (2025-07-12T05:44:06Z)
Diabetic Retinopathy Detection Based on Convolutional Neural Networks with SMOTE and CLAHE Techniques Applied to Fundus Images
Diabetic retinopathy (DR) is one of the major eye complications in diabetic patients. This study aims to evaluate the accuracy of artificial intelligence (AI) in diagnosing DR.
arXiv (2025-04-08T05:38:53Z)
Self-Supervised Radiograph Anatomical Region Classification -- How Clean Is Your Real-World Data?
We show the effectiveness of self-supervised methods in assigning each radiograph in our in-house dataset of 48,434 skeletal radiographs to one of 14 anatomical region classes. We achieve a strong linear evaluation accuracy of 96.6% with a single model and 97.7% using an ensemble approach.
arXiv (2024-12-20T15:07:55Z)
Brain Tumor Classification on MRI in Light of Molecular Markers
Co-deletion of 1p/19q is associated with clinical outcomes in low-grade gliomas. This study uses a specially designed MRI-based convolutional neural network for brain cancer detection.
arXiv (2024-09-29T07:04:26Z)
Stacking-Enhanced Bagging Ensemble Learning for Breast Cancer Classification with CNN
This paper proposes a CNN classification network based on Bagging and stacking ensemble learning methods for breast cancer classification. The model is capable of fast and accurate classification of input images. For binary classification (presence or absence of breast cancer), the accuracy reached 98.84%, and for five-class classification, the accuracy reached 98.34%.
arXiv (2024-07-15T09:44:43Z)
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but individuals of non-European descent are under-represented in them. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv (2024-04-26T16:39:50Z)
Common 7B Language Models Already Possess Strong Math Capabilities
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities. The potential for extensive scaling is constrained by the scarcity of publicly available math questions.
arXiv (2024-03-07T18:00:40Z)
A Federated Learning Framework for Stenosis Detection
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography (CA) images. Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1,219 images from 200 patients, acquired at the Ospedale Riuniti of Ancona (Italy); dataset 2 includes 7,492 sequential images from 90 patients from a previous study available in the literature.
arXiv (2023-10-30T11:13:40Z)
Large Language Models to Identify Social Determinants of Health in Electronic Health Records
Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected in electronic health records (EHRs). This study investigated the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented. A total of 800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated.
arXiv (2023-08-11T19:18:35Z)
Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models
We present a comparative analysis of five machine learning models for the prediction of epileptic seizures using EEG data. Our analysis compares the models in terms of accuracy. The ET model exhibited the best performance, with an accuracy of 99.29%.
arXiv (2023-08-06T08:50:08Z)
On the explainability of hospitalization prediction on a large COVID-19 patient dataset
We develop various AI models to predict hospitalization in a large cohort (over 110k) of US patients who tested positive for COVID-19. Despite high class imbalance, the models reach average precision of 0.96-0.98 (0.75-0.85), recall of 0.96-0.98 (0.74-0.85), and F1-score of 0.97-0.98 (0.79-0.83) on the non-hospitalized (hospitalized) class.
arXiv (2021-10-28T10:23:38Z)
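The summary above reports separate precision, recall, and F1 ranges for the majority (non-hospitalized) and minority (hospitalized) classes. The sketch below, using hypothetical toy counts that are not from the paper, shows how per-class metrics are read off a single confusion matrix and why the same number of errors weighs more on the smaller class. It illustrates the standard per-class definitions and is not the authors' evaluation code; the function name `per_class_metrics` is hypothetical.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Per-class precision, recall, and F1 from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    out: Dict[int, Dict[str, float]] = {}
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                        # truly c, predicted another class
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = {"precision": prec, "recall": rec, "f1": f1}
    return out


# Hypothetical toy counts (not from the paper): class 0 = non-hospitalized (8 samples),
# class 1 = hospitalized (2 samples), with one error in each direction.
toy_cm = [[7, 1],
          [1, 1]]
for cls, metrics in per_class_metrics(toy_cm).items():
    print(cls, metrics)
```

With these toy counts, the two misclassifications leave the majority class at 0.875 on every metric but pull the minority class down to 0.5, which is why imbalanced results are usually reported per class rather than as a single figure.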
Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles
We propose a principled and practically effective framework that simultaneously addresses both tasks: detecting errors and estimating accuracy on unlabeled data. On iWildCam, one instantiation reduces the estimation error for unsupervised accuracy estimation by at least 70% and improves the F1 score for error detection by at least 4.7%.
arXiv (2021-06-29T21:32:51Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation by leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%.
arXiv (2020-10-22T02:28:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.