Autocompletion of Chief Complaints in the Electronic Health Records
using Large Language Models
- URL: http://arxiv.org/abs/2401.06088v1
- Date: Thu, 11 Jan 2024 18:06:30 GMT
- Title: Autocompletion of Chief Complaints in the Electronic Health Records
using Large Language Models
- Authors: K M Sajjadul Islam, Ayesha Siddika Nipu, Praveen Madiraju, Priya
Deshpande
- Abstract summary: We utilize text generation techniques to develop machine learning models using Chief Complaint (CC) data.
We tune a prompt by incorporating CC sentences, utilizing the OpenAI API of GPT-4.
We evaluate the models' performance based on the perplexity score, modified BERTScore, and cosine similarity score.
- Score: 0.3749861135832072
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Chief Complaint (CC) is a crucial component of a patient's medical record
as it describes the main reason or concern for seeking medical care. It
provides critical information for healthcare providers to make informed
decisions about patient care. However, documenting CCs can be time-consuming
for healthcare providers, especially in busy emergency departments. To address
this issue, an autocompletion tool that suggests accurate and well-formatted
phrases or sentences for clinical notes can be a valuable resource for triage
nurses. In this study, we utilized text generation techniques to develop
machine learning models using CC data. In our proposed work, we train a Long
Short-Term Memory (LSTM) model and fine-tune three different variants of
Biomedical Generative Pretrained Transformers (BioGPT), namely
microsoft/biogpt, microsoft/BioGPT-Large, and microsoft/BioGPT-Large-PubMedQA.
Additionally, we tune a prompt by incorporating exemplar CC sentences,
utilizing the OpenAI API of GPT-4. We evaluate the models' performance based on
the perplexity score, modified BERTScore, and cosine similarity score. The
results show that BioGPT-Large exhibits superior performance compared to the
other models. It consistently achieves a remarkably low perplexity score of
1.65 when generating CC, whereas the baseline LSTM model achieves the best
perplexity score of 170. Further, we evaluate and assess the proposed models'
performance and the outcome of GPT-4.0. Our study demonstrates that utilizing
LLMs such as BioGPT, leads to the development of an effective autocompletion
tool for generating CC documentation in healthcare settings.
Related papers
- Improving Clinical Documentation with AI: A Comparative Study of Sporo AI Scribe and GPT-4o mini [0.0]
Sporo Health's AI scribe was evaluated against OpenAI's GPT-4o Mini.
Results show that Sporo AI consistently outperformed GPT-4o Mini, achieving higher recall, precision, and overall F1 scores.
arXiv Detail & Related papers (2024-10-20T22:48:40Z) - Is larger always better? Evaluating and prompting large language models for non-generative medical tasks [11.799956298563844]
This study benchmarks various models, including GPT-based LLMs, BERT-based models, and traditional clinical predictive models.
We focused on tasks such as readmission and prediction, disease hierarchy reconstruction, and biomedical sentence matching.
Results indicated that LLMs exhibited robust zero-shot predictive capabilities on structured EHR data when using well-designed prompting strategies.
For unstructured medical texts, LLMs did not outperform finetuned BERT models, which excelled in both supervised and unsupervised tasks.
arXiv Detail & Related papers (2024-07-26T06:09:10Z) - STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z) - Optimal path for Biomedical Text Summarization Using Pointer GPT [21.919661430250798]
GPT models have a tendency to generate factual errors, lack context, and oversimplify words.
To address these limitations, we replaced the attention mechanism in the GPT model with a pointer network.
The effectiveness of the Pointer-GPT model was evaluated using the ROUGE score.
arXiv Detail & Related papers (2024-03-22T02:13:23Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Investigating Large Language Models and Control Mechanisms to Improve Text Readability of Biomedical Abstracts [16.05119302860606]
We investigate the ability of state-of-the-art large language models (LLMs) on the task of biomedical abstract simplification.
The methods applied include domain fine-tuning and prompt-based learning (PBL)
We used a range of automatic evaluation metrics, including BLEU, ROUGE, SARI, and BERTscore, and also conducted human evaluations.
arXiv Detail & Related papers (2023-09-22T22:47:32Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - Evaluation of ChatGPT Family of Models for Biomedical Reasoning and
Classification [6.163540203358258]
This study investigates the performance of large language models (LLMs) in biomedical tasks beyond question-answering.
Because no patient data can be passed to the OpenAI API public interface, we evaluated model performance with over 10000 samples.
We found that fine-tuning for two fundamental NLP tasks remained the best strategy.
arXiv Detail & Related papers (2023-04-05T15:11:25Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG)
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in medical community.
We present a modification of Bidirectional Representations from Transformers (BERT) model for classification sequence.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.