Related papers: Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study

URL: http://arxiv.org/abs/2405.00728v1
Date: Sat, 27 Apr 2024 04:12:02 GMT
Title: Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study
Authors: Dou Liu, Ying Han, Xiandi Wang, Xiaomei Tan, Di Liu, Guangwu Qian, Kang Li, Dan Pu, Rong Yin
Abstract summary: The integration of Artificial Intelligence in healthcare presents a transformative potential for enhancing operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have shown their capabilities in supporting medical decision-making. This study specifically aims to evaluate the consistency of responses provided by ChatGPT in outpatient guidance.
Score: 11.37622565068147
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The integration of Artificial Intelligence (AI) in healthcare presents a transformative potential for enhancing operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have shown their capabilities in supporting medical decision-making. Embedding LLMs in medical systems is becoming a promising trend in healthcare development. The potential of ChatGPT to address the triage problem in emergency departments has been examined, while few studies have explored its application in outpatient departments. With a focus on streamlining workflows and enhancing efficiency for outpatient triage, this study specifically aims to evaluate the consistency of responses provided by ChatGPT in outpatient guidance, including both within-version response analysis and between-version comparisons. For the within-version analysis, the results indicate that the internal response consistency of ChatGPT-4.0 is significantly higher than that of ChatGPT-3.5 (p=0.03), and both show moderate consistency (71.2% for 4.0 and 59.6% for 3.5) in their top recommendation. However, the between-version consistency is relatively low (mean consistency score=1.43/3, median=1), indicating that few recommendations match between the two versions. Also, only 50% of top recommendations match perfectly in the comparisons. Interestingly, ChatGPT-3.5 responses are more likely to be complete than those from ChatGPT-4.0 (p=0.02), suggesting possible differences in information processing and response generation between the two versions. The findings offer insights into AI-assisted outpatient operations, while also facilitating the exploration of the potential and limitations of LLMs in healthcare utilization. Future research may focus on carefully optimizing LLMs and AI integration in healthcare systems based on ergonomic and human factors principles, precisely aligning with the specific needs of effective outpatient triage.

Related papers

Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept [0.0]
Triage errors, including undertriage and overtriage, are persistent challenges in emergency departments (EDs). This study compares the performance of three AI models in predicting triage outcomes against the FRENCH scale and clinical practice.
arXiv Detail & Related papers (2025-07-01T16:37:55Z)
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references. We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey. Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation [0.0]
Large language models (LLMs) have shown impressive capabilities in natural language processing tasks, including dialogue generation. This research aims to conduct a novel comparative analysis of two prominent techniques, fine-tuning with LoRA and the Retrieval-Augmented Generation framework.
arXiv Detail & Related papers (2025-02-04T11:50:40Z)
Performance of a large language model-Artificial Intelligence based chatbot for counseling patients with sexually transmitted infections and genital diseases [4.910821423749911]
Otiz is an AI-based platform designed specifically for STI detection and counseling. Four STIs (anogenital warts, herpes, syphilis, urethritis/cervicitis) were evaluated using prompts mimicking patient language. Otiz scored highly on diagnostic accuracy (4.1-4.7), overall accuracy (4.3-4.6), correctness of information (5.0), comprehensibility (4.2-4.4), and empathy (4.5-4.3.6).
arXiv Detail & Related papers (2024-12-11T20:36:32Z)
Evaluating the Impact of a Specialized LLM on Physician Experience in Clinical Decision Support: A Comparison of Ask Avo and ChatGPT-4 [0.3999851878220878]
Using large language models (LLMs) to augment clinical decision support systems is a topic of growing interest. Current shortcomings such as hallucinations and a lack of clear source citations make them unreliable for use in a rapidly growing clinical environment. This study evaluates Ask Avo, software by AvoMD that incorporates a proprietary Model Augmented Language Retrieval system.
arXiv Detail & Related papers (2024-09-06T17:53:29Z)
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals. GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date. It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z)
Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers [0.29530625605275984]
Structured reporting (SR) has been recommended by various medical societies. We propose a pipeline to extract information from free-text reports. Our work aims to leverage the potential of Natural Language Processing (NLP) and Transformer-based models.
arXiv Detail & Related papers (2024-03-27T18:38:39Z)
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between Doctor as player and NPCs. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning [28.355000184014084]
This study assesses the ability of state-of-the-art large language models (LLMs) to identify patients with mild cognitive impairment (MCI) from discharge summaries. The data was partitioned into training, validation, and testing sets in a 7:2:1 ratio for model fine-tuning and evaluation. Open-source models like Falcon and LLaMA 2 achieved high accuracy but lacked explanatory reasoning.
arXiv Detail & Related papers (2023-12-19T17:36:48Z)
Evaluation of ChatGPT-Generated Medical Responses: A Systematic Review and Meta-Analysis [7.587141771901865]
Large language models such as ChatGPT are increasingly explored in medical domains. This study aims to summarize the available evidence on evaluating ChatGPT's performance in medicine.
arXiv Detail & Related papers (2023-10-12T15:26:26Z)
Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data [55.84746218227712]
This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency.
arXiv Detail & Related papers (2023-08-09T07:51:40Z)
Comparative Analysis of Drug-GPT and ChatGPT LLMs for Healthcare Insights: Evaluating Accuracy and Relevance in Patient and HCP Contexts [0.0]
This study presents a comparative analysis of three Generative Pre-trained Transformer (GPT) solutions in a question and answer (Q&A) setting. The objective is to determine which model delivers the most accurate and relevant information in response to prompts related to patient experiences with atopic dermatitis (AD) and healthcare professional (HCP) discussions about diabetes.
arXiv Detail & Related papers (2023-07-24T19:27:11Z)
Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM). Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining. We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data. Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective [67.98821225810204]
We evaluate the robustness of ChatGPT from the adversarial and out-of-distribution perspective. Results show consistent advantages on most adversarial and OOD classification and translation tasks. ChatGPT shows astounding performance in understanding dialogue-related texts.
arXiv Detail & Related papers (2023-02-22T11:01:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.