Related papers: Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding

Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding

URL: http://arxiv.org/abs/2304.05368v3
Date: Sun, 30 Jul 2023 19:09:02 GMT
Title: Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
Authors: Yuqing Wang, Yun Zhao, Linda Petzold
Abstract summary: Large language models (LLMs) have made significant progress in various domains, including healthcare. In this study, we evaluate state-of-the-art LLMs within the realm of clinical language understanding tasks.
Score: 12.128991867050487
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks. These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance LLMs' performance by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation underscores the significance of task-specific learning strategies and prompting techniques for improving LLMs' effectiveness in healthcare-related tasks. Additionally, our in-depth error analysis on the challenging relation extraction task offers valuable insights into error distribution and potential avenues for improvement using SQP. Our study sheds light on the practical implications of employing LLMs in the specialized domain of healthcare, serving as a foundation for future research and the development of potential applications in healthcare settings.

Related papers

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning.<n>This paper provides the first systematic review of this emerging field.<n>We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions. We propose a novel approach utilizing structured medical reasoning. Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare. This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z)
Diagnostic Reasoning in Natural Language: Computational Model and Application [68.47402386668846]
We investigate diagnostic abductive reasoning (DAR) in the context of language-grounded tasks (NL-DAR) We propose a novel modeling framework for NL-DAR based on Pearl's structural causal models. We use the resulting dataset to investigate the human decision-making process in NL-DAR.
arXiv Detail & Related papers (2024-09-09T06:55:37Z)
Large Language Models Struggle in Token-Level Clinical Named Entity Recognition [12.972058180227194]
Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Named Entity Recognition (NER) stands out as an essential task and it plays a crucial role in extracting relevant information from clinical texts. This study aims to bridge this gap by investigating the effectiveness of both proprietary and local LLMs in token-level clinical NER.
arXiv Detail & Related papers (2024-06-30T15:38:48Z)
Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. We then construct six novel datasets and clinical tasks that are complex but common in real-world practice. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
Large language models in healthcare and medical domain: A review [4.456243157307507]
Large language models (LLMs) provide proficient responses to free-text queries. This review explores the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications.
arXiv Detail & Related papers (2023-12-12T20:54:51Z)
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review [16.008511195589925]
Large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning. This paper provides a comprehensive review on the applications and implications of LLMs in medicine.
arXiv Detail & Related papers (2023-11-03T13:51:36Z)
An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing [4.758617742396169]
We present a comprehensive and systematic experimental study on prompt engineering for five clinical NLP tasks. We assessed the prompts proposed in recent literature, including simple prefix, simple cloze, chain of thought, and anticipatory prompts. We provide novel insights and guidelines for prompt engineering for LLMs in clinical NLP.
arXiv Detail & Related papers (2023-09-14T19:35:00Z)
Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization [8.456700096020601]
Large language models (LLMs) have shown promise in natural language processing (NLP), but their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks. A clinical reader study with ten physicians evaluates summary, completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to summaries from medical experts.
arXiv Detail & Related papers (2023-09-14T05:15:01Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
Large Language Models Leverage External Knowledge to Extend Clinical Insight Beyond Language Boundaries [48.48630043740588]
Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. We develop a novel in-context learning framework to enhance their performance.
arXiv Detail & Related papers (2023-05-17T12:31:26Z)
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.