An Empirical Evaluation of Prompting Strategies for Large Language
Models in Zero-Shot Clinical Natural Language Processing
- URL: http://arxiv.org/abs/2309.08008v1
- Date: Thu, 14 Sep 2023 19:35:00 GMT
- Title: An Empirical Evaluation of Prompting Strategies for Large Language
Models in Zero-Shot Clinical Natural Language Processing
- Authors: Sonish Sivarajkumar, Mark Kelley, Alyssa Samolyk-Mazzanti, Shyam
Visweswaran, Yanshan Wang
- Abstract summary: We present a comprehensive and systematic experimental study on prompt engineering for five clinical NLP tasks.
We assessed the prompts proposed in recent literature, including simple prefix, simple cloze, chain of thought, and anticipatory prompts.
We provide novel insights and guidelines for prompt engineering for LLMs in clinical NLP.
- Score: 4.758617742396169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have shown remarkable capabilities in Natural
Language Processing (NLP), especially in domains where labeled data is scarce
or expensive, such as the clinical domain. However, to unlock the clinical
knowledge hidden in these LLMs, we need to design effective prompts that can
guide them to perform specific clinical NLP tasks without any task-specific
training data. This is known as in-context learning, which is an art and
science that requires understanding the strengths and weaknesses of different
LLMs and prompt engineering approaches. In this paper, we present a
comprehensive and systematic experimental study on prompt engineering for five
clinical NLP tasks: Clinical Sense Disambiguation, Biomedical Evidence
Extraction, Coreference Resolution, Medication Status Extraction, and
Medication Attribute Extraction. We assessed the prompts proposed in recent
literature, including simple prefix, simple cloze, chain of thought, and
anticipatory prompts, and introduced two new types of prompts, namely heuristic
prompting and ensemble prompting. We evaluated the performance of these prompts
on three state-of-the-art LLMs: GPT-3.5, BARD, and LLAMA2. We also contrasted
zero-shot prompting with few-shot prompting, and provided novel insights and
guidelines for prompt engineering for LLMs in clinical NLP. To the best of our
knowledge, this is one of the first works on the empirical evaluation of
different prompt engineering approaches for clinical NLP in this era of
generative AI, and we hope that it will inspire and inform future research in
this area.
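To ground the taxonomy, the sketch below shows one way the four assessed prompt styles and an ensemble vote might be written for the Clinical Sense Disambiguation task. The template wordings, the example note, and the query_llm helper are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical renderings of the prompt styles assessed in the paper,
# applied to a Clinical Sense Disambiguation example. Template wordings
# and the query_llm() stub are assumptions, not the authors' prompts.
from collections import Counter

NOTE = "Pt reports worsening RA in both hands; started methotrexate."
TERM = "RA"

PROMPTS = {
    # Simple prefix: a plain instruction prepended to the input text.
    "prefix": (f"Determine the meaning of the abbreviation '{TERM}' "
               f"in the clinical note below.\n{NOTE}\nMeaning:"),
    # Simple cloze: the answer fills a blank after the text.
    "cloze": f"{NOTE}\nIn this note, '{TERM}' is an abbreviation for ___.",
    # Chain of thought: elicit step-by-step reasoning before the answer.
    "chain_of_thought": (f"{NOTE}\nQuestion: what does '{TERM}' mean here? "
                         f"Let's think step by step."),
    # Anticipatory: anticipate the candidate senses in the prompt itself.
    "anticipatory": (f"'{TERM}' may mean rheumatoid arthritis, right atrium, "
                     f"or room air. Which sense is intended in this note?\n"
                     f"{NOTE}"),
}

def query_llm(prompt: str) -> str:
    """Stub for a call to GPT-3.5, BARD, or LLAMA2."""
    raise NotImplementedError

def ensemble_answer(prompts: dict[str, str]) -> str:
    """Ensemble prompting (one plausible reading): query the model with
    several prompt styles and majority-vote over the normalized answers."""
    votes = [query_llm(p).strip().lower() for p in prompts.values()]
    return Counter(votes).most_common(1)[0][0]
```

A few-shot variant would prepend a handful of labeled (note, sense) examples to each template; the zero-shot setting that is the paper's focus omits them.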
Related papers
- Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare.
This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z)
- Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark, ClinicBench, to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
- Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds [32.99251005719732]
Clinical reasoning refers to the cognitive process that physicians employ in evaluating and managing patients.
In this study, we introduce a novel framework, In-Context Padding (ICP), designed to enhance LLMs with medical knowledge.
arXiv Detail & Related papers (2024-03-11T10:53:20Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate the overhead of deploying an LLM directly, we developed a feature-level knowledge distillation technique that transfers the LLM's proficiency to a more compact model.
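Feature-level distillation of this kind is commonly implemented as an L2 match between teacher and student representations; the PyTorch sketch below illustrates that generic pattern under assumed dimensions, not the paper's actual models.

```python
# Generic feature-level knowledge distillation: train a compact student
# to reproduce the teacher LLM's hidden representation of each input.
# Dimensions and the projection layer are illustrative assumptions.
import torch
import torch.nn as nn

TEACHER_DIM, STUDENT_DIM = 4096, 256

# Project student features into the teacher's space for comparison.
proj = nn.Linear(STUDENT_DIM, TEACHER_DIM)
mse = nn.MSELoss()

def distill_loss(teacher_feat: torch.Tensor,
                 student_feat: torch.Tensor) -> torch.Tensor:
    """L2 distance between teacher features and projected student features."""
    return mse(proj(student_feat), teacher_feat)

# Example: a batch of 8 patient records encoded by both models.
loss = distill_loss(torch.randn(8, TEACHER_DIM), torch.randn(8, STUDENT_DIM))
loss.backward()  # gradients reach the projection (and, in training, the student)
```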
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- An Introduction to Natural Language Processing Techniques and Framework for Clinical Implementation in Radiation Oncology [1.2714439146420664]
We present state-of-the-art NLP applications that employ large language models (LLMs) in radiation oncology research.
LLMs are prone to many errors such as hallucinations, biases, and ethical violations, which necessitate rigorous evaluation and validation.
Our article aims to provide guidance and insights for researchers and clinicians who are interested in developing and using NLP models in clinical radiation oncology.
arXiv Detail & Related papers (2023-11-03T19:32:35Z)
- Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models [48.07083163501746]
Clinical natural language processing requires methods that can address domain-specific challenges.
We propose ClinGen, an innovative, resource-efficient approach that infuses clinical knowledge into the data-generation process.
Our empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance.
arXiv Detail & Related papers (2023-11-01T04:37:28Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
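The extract-then-verify loop is easy to sketch: a first pass extracts candidates and a second pass asks the same model to quote the supporting text for each one. The prompt wording and query_llm stub below are assumptions, not the paper's exact protocol.

```python
# Sketch of self-verification for clinical extraction: pass 1 extracts
# candidate medications; pass 2 asks the same LLM to quote the evidence
# for each candidate and drops any it cannot ground. Prompt texts and
# the query_llm() stub are hypothetical.
def query_llm(prompt: str) -> str:
    raise NotImplementedError  # call out to the LLM of your choice

def extract_medications(note: str) -> list[str]:
    reply = query_llm("List every medication mentioned in this note, "
                      f"one per line:\n{note}")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def self_verify(note: str, items: list[str]) -> list[str]:
    verified = []
    for item in items:
        # Ask the model to provide provenance for its own extraction.
        reply = query_llm(
            f"Note:\n{note}\n\nYou extracted '{item}' as a medication. "
            "Quote the exact sentence that supports this, "
            "or answer NO EVIDENCE.")
        if "NO EVIDENCE" not in reply.upper():
            verified.append(item)  # keep only grounded extractions
    return verified
```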
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding [12.128991867050487]
Large language models (LLMs) have made significant progress in various domains, including healthcare.
In this study, we evaluate state-of-the-art LLMs within the realm of clinical language understanding tasks.
arXiv Detail & Related papers (2023-04-09T16:31:47Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re3Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
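As a rough illustration of the retrieval-augmented half of this idea, the sketch below retrieves the most similar prior discharge instruction and folds it into the generation prompt. The TF-IDF retriever, prompt text, and query_llm stub are stand-ins, not the actual Re3Writer pipeline.

```python
# Rough sketch of retrieval-augmented generation for discharge
# instructions: retrieve the most similar prior instruction and use it
# to ground the prompt. Retriever, prompt, and query_llm() are stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def query_llm(prompt: str) -> str:
    raise NotImplementedError  # call out to the LLM of your choice

def retrieve(query: str, corpus: list[str]) -> str:
    """Return the corpus document most similar to the query (TF-IDF)."""
    vec = TfidfVectorizer().fit(corpus + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(corpus))
    return corpus[sims.argmax()]

def generate_instructions(record: str, past_instructions: list[str]) -> str:
    example = retrieve(record, past_instructions)  # grounding example
    return query_llm(
        f"Similar prior discharge instruction:\n{example}\n\n"
        f"Patient record:\n{record}\n\n"
        "Write discharge instructions for this patient:")
```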
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
- HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural Language Processing [3.762895631262445]
We developed a novel prompt-based clinical NLP framework called HealthPrompt.
We performed an in-depth analysis of HealthPrompt on six different pre-trained language models (PLMs) in a no-data setting.
Our experiments show that prompts effectively capture the context of clinical texts and perform remarkably well without any training data.
arXiv Detail & Related papers (2022-03-09T21:44:28Z)