Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data
Generation with Large Language Models
- URL: http://arxiv.org/abs/2311.00287v1
- Date: Wed, 1 Nov 2023 04:37:28 GMT
- Authors: Ran Xu, Hejie Cui, Yue Yu, Xuan Kan, Wenqi Shi, Yuchen Zhuang, Wei
Jin, Joyce Ho, Carl Yang
- Abstract summary: Clinical natural language processing requires methods that can address domain-specific challenges.
We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process.
Our empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks.
- Score: 48.07083163501746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical natural language processing requires methods that can address
domain-specific challenges, such as complex medical terminology and clinical
contexts. Recently, large language models (LLMs) have shown promise in this
domain. Yet, their direct deployment can lead to privacy issues and is
constrained by limited resources. To address this challenge, we delve into synthetic
clinical text generation using LLMs for clinical NLP tasks. We propose an
innovative, resource-efficient approach, ClinGen, which infuses knowledge into
the process. Our model involves clinical knowledge extraction and
context-informed LLM prompting. Both clinical topics and writing styles are
drawn from external domain-specific knowledge graphs and LLMs to guide data
generation. Our extensive empirical study across 7 clinical NLP tasks and 16
datasets reveals that ClinGen consistently enhances performance across various
tasks, effectively aligning the distribution of real datasets and significantly
enriching the diversity of generated training instances. We will publish our
code and all the generated data at https://github.com/ritaranx/ClinGen.
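As a rough sketch of the knowledge-infused prompting idea described in the abstract, the snippet below assembles a data-generation prompt from a sampled clinical topic and writing style. The topic list, style list, and prompt wording are illustrative assumptions, not the released ClinGen implementation.

```python
import random

# Hypothetical inputs: clinical entities that could be sampled from a domain
# knowledge graph (e.g., UMLS) and writing styles suggested by an LLM.
# These lists and the prompt wording are illustrative assumptions.
KG_TOPICS = ["myocardial infarction", "type 2 diabetes mellitus", "sepsis"]
WRITING_STYLES = ["discharge summary", "nursing progress note", "radiology report"]

def build_knowledge_infused_prompt(task_label: str) -> str:
    """Compose a data-generation prompt from a sampled topic and writing style."""
    topic = random.choice(KG_TOPICS)
    style = random.choice(WRITING_STYLES)
    return (
        f"Write a short synthetic {style} about a patient with {topic}. "
        f"The note should be a positive example for the label '{task_label}' "
        "and must not contain any real patient identifiers."
    )

if __name__ == "__main__":
    # The resulting prompt would then be sent to an LLM to generate one training instance.
    print(build_knowledge_infused_prompt("cardiovascular disease"))
```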
Related papers
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLMs) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - An Introduction to Natural Language Processing Techniques and Framework
for Clinical Implementation in Radiation Oncology [1.2714439146420664]
We present state-of-the-art NLP applications that employ large language models (LLMs) in radiation oncology research.
LLMs are prone to many errors such as hallucinations, biases, and ethical violations, which necessitate rigorous evaluation and validation.
Our article aims to provide guidance and insights for researchers and clinicians who are interested in developing and using NLP models in clinical radiation oncology.
arXiv Detail & Related papers (2023-11-03T19:32:35Z) - An Empirical Evaluation of Prompting Strategies for Large Language
Models in Zero-Shot Clinical Natural Language Processing [4.758617742396169]
We present a comprehensive and systematic experimental study on prompt engineering for five clinical NLP tasks.
We assessed the prompts proposed in recent literature, including simple prefix, simple cloze, chain of thought, and anticipatory prompts.
We provide novel insights and guidelines for prompt engineering for LLMs in clinical NLP.
arXiv Detail & Related papers (2023-09-14T19:35:00Z) - UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for
- UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition [4.865221751784403]
This work contributes a data-centric paradigm for enriching the language representations of biomedical transformer-encoder LMs by extracting text sequences from the UMLS.
Preliminary results from experiments in the extension of pre-trained LMs as well as training from scratch show that this framework improves downstream performance on multiple biomedical and clinical Named Entity Recognition (NER) tasks.
arXiv Detail & Related papers (2023-07-20T18:08:34Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Development and validation of a natural language processing algorithm to
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We built a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - A Multi-View Joint Learning Framework for Embedding Clinical Codes and
Text Using Graph Neural Networks [23.06795121693656]
We propose a framework that learns from codes and text to combine the availability and forward-looking nature of text with the better performance of ICD codes.
Our approach uses a Graph Neural Network (GNN) to process ICD codes, and Bi-LSTM to process text.
In experiments using planned surgical procedure text, our model outperforms BERT models fine-tuned to clinical data.
arXiv Detail & Related papers (2023-01-27T09:19:03Z) - Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re3Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - HealthPrompt: A Zero-shot Learning Paradigm for Clinical Natural
Language Processing [3.762895631262445]
We developed a novel prompt-based clinical NLP framework called HealthPrompt.
We performed an in-depth analysis of HealthPrompt on six different PLMs in a no-data setting.
Our experiments prove that prompts effectively capture the context of clinical texts and perform remarkably well without any training data.
arXiv Detail & Related papers (2022-03-09T21:44:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.