Related papers: Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine

Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine

URL: http://arxiv.org/abs/2210.12777v4
Date: Sun, 21 Jul 2024 22:57:58 GMT
Title: Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine
Authors: Fenglin Liu, Bang Yang, Chenyu You, Xian Wu, Shen Ge, Zhangdaihong Liu, Xu Sun, Yang Yang, David A. Clifton,
Abstract summary: We propose the Re$3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning. We demonstrate the effectiveness of our method in generating patient discharge instructions.
Score: 68.7814360102644
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Language models (LMs), including large language models (such as ChatGPT), have the potential to assist clinicians in generating various clinical notes. However, LMs are prone to produce ``hallucinations'', i.e., generated content that is not aligned with facts and knowledge. In this paper, we propose the Re$^3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning to enable LMs to generate faithful clinical texts. We demonstrate the effectiveness of our method in generating patient discharge instructions. It requires the LMs not to only understand the patients' long clinical documents, i.e., the health records during hospitalization, but also to generate critical instructional information provided both to carers and to the patient at the time of discharge. The proposed Re$^3$Writer imitates the working patterns of physicians to first \textbf{re}trieve related working experience from historical instructions written by physicians, then \textbf{re}ason related medical knowledge. Finally, it \textbf{re}fines the retrieved working experience and reasoned medical knowledge to extract useful information, which is used to generate the discharge instructions for previously-unseen patients. Our experiments show that, using our method, the performance of five representative LMs can be substantially boosted across all metrics. Meanwhile, we show results from human evaluations to measure the effectiveness in terms of fluency, faithfulness, and comprehensiveness.

Related papers

Abstract Meaning Representation for Hospital Discharge Summarization [0.8813014553043816]
This work is to discover new methods that combine language-based graphs and deep learning models to address provenance of content and trustworthiness in automatic summarization.<n>Our method shows impressive reliability results on the publicly available Medical Information Mart for Intensive III (MIMIC-III) corpus and clinical notes written by physicians at Anonymous Hospital.
arXiv Detail & Related papers (2025-06-17T01:33:01Z)
Medical Hallucinations in Foundation Models and Their Impact on Healthcare [53.97060824532454]
Foundation Models that are capable of processing and generating multi-modal data have transformed AI's role in medicine. We define medical hallucination as any instance in which a model generates misleading medical content. Our results reveal that inference techniques such as Chain-of-Thought (CoT) and Search Augmented Generation can effectively reduce hallucination rates. These findings underscore the ethical and practical imperative for robust detection and mitigation strategies.
arXiv Detail & Related papers (2025-02-26T02:30:44Z)
Improving Clinical Note Generation from Complex Doctor-Patient Conversation [20.2157016701399]
We present three key contributions to the field of clinical note generation using large language models (LLMs) First, we introduce CliniKnote, a dataset consisting of 1,200 complex doctor-patient conversations paired with their full clinical notes. Second, we propose K-SOAP, which enhances traditional SOAPcitepodder20soap (Subjective, Objective, Assessment, and Plan) notes by adding a keyword section at the top, allowing for quick identification of essential information. Third, we develop an automatic pipeline to generate K-SOAP notes from doctor-patient conversations and benchmark various modern LLMs using various
arXiv Detail & Related papers (2024-08-26T18:39:31Z)
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions. VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information. We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
LLMs Accelerate Annotation for Medical Information Extraction [7.743388571513413]
We propose an approach that combines Large Language Models (LLMs) with human expertise to create an efficient method for generating ground truth labels for medical text annotation. We rigorously evaluate our method on a medical information extraction task, demonstrating that our approach not only substantially cuts down on human intervention but also maintains high accuracy.
arXiv Detail & Related papers (2023-12-04T19:26:13Z)
Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization [8.456700096020601]
Large language models (LLMs) have shown promise in natural language processing (NLP), but their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks. A clinical reader study with ten physicians evaluates summary, completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to summaries from medical experts.
arXiv Detail & Related papers (2023-09-14T05:15:01Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization. Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text based on word simplification and language modelling. We use a new dataset pairs of publicly available medical sentences and a version of them simplified by clinicians. Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.