Impact of Large Language Model Assistance on Patients Reading Clinical
Notes: A Mixed-Methods Study
- URL: http://arxiv.org/abs/2401.09637v1
- Date: Wed, 17 Jan 2024 23:14:52 GMT
- Title: Impact of Large Language Model Assistance on Patients Reading Clinical
Notes: A Mixed-Methods Study
- Authors: Niklas Mannhardt, Elizabeth Bondi-Kelly, Barbara Lam, Chloe O'Connell,
Mercy Asiedu, Hussein Mozannar, Monica Agrawal, Alejandro Buendia, Tatiana
Urman, Irbaz B. Riaz, Catherine E. Ricciardi, Marzyeh Ghassemi, David Sontag
- Abstract summary: Complex medical concepts and jargon within clinical notes hinder patient comprehension and may lead to anxiety.
We developed a patient-facing tool to simplify, extract information from, and add context to notes.
Clinicians evaluated the augmentations for errors, and we found that misleading errors do occur.
- Score: 47.61555826813361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Patients derive numerous benefits from reading their clinical notes,
including an increased sense of control over their health and improved
understanding of their care plan. However, complex medical concepts and jargon
within clinical notes hinder patient comprehension and may lead to anxiety. We
developed a patient-facing tool to make clinical notes more readable,
leveraging large language models (LLMs) to simplify, extract information from,
and add context to notes. We prompt engineered GPT-4 to perform these
augmentation tasks on real clinical notes donated by breast cancer survivors
and synthetic notes generated by a clinician, a total of 12 notes with 3868
words. In June 2023, 200 female-identifying US-based participants were randomly
assigned three clinical notes with varying levels of augmentations using our
tool. Participants answered questions about each note, evaluating their
understanding of follow-up actions and self-reported confidence. We found that
augmentations were associated with a significant increase in action
understanding score (0.63 $\pm$ 0.04 for select augmentations, compared to 0.54
$\pm$ 0.02 for the control) with p=0.002. In-depth interviews of
self-identifying breast cancer patients (N=7) were also conducted via video
conferencing. Augmentations, especially definitions, elicited positive
responses among the seven participants, with some concerns about relying on
LLMs. Clinicians evaluated the augmentations for errors, and we found that
misleading errors do occur, more commonly in real donated notes than in
synthetic notes, illustrating the importance of carefully written clinical
notes. Augmentations improve some but not all readability metrics. This work
demonstrates the potential of LLMs to improve patients' experience with
clinical notes at a lower burden to clinicians. However, having a human in the
loop is important to correct potential model errors.
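The augmentation pipeline described above (prompt-engineering GPT-4 to simplify notes, extract follow-up actions, and define jargon) can be sketched as one prompt template per task. The task wording, message structure, and function names below are illustrative assumptions, not the authors' actual prompts:

```python
# Hypothetical sketch of a per-task prompt-engineering setup for note
# augmentation. Each task maps to one instruction; the note is appended
# to form a chat-style message list for a GPT-4-class model.

AUGMENTATION_TASKS = {
    "simplify": "Rewrite the clinical note below in plain language a patient can understand.",
    "extract": "List the follow-up actions the patient should take, based on the note below.",
    "define": "Identify medical jargon in the note below and add a short, patient-friendly definition for each term.",
}

def build_augmentation_messages(note: str, task: str) -> list[dict]:
    """Build a chat-style message list for one augmentation task."""
    if task not in AUGMENTATION_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        {"role": "system",
         "content": "You are a careful assistant that helps patients read clinical notes."},
        {"role": "user",
         "content": f"{AUGMENTATION_TASKS[task]}\n\n{note}"},
    ]
```

The resulting messages would then be sent to a chat-completion API; per the study's findings, model outputs should still be reviewed by a human, since misleading errors occur.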
Related papers
- Adapting Open-Source Large Language Models for Cost-Effective, Expert-Level Clinical Note Generation with On-Policy Reinforcement Learning [19.08691249610632]
This study presents a comprehensive domain- and task-specific adaptation process for the open-source LLaMA-2 13 billion parameter model.
We introduce a new approach, DistillDirect, for performing on-policy reinforcement learning with Gemini 1.0 Pro as the teacher model.
Our model, LLaMA-Clinic, can generate clinical notes comparable in quality to those authored by physicians.
arXiv Detail & Related papers (2024-04-25T15:34:53Z)
- Dynamic Q&A of Clinical Documents with Large Language Models [3.021316686584699]
This work introduces a natural language interface using large language models (LLMs) for dynamic question-answering on clinical notes.
Experiments, utilizing various embedding models and advanced LLMs, show Wizard Vicuna's superior accuracy, albeit with high compute demands.
arXiv Detail & Related papers (2024-01-19T14:50:22Z)
- Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization [8.456700096020601]
Large language models (LLMs) have shown promise in natural language processing (NLP), but their effectiveness on a diverse range of clinical summarization tasks remains unproven.
In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks.
A clinical reader study with ten physicians evaluates summary completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) to summaries from medical experts.
arXiv Detail & Related papers (2023-09-14T05:15:01Z)
- MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records [60.35217378132709]
Large language models (LLMs) can follow natural language instructions with human-level fluency.
However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging.
We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data.
arXiv Detail & Related papers (2023-08-27T12:24:39Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
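The self-verification idea, having the model supply provenance for its own extractions and check its outputs, can be illustrated with a deterministic stand-in: a second pass that keeps only extractions whose supporting span actually appears in the source note. The substring check below is an assumption standing in for the paper's verification prompt:

```python
# Illustrative sketch of self-verification for clinical extraction:
# a follow-up pass filters extracted items by whether supporting
# evidence (here, a literal span) can be found in the source note.

def verify_extractions(note: str, extractions: list[str]) -> dict[str, bool]:
    """Map each extracted item to whether the note supports it."""
    note_lower = note.lower()
    return {item: item.lower() in note_lower for item in extractions}

note = "Patient reports chest pain; prescribed aspirin 81 mg daily."
result = verify_extractions(note, ["chest pain", "aspirin 81 mg", "metformin"])
# "metformin" fails verification: the note contains no supporting span.
```

In the paper's setting the verifier is the LLM itself rather than a substring match, but the control flow, extract first, then check each item against the source, is the same.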
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
- Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation [56.25869366777579]
In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par if not better than common model-based metrics like BertScore.
arXiv Detail & Related papers (2022-04-01T14:04:16Z)
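The character-based Levenshtein metric that the last study found competitive with model-based metrics like BertScore is simple to state: the minimum number of single-character edits between a generated note and a reference. A minimal sketch, with a normalization choice (dividing by the longer string) that is an assumption for illustration:

```python
# Classic dynamic-programming Levenshtein edit distance, the simple
# character-based metric compared against BertScore in the human
# evaluation study above.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(reference: str, generated: str) -> float:
    """1.0 for identical notes, decreasing as edits accumulate."""
    if not reference and not generated:
        return 1.0
    return 1 - levenshtein(reference, generated) / max(len(reference), len(generated))
```

For note evaluation, the distance between a post-edited note and the raw model output is a proxy for how much clinician correction was needed.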
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.