Generating Faithful and Complete Hospital-Course Summaries from the Electronic Health Record
- URL: http://arxiv.org/abs/2404.01189v1
- Date: Mon, 1 Apr 2024 15:47:21 GMT
- Title: Generating Faithful and Complete Hospital-Course Summaries from the Electronic Health Record
- Authors: Griffin Adams
- Abstract summary: An unintended consequence of the increased documentation burden has been reduced face-time with patients.
We propose and evaluate automated solutions for generating a summary of a patient's hospital admissions.
- Score: 3.6513957125331555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid adoption of Electronic Health Records (EHRs) has been instrumental in streamlining administrative tasks, increasing transparency, and enabling continuity of care across providers. An unintended consequence of the increased documentation burden, however, has been reduced face-time with patients and, concomitantly, a dramatic rise in clinician burnout. In this thesis, we pinpoint a particularly time-intensive, yet critical, documentation task: generating a summary of a patient's hospital admissions, and propose and evaluate automated solutions. In Chapter 2, we construct a dataset based on 109,000 hospitalizations (2M source notes) and perform exploratory analyses to motivate future work on modeling and evaluation [NAACL 2021]. In Chapter 3, we address faithfulness from a modeling perspective by revising noisy references [EMNLP 2022] and, to reduce the reliance on references, directly calibrating model outputs to metrics [ACL 2023]. These works relied heavily on automatic metrics as human annotations were limited. To fill this gap, in Chapter 4, we conduct a fine-grained expert annotation of system errors in order to meta-evaluate existing metrics and better understand task-specific issues of domain adaptation and source-summary alignments. To learn a metric less correlated to extractiveness (copy-and-paste), we derive noisy faithfulness labels from an ensemble of existing metrics and train a faithfulness classifier on these pseudo labels [MLHC 2023]. Finally, in Chapter 5, we demonstrate that fine-tuned LLMs (Mistral and Zephyr) are highly prone to entity hallucinations and cover fewer salient entities. We improve both coverage and faithfulness by performing sentence-level entity planning based on a set of pre-computed salient entities from the source text, which extends our work on entity-guided news summarization [ACL 2023], [EMNLP 2023].
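As a concrete illustration of the Chapter 4 idea of deriving noisy faithfulness labels from a metric ensemble, the sketch below averages a set of stand-in metric scores into a pseudo-label and fits a lightweight classifier on it; the metrics, threshold, and features are illustrative assumptions, not the thesis's actual pipeline.

```python
# Sketch: noisy faithfulness pseudo-labels from an ensemble of metrics,
# then a classifier trained on those labels. The "metrics" below are toy
# stand-ins for real entailment- or QA-based faithfulness scorers.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def metric_ensemble(source: str, summary: str) -> list[float]:
    """Toy ensemble: each member returns a score in [0, 1]."""
    src_tokens, sum_tokens = set(source.split()), summary.split()
    overlap = sum(t in src_tokens for t in sum_tokens) / max(len(sum_tokens), 1)
    return [overlap, overlap ** 0.5, overlap ** 2]

def pseudo_label(source: str, summary: str, threshold: float = 0.5) -> int:
    """Average the ensemble scores into a noisy binary label."""
    return int(np.mean(metric_ensemble(source, summary)) >= threshold)

pairs = [
    ("patient admitted with pneumonia treated with iv antibiotics",
     "admitted with pneumonia and treated with antibiotics"),
    ("patient admitted with pneumonia treated with iv antibiotics",
     "patient underwent emergency cardiac surgery"),
]
labels = [pseudo_label(src, summ) for src, summ in pairs]

# Train a lightweight faithfulness classifier on the pseudo-labels;
# TF-IDF over the concatenated pair is purely illustrative.
X = TfidfVectorizer().fit_transform([src + " " + summ for src, summ in pairs])
clf = LogisticRegression().fit(X, labels)
```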
Related papers
- Affinity-Graph-Guided Contrastive Learning for Pretext-Free Medical Image Segmentation with Minimal Annotation [55.325956390997]
This paper proposes an affinity-graph-guided semi-supervised contrastive learning framework (Semi-AGCL) for medical image segmentation.
The framework first designs an average-patch-entropy-driven inter-patch sampling method, which can provide a robust initial feature space.
With merely 10% of the complete annotation set, our model approaches the accuracy of the fully annotated baseline, with a deviation of only 2.52%.
arXiv Detail & Related papers (2024-10-14T10:44:47Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- FABLES: Evaluating faithfulness and content selection in book-length summarization [55.50680057160788]
In this paper, we conduct the first large-scale human evaluation of faithfulness and content selection on book-length documents.
We collect FABLES, a dataset of annotations on 3,158 claims made in LLM-generated summaries of 26 books, at a cost of $5.2K USD.
An analysis of the annotations reveals that most unfaithful claims relate to events and character states, and they generally require indirect reasoning over the narrative to invalidate.
arXiv Detail & Related papers (2024-04-01T17:33:38Z)
- SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval [9.654951710218876]
Clinicians must write a lengthy summary each time a patient is discharged from the hospital.
Identifying and covering salient entities is vital for the summary to be clinically useful.
We fine-tune open-source LLMs on the task and find that they generate incomplete and unfaithful summaries.
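One way to picture the sentence-level planning SPEER proposes is the sketch below: salient entities are pre-computed, marked in the source with special tags, and the model is prompted to emit a per-sentence entity plan before each sentence. The tag syntax and prompt wording are assumptions for illustration, not the paper's exact format.

```python
# Sketch: mark pre-computed salient entities in the source, then ask the
# generator to plan each sentence around those marked entities.

def mark_salient_entities(source: str, entities: list[str]) -> str:
    """Wrap each salient entity mention in {{...}} so the generator can
    copy from an explicit, grounded set rather than hallucinating."""
    for ent in entities:
        source = source.replace(ent, "{{" + ent + "}}")
    return source

def build_prompt(marked_source: str) -> str:
    return (
        "Source notes (salient entities marked with {{...}}):\n"
        f"{marked_source}\n\n"
        "Write the Brief Hospital Course. Before each sentence, list the\n"
        "marked entities it will cover as <plan>entity; entity</plan>,\n"
        "then write a sentence mentioning exactly those entities.\n"
    )

entities = ["pneumonia", "ceftriaxone", "atrial fibrillation"]
source = ("Pt admitted with pneumonia, started on ceftriaxone. "
          "History of atrial fibrillation, rate controlled.")
print(build_prompt(mark_salient_entities(source, entities)))
```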
arXiv Detail & Related papers (2024-01-04T17:23:44Z)
- Extrinsically-Focused Evaluation of Omissions in Medical Summarization [10.02553223045504]
We propose MED-OMIT, a new omission benchmark for medical summarization.
Given a doctor-patient conversation and a generated summary, MED-OMIT categorizes the chat into a set of facts and identifies which are omitted from the summary.
We evaluate MED-OMIT on a publicly-released dataset of patient-doctor conversations and find that MED-OMIT captures omissions better than alternative metrics.
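A minimal sketch of this fact-decomposition-and-omission-check pattern follows. Real systems use an LLM for fact extraction and an entailment model for coverage; the sentence split and keyword test here are simplifying assumptions.

```python
# Sketch: decompose the conversation into facts, then flag facts that are
# absent from the generated summary as omissions.

def extract_facts(conversation: str) -> list[str]:
    """Stand-in fact decomposition: one 'fact' per sentence."""
    return [s.strip() for s in conversation.split(".") if s.strip()]

def is_covered(fact: str, summary: str) -> bool:
    """Stand-in for an entailment check: most content words appear."""
    words = [w for w in fact.lower().split() if len(w) > 3]
    hits = sum(w in summary.lower() for w in words)
    return bool(words) and hits / len(words) >= 0.5

conversation = ("Patient reports chest pain for two days. "
                "Patient takes lisinopril daily. Denies shortness of breath.")
summary = "Patient presented with chest pain; on lisinopril."

omitted = [f for f in extract_facts(conversation) if not is_covered(f, summary)]
print("Omitted facts:", omitted)  # -> ['Denies shortness of breath']
```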
arXiv Detail & Related papers (2023-11-14T16:46:15Z)
- CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models [8.237131071390715]
We consider the challenge of summarizing patients' medical progress notes in a limited data setting.
For the Problem List Summarization (shared task 1A) at the BioNLP Workshop 2023, we demonstrate that Clinical-T5 fine-tuned on 765 medical clinic notes outperforms other extractive, abstractive, and zero-shot baselines.
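A minimal sketch of such note-to-problem-list fine-tuning is below, using the public `t5-small` checkpoint as a placeholder (Clinical-T5 is access-controlled) and a toy pair in place of the 765 training notes.

```python
# Sketch: fine-tune a T5-style seq2seq model on note -> problem-list pairs.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)

pairs = [("progress note: afebrile, pneumonia improving on ceftriaxone",
          "1. pneumonia")]  # toy stand-in for the real training notes

model.train()
for note, problems in pairs:
    batch = tok(note, return_tensors="pt", truncation=True)
    labels = tok(problems, return_tensors="pt", truncation=True).input_ids
    loss = model(**batch, labels=labels).loss  # teacher-forced CE loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```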
arXiv Detail & Related papers (2023-06-08T16:08:10Z)
- PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models [25.363775123262307]
Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias.
BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation.
Our approach has two components: one employs large language models (LLMs) for data augmentation, while the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients' problems summarised as a list.
Our approach was ranked second among all submissions to the shared task.
arXiv Detail & Related papers (2023-06-05T10:17:50Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
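The extract-then-verify loop can be sketched as below; `call_llm` is a hypothetical stand-in for an actual LLM client, and the prompts are assumptions rather than the paper's own.

```python
# Sketch: the LLM first extracts a field, then is asked to verify the
# extraction by citing the supporting evidence span ("provenance").

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def extract(note: str, field: str) -> str:
    return call_llm(f"From the clinical note below, extract the {field}.\n\n{note}")

def verify(note: str, field: str, value: str) -> bool:
    answer = call_llm(
        f"Note:\n{note}\n\n"
        f"Claimed {field}: {value}\n"
        "Quote the exact sentence from the note that supports this claim, "
        "or reply NO EVIDENCE if none exists."
    )
    return "NO EVIDENCE" not in answer.upper()

def extract_with_verification(note: str, fields: list[str]) -> dict[str, str]:
    results = {}
    for field in fields:
        value = extract(note, field)
        if verify(note, field, value):  # keep only evidence-backed values
            results[field] = value
    return results
```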
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course Summarization [2.8575516056239576]
Long-form clinical summarization of hospital admissions has real-world significance because of its potential to help both clinicians and patients.
We benchmark faithfulness metrics against fine-grained human annotations for model-generated summaries of a patient's Brief Hospital Course.
arXiv Detail & Related papers (2023-03-07T14:57:06Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
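One plausible reading of a rule-based self-supervision objective for passage matching is a bi-encoder trained on rule-derived positive pairs with in-batch negatives, sketched below; the pairing rule, encoder, and temperature are assumptions, not CAPR's actual objective.

```python
# Sketch: bi-encoder contrastive training on rule-derived positive pairs.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
enc = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(texts: list[str]) -> torch.Tensor:
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    out = enc(**batch).last_hidden_state[:, 0]  # [CLS] embedding
    return F.normalize(out, dim=-1)

# Rule-derived positives: e.g., pair a section header ("query") with the
# passage under it. Row i is a positive pair; the other rows in the batch
# serve as in-batch negatives.
queries = ["chief complaint", "discharge medications"]
passages = ["patient presents with chest pain", "lisinopril 10 mg daily"]

sims = embed(queries) @ embed(passages).T / 0.05  # temperature-scaled
loss = F.cross_entropy(sims, torch.arange(len(queries)))  # InfoNCE
loss.backward()
```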
arXiv Detail & Related papers (2021-08-02T10:42:52Z)