Guidance in Radiology Report Summarization: An Empirical Evaluation and
Error Analysis
- URL: http://arxiv.org/abs/2307.12803v1
- Date: Mon, 24 Jul 2023 13:54:37 GMT
- Title: Guidance in Radiology Report Summarization: An Empirical Evaluation and
Error Analysis
- Authors: Jan Trienes, Paul Youssef, Jörg Schlötterer, Christin Seifert
- Abstract summary: We propose a domain-agnostic guidance signal for summarizing radiology reports.
We run an expert evaluation of four systems according to a taxonomy of 11 fine-grained errors.
We find that the most pressing differences between automatic summaries and those of radiologists relate to content selection.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatically summarizing radiology reports into a concise impression can
reduce the manual burden of clinicians and improve the consistency of
reporting. Previous work aimed to enhance content selection and factuality
through guided abstractive summarization. However, two key issues persist.
First, current methods heavily rely on domain-specific resources to extract the
guidance signal, limiting their transferability to domains and languages where
those resources are unavailable. Second, while automatic metrics like ROUGE
show progress, we lack a good understanding of the errors and failure modes in
this task. To bridge these gaps, we first propose a domain-agnostic guidance
signal in the form of variable-length extractive summaries. Our empirical results
on two English benchmarks demonstrate that this guidance signal improves upon
unguided summarization while being competitive with domain-specific methods.
Additionally, we run an expert evaluation of four systems according to a
taxonomy of 11 fine-grained errors. We find that the most pressing differences
between automatic summaries and those of radiologists relate to content
selection including omissions (up to 52%) and additions (up to 57%). We
hypothesize that latent reporting factors and corpus-level inconsistencies may
prevent models from reliably learning content selection from the available data,
presenting promising directions for future work.
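The abstract describes the guidance signal as a variable-length extractive summary. A common way to construct such a signal at training time is oracle extraction: greedily select findings sentences that increase lexical overlap with the reference impression, stopping when no sentence helps, so the summary length adapts per report. The sketch below illustrates this idea under that assumption; the function names, the unigram-F1 overlap measure, and the stopping rule are illustrative choices, not the paper's exact method.

```python
def unigram_f1(candidate, reference):
    """Unigram F1 overlap between two whitespace-tokenized strings."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    ref_counts = {}
    for tok in ref:
        ref_counts[tok] = ref_counts.get(tok, 0) + 1
    overlap = 0
    for tok in cand:
        if ref_counts.get(tok, 0) > 0:  # clip matches to reference counts
            ref_counts[tok] -= 1
            overlap += 1
    precision, recall = overlap / len(cand), overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def oracle_guidance(findings_sentences, impression):
    """Greedy variable-length extractive summary used as a guidance signal.

    Adds the findings sentence that most improves overlap with the reference
    impression; stops as soon as no remaining sentence improves the score,
    so different reports yield guidance of different lengths.
    """
    selected, remaining, best = [], list(findings_sentences), 0.0
    while remaining:
        score, sent = max(
            (unigram_f1(" ".join(selected + [s]), impression), s)
            for s in remaining
        )
        if score <= best:  # no sentence improves overlap: stop
            break
        best = score
        selected.append(sent)
        remaining.remove(sent)
    return selected
```

At inference time, where no reference impression exists, such a guidance signal would instead come from a trained extractor; the oracle version above is only usable for building training targets.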
Related papers
- Standardizing Longitudinal Radiology Report Evaluation via Large Language Model Annotation [10.771534459008699]
Longitudinal information in radiology reports refers to the sequential tracking of findings across multiple examinations over time.
There is no proper tool to consistently label temporal changes in both ground-truth and model-generated texts.
Existing annotation methods are typically labor-intensive, relying on manual lexicons and rules.
arXiv Detail & Related papers (2026-01-23T13:57:09Z)
- Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion [41.79586757544166]
Large Language Models (LLMs) show remarkable potential for few-shot information extraction (IE).
Conventional selection strategies often fail to provide informative guidance, as they overlook a key source of model fallibility.
We introduce Active Prompting for Information Extraction (APIE), a novel active prompting framework guided by a principle we term introspective confusion.
arXiv Detail & Related papers (2025-08-10T02:27:41Z)
- Domain-specific Guided Summarization for Mental Health Posts [18.754472525614304]
We introduce a guided summarizer equipped with a dual-encoder and an adapted decoder.
We present a post-editing correction model to rectify errors in the generated summary.
Although the experiments are specifically designed for mental health posts, the methodology we developed offers broad applicability.
arXiv Detail & Related papers (2024-11-03T08:57:41Z)
- Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation [31.370503681645804]
We present a novel two-stage framework designed to extract high-quality factual statements from free-text radiology reports.
Our framework also includes a new embedding-based metric (CXRFE) for evaluating chest X-ray text generation systems.
arXiv Detail & Related papers (2024-07-02T04:39:19Z)
- Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval [12.135498957287004]
We evaluate embeddings obtained from publicly available models and their domain-adapted variants.
We establish a systematic method to obtain thresholds for similarity scores for different embeddings.
We show that embeddings for domain-specific sentences have little overlap with those for domain-agnostic ones.
arXiv Detail & Related papers (2024-06-18T07:03:34Z)
- DEE: Dual-stage Explainable Evaluation Method for Text Generation [21.37963672432829]
We introduce DEE, a Dual-stage Explainable Evaluation method for estimating the quality of text generation.
Built upon Llama 2, DEE follows a dual-stage principle guided by stage-specific instructions to perform efficient identification of errors in generated texts.
The dataset concerns newly emerged issues like hallucination and toxicity, thereby broadening the scope of DEE's evaluation criteria.
arXiv Detail & Related papers (2024-03-18T06:30:41Z)
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
- Salience Allocation as Guidance for Abstractive Summarization [61.31826412150143]
We propose a novel summarization approach with flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON).
SEASON uses the allocation of salience expectation to guide abstractive summarization and adapts well to articles with different degrees of abstractiveness.
arXiv Detail & Related papers (2022-10-22T02:13:44Z)
- Differentiable Multi-Agent Actor-Critic for Multi-Step Radiology Report Summarization [5.234281904315526]
The IMPRESSIONS section of a radiology report is a summary of the radiologist's reasoning and conclusions.
Prior research on radiology report summarization has focused on single-step end-to-end models.
We propose a two-step approach: extractive summarization followed by abstractive summarization.
arXiv Detail & Related papers (2022-03-15T21:18:09Z)
- Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
We present an automatic, multi-modal approach for report generation from chest X-rays.
Our approach features two distinct modules: (i) Learned knowledge base and (ii) Multi-modal alignment.
With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-12-30T10:43:56Z)
- Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation [3.3978173451092437]
Radiology report generation aims at generating descriptive text from radiology images automatically.
A typical setting consists of training encoder-decoder models on image-report pairs with a cross entropy loss.
We propose a novel weakly supervised contrastive loss for medical report generation.
arXiv Detail & Related papers (2021-09-25T00:06:23Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)
- Exploring Explainable Selection to Control Abstractive Summarization [51.74889133688111]
We develop a novel framework that focuses on explainability.
A novel pair-wise matrix captures the sentence interactions, centrality, and attribute scores.
A sentence-deployed attention mechanism in the abstractor ensures the final summary emphasizes the desired content.
arXiv Detail & Related papers (2020-04-24T14:39:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.