X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation
- URL: http://arxiv.org/abs/2406.17911v5
- Date: Fri, 14 Mar 2025 14:44:32 GMT
- Title: X-ray Made Simple: Lay Radiology Report Generation and Robust Evaluation
- Authors: Kun Zhao, Chenghao Xiao, Sixing Yan, William K. Cheung, Kai Ye, Noura Al Moubayed, Liang Zhan, Chenghua Lin,
- Abstract summary: Radiology Report Generation (RRG) has advanced considerably with the development of multimodal generative models. RRG with high performance on existing lexical-based metrics might be more of a mirage - a model can get a high BLEU only by learning the template of reports. We propose a semantics-based evaluation method, which is effective in mitigating the inflated numbers of BLEU and provides more robust evaluation.
- Score: 22.09740244042415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Radiology Report Generation (RRG) has advanced considerably with the development of multimodal generative models. Despite this progress, the field still faces significant challenges in evaluation, as existing metrics lack robustness and fairness. We reveal that RRG with high performance on existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can achieve a high BLEU score merely by learning the template of reports. This has become a pressing issue for RRG due to the highly patternized nature of these reports. In addition, standard radiology reports are often highly technical. Helping patients understand these reports is crucial from a patient's perspective, yet this has been largely overlooked in previous work. In this work, we approach these problems counter-intuitively by proposing the Layman's RRG framework, which systematically improves RRG with day-to-day language. Specifically, our framework first contributes a translated Layman's terms dataset. Building upon this dataset, we then propose a semantics-based evaluation method that is effective in mitigating the inflated BLEU numbers and provides more robust evaluation. We show that training on the Layman's terms dataset encourages models to focus on the semantics of the reports rather than overfitting to report templates. Finally, we reveal a promising scaling law between the number of training examples and the semantics gain provided by our dataset, in contrast to the inverse pattern exhibited by the original formats.
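The template-inflation problem the abstract describes can be demonstrated with a toy experiment. The sketch below (illustrative only, not the paper's setup) uses a minimal hand-rolled sentence-level BLEU: a candidate that copies the report boilerplate verbatim but inverts the clinically critical finding still scores high, while a semantically correct layman paraphrase scores zero.

```python
# Toy demonstration of BLEU inflation via template copying.
# Not the paper's evaluation code; a minimal sentence-level BLEU sketch.
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and clipped precisions."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        c_counts, r_counts = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        p = overlap / max(sum(c_counts.values()), 1)
        if p == 0:
            return 0.0          # no smoothing: any zero n-gram precision -> 0
        log_precisions.append(math.log(p))
    # brevity penalty
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = ("the lungs are clear . there is no pleural effusion or pneumothorax . "
             "the cardiac silhouette is normal in size .")
# Template copied almost verbatim, but the key finding is inverted:
templated = ("the lungs are clear . there is a large pleural effusion . "
             "the cardiac silhouette is normal in size .")
# Semantically faithful layman paraphrase sharing almost no n-grams:
layman = "the lungs look healthy , no fluid or collapsed lung , heart size is normal ."

print(f"templated (wrong finding): {bleu(templated, reference):.3f}")
print(f"layman (correct meaning):  {bleu(layman, reference):.3f}")
```

The templated-but-wrong report scores well above the correct paraphrase, which is exactly the failure mode that motivates a semantics-based metric.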
Related papers
- Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation [54.631356899598956]
We propose a novel associative memory-enhanced X-ray report generation model that effectively mimics the process of professional doctors writing medical reports.
We employ a visual Hopfield network to establish memory associations for disease-related tokens, and a report Hopfield network to retrieve report memory information.
arXiv Detail & Related papers (2025-01-07T01:19:48Z) - HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation [89.3260120072177]
We propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for Radiology report generation.
Our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression.
Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models.
arXiv Detail & Related papers (2024-12-15T06:04:16Z) - Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation [20.173287130474797]
Generative medical Vision Large Language Models (VLLMs) are prone to hallucinations and can produce inaccurate diagnostic information.
We introduce a novel Semantic Consistency-Based Uncertainty Quantification framework that provides both report-level and sentence-level uncertainties.
Our approach improves factuality scores by 10%, achieved by rejecting 20% of reports using the Radialog model on the MIMIC-CXR dataset.
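The reject-then-measure evaluation described above is a form of selective prediction, which can be sketched in a few lines. This is an illustrative toy (with made-up data), not the paper's Semantic Consistency framework: rank reports by an uncertainty score, discard the most uncertain fraction, and compute factuality on what remains.

```python
# Selective prediction sketch: reject the most uncertain reports, then
# measure factuality on the remainder. Toy data, hypothetical scores.
def factuality_after_rejection(reports, reject_frac):
    """reports: list of (uncertainty, is_factual); higher uncertainty = less certain."""
    ranked = sorted(reports, key=lambda r: r[0])            # most certain first
    keep = ranked[: max(1, int(len(ranked) * (1 - reject_frac)))]
    return sum(is_factual for _, is_factual in keep) / len(keep)

# Toy data in which high-uncertainty reports are more often non-factual.
toy = [(0.10, 1), (0.20, 1), (0.30, 1), (0.40, 1), (0.50, 0),
       (0.60, 1), (0.70, 0), (0.80, 1), (0.90, 0), (0.95, 0)]
print(f"no rejection: {factuality_after_rejection(toy, 0.0):.2f}")
print(f"reject 20%:   {factuality_after_rejection(toy, 0.2):.2f}")
```

If the uncertainty score correlates with errors, factuality on the retained set rises as the rejection fraction grows.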
arXiv Detail & Related papers (2024-12-05T20:43:39Z) - LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation [1.1029725477806065]
We propose Label Boosted Retrieval Augmented Generation (LaB-RAG) to generate radiology reports.
We show that LaB-RAG achieves better results across natural language and radiology language metrics compared with other retrieval-based RRG methods.
We critique the use of a popular RRG metric, arguing it is possible to artificially inflate its results without true data-leakage.
arXiv Detail & Related papers (2024-11-25T16:10:05Z) - NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
We propose a Multi-Task Correction MoE, where we train the experts to each become an "expert" in speech-to-text, language-to-text, and vision-to-text datasets.
NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
arXiv Detail & Related papers (2024-11-08T20:11:24Z) - Resource-Efficient Medical Report Generation using Large Language Models [3.2627279988912194]
Medical report generation is the task of automatically writing radiology reports for chest X-ray images.
We propose a new framework leveraging vision-enabled Large Language Models (LLM) for the task of medical report generation.
arXiv Detail & Related papers (2024-10-21T05:08:18Z) - CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset [14.911363203907008]
X-ray image-based medical report generation can significantly reduce diagnostic burdens and patient wait times.
We conduct a comprehensive benchmarking of existing mainstream X-ray report generation models and large language models (LLMs) on the CheXpert Plus dataset.
We propose a large model for X-ray image report generation using a multi-stage pre-training strategy, including self-supervised autoregressive generation and X-ray-report contrastive learning.
arXiv Detail & Related papers (2024-10-01T04:07:01Z) - GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization [13.61818620609812]
We propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach.
It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts.
Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches.
arXiv Detail & Related papers (2024-08-19T16:01:48Z) - RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel entity-aware metric named Radiological Report (Text) Evaluation (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
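The entity-aware idea behind metrics like RaTEScore can be illustrated with a minimal sketch. The code below is NOT the RaTEScore implementation; it is a toy that scores the overlap of (entity, negated?) pairs using a hypothetical hand-picked vocabulary, so that a report mentioning the same entity with flipped negation is penalized, which plain n-gram overlap would largely miss.

```python
# Toy entity-aware comparison (not RaTEScore): match (entity, negated) pairs.
# The vocabularies below are illustrative assumptions, not from the paper.
NEG_CUES = {"no", "without", "absent"}
ENTITIES = {"effusion", "pneumothorax", "consolidation", "cardiomegaly"}

def entity_mentions(text):
    """Return the set of (entity, negated) pairs found with a toy vocabulary."""
    tokens = text.lower().split()
    mentions = set()
    for i, tok in enumerate(tokens):
        if tok in ENTITIES:
            # crude negation scope: a cue within the 4 preceding tokens
            negated = any(t in NEG_CUES for t in tokens[max(0, i - 4):i])
            mentions.add((tok, negated))
    return mentions

def entity_f1(candidate, reference):
    c, r = entity_mentions(candidate), entity_mentions(reference)
    if not c or not r:
        return 0.0
    tp = len(c & r)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(c), tp / len(r)
    return 2 * precision * recall / (precision + recall)

ref = "no pleural effusion or pneumothorax"
good = "there is no effusion and no pneumothorax"          # same meaning
bad = "there is a large effusion and a pneumothorax"       # negation flipped
print(entity_f1(good, ref), entity_f1(bad, ref))
```

Even this crude version separates the two candidates; a real metric additionally needs synonym normalization and robust negation detection, which is where the heavy lifting lies.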
arXiv Detail & Related papers (2024-06-24T17:49:28Z) - Relation Extraction with Fine-Tuned Large Language Models in Retrieval Augmented Generation Frameworks [0.0]
Relation Extraction (RE) is crucial for converting unstructured data into structured formats like Knowledge Graphs (KGs).
Recent studies leveraging pre-trained language models (PLMs) have shown significant success in this area.
This work explores the performance of fine-tuned LLMs and their integration into a Retrieval-Augmented Generation (RAG) based RE approach.
arXiv Detail & Related papers (2024-06-20T21:27:57Z) - MRScore: Evaluating Radiology Report Generation with LLM-based Reward System [39.54237580336297]
This paper introduces MRScore, an automatic evaluation metric tailored for radiology report generation by leveraging Large Language Models (LLMs).
To address this challenge, we collaborated with radiologists to develop a framework that guides LLMs for radiology report evaluation, ensuring alignment with human analysis.
Our experiments demonstrate MRScore's higher correlation with human judgments and superior performance in model selection compared to traditional metrics.
arXiv Detail & Related papers (2024-04-27T04:42:45Z) - ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs [60.81649785463651]
We introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations.
Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases.
arXiv Detail & Related papers (2024-02-09T11:23:14Z) - Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation [7.586632627817609]
Radiologists face high burnout rates, partly due to the increasing volume of Chest X-rays (CXRs) requiring interpretation and reporting.
Our proposed CXR report generator integrates elements of the workflow and introduces a novel reward for reinforcement learning.
Results from our study demonstrate that the proposed model generates reports that are more aligned with radiologists' reports than state-of-the-art models.
arXiv Detail & Related papers (2023-07-19T05:41:14Z) - Boosting Radiology Report Generation by Infusing Comparison Prior [7.054671146863795]
Recent transformer-based models have made significant strides in generating radiology reports from chest X-ray images.
These models often lack prior knowledge, resulting in the generation of synthetic reports that mistakenly reference non-existent prior exams.
We propose a novel approach that leverages a rule-based labeler to extract comparison prior information from radiology reports.
arXiv Detail & Related papers (2023-05-08T09:12:44Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
arXiv Detail & Related papers (2023-03-18T03:53:43Z) - Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z) - A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z) - Automatically Generating Counterfactuals for Relation Extraction [18.740447044960796]
Relation extraction (RE) is a fundamental task in natural language processing.
Current deep neural models have achieved high accuracy but are easily affected by spurious correlations.
We develop a novel approach to derive contextual counterfactuals for entities.
arXiv Detail & Related papers (2022-02-22T04:46:10Z) - Coreference Reasoning in Machine Reading Comprehension [100.75624364257429]
We show that coreference reasoning in machine reading comprehension is a greater challenge than was earlier thought.
We propose a methodology for creating reading comprehension datasets that better reflect the challenges of coreference reasoning.
This allows us to show an improvement in the reasoning abilities of state-of-the-art models across various MRC datasets.
arXiv Detail & Related papers (2020-12-31T12:18:41Z) - Learning from Context or Names? An Empirical Study on Neural Relation Extraction [112.06614505580501]
We study the effect of two main information sources in text: textual context and entity mentions (names).
We propose an entity-masked contrastive pre-training framework for relation extraction (RE).
Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.