Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings
- URL: http://arxiv.org/abs/2412.01031v2
- Date: Sat, 07 Dec 2024 23:21:54 GMT
- Title: Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings
- Authors: Razi Mahmood, Pingkun Yan, Diego Machado Reyes, Ge Wang, Mannudeep K. Kalra, Parisa Kaviani, Joy T. Wu, Tanveer Syeda-Mahmood
- Abstract summary: We develop a new method of report quality evaluation by first extracting fine-grained finding patterns capturing the location, laterality, and severity of a large number of clinical findings.
The textual and visual measures are then combined to rate the quality of the generated reports.
- Score: 10.549145579960033
- License:
- Abstract: Several evaluation metrics have been developed recently to automatically assess the quality of generative AI reports for chest radiographs based only on textual information using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns capturing the location, laterality, and severity of a large number of clinical findings. We then perform phrasal grounding to localize their associated anatomical regions on chest radiograph images. The textual and visual measures are then combined to rate the quality of the generated reports. We present results that compare this evaluation metric with other textual metrics on a gold standard dataset derived from the MIMIC collection and show its robustness and sensitivity to factual errors.
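The abstract does not say how the textual and visual measures are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each finding carries location, laterality, severity, and a grounded bounding box, and that the two measures are mixed with a hypothetical weight `alpha`. The names `Finding`, `text_agreement`, and `report_score` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


@dataclass(frozen=True)
class Finding:
    """Hypothetical fine-grained finding pattern with an optional grounded region."""
    name: str
    location: Optional[str] = None
    laterality: Optional[str] = None
    severity: Optional[str] = None
    box: Optional[Box] = None


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def text_agreement(ref: Finding, gen: Finding) -> float:
    """Fraction of the three descriptive attributes on which the findings agree."""
    attrs = ("location", "laterality", "severity")
    return sum(getattr(ref, k) == getattr(gen, k) for k in attrs) / len(attrs)


def report_score(reference: List[Finding], generated: List[Finding],
                 alpha: float = 0.5) -> float:
    """Average per-finding score; alpha weights text against visual overlap (assumed)."""
    ref = {f.name: f for f in reference}
    gen = {f.name: f for f in generated}
    names = set(ref) | set(gen)
    if not names:
        return 1.0  # two empty reports agree trivially
    total = 0.0
    for name in names:
        if name in ref and name in gen:
            r, g = ref[name], gen[name]
            has_boxes = r.box is not None and g.box is not None
            visual = iou(r.box, g.box) if has_boxes else 0.0
            total += alpha * text_agreement(r, g) + (1 - alpha) * visual
        # A missed or hallucinated finding contributes zero.
    return total / len(names)


reference = [Finding("opacity", "lower lobe", "left", "mild", (10, 10, 50, 50))]
flipped = [Finding("opacity", "lower lobe", "right", "mild", (60, 10, 100, 50))]
print(report_score(reference, reference))  # identical reports
print(report_score(reference, flipped))    # laterality error with a disjoint region
```

In this sketch a laterality error is penalized twice, once in the text attributes and once through the non-overlapping region, which is one way a combined metric could be made sensitive to factual errors.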
Related papers
- Image-aware Evaluation of Generated Medical Reports [11.190146577567548]
The paper proposes a novel evaluation metric for automatic medical report generation from X-ray images, VLScore.
The key idea of our metric is to measure the similarity between radiology reports while considering the corresponding image.
We demonstrate the benefit of our metric through evaluation on a dataset where radiologists marked errors in pairs of reports, showing notable alignment with radiologists' judgments.
arXiv Detail & Related papers (2024-10-22T18:50:20Z)
- RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
arXiv Detail & Related papers (2024-06-24T17:49:28Z)
- Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation [10.46031380503486]
We introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation.
We employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports.
We propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications.
arXiv Detail & Related papers (2024-05-23T01:29:47Z)
- Semantic Textual Similarity Assessment in Chest X-ray Reports Using a Domain-Specific Cosine-Based Metric [1.7802147489386628]
We introduce a novel approach designed specifically for assessing the semantic similarity between generated medical reports and the ground truth.
Our approach is validated, demonstrating its efficiency in assessing domain-specific semantic similarity within medical contexts.
arXiv Detail & Related papers (2024-02-19T07:48:25Z)
- Radiology-Aware Model-Based Evaluation Metric for Report Generation [5.168471027680258]
We propose a new automated evaluation metric for machine-generated radiology reports using the successful COMET architecture adapted for the radiology domain.
We train and publish four medically-oriented model checkpoints, including one trained on RadGraph, a radiology knowledge graph.
Our results show that our metric has moderate to high correlation with established metrics such as BERTscore, BLEU, and CheXbert scores.
arXiv Detail & Related papers (2023-11-28T13:08:26Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z)
- Towards objectively evaluating the quality of generated medical summaries [70.09940409175998]
We ask evaluators to count facts, computing precision, recall, f-score, and accuracy from the raw counts.
We apply this to the task of medical report summarisation, where measuring objective quality and accuracy is of paramount importance.
arXiv Detail & Related papers (2021-04-09T15:02:56Z)
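The fact-counting evaluation described in the last entry above can be illustrated with a small sketch. It assumes three raw counts per summary (facts stated correctly, facts stated incorrectly, and facts omitted); these count names and the function `fact_metrics` are assumptions for illustration, not taken from that paper. Accuracy is left out because its definition is not given here.

```python
from typing import Dict


def fact_metrics(correct: int, incorrect: int, omitted: int) -> Dict[str, float]:
    """Precision, recall, and F-score from raw fact counts (assumed count scheme)."""
    stated = correct + incorrect    # facts that appear in the generated summary
    expected = correct + omitted    # facts that should have appeared
    precision = correct / stated if stated else 0.0
    recall = correct / expected if expected else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Example: 8 facts stated correctly, 2 stated incorrectly, 4 omitted.
print(fact_metrics(correct=8, incorrect=2, omitted=4))
```

With these counts, precision is 8/10 and recall is 8/12, so a summary that states few errors but drops many facts is still penalized through recall.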
This list is automatically generated from the titles and abstracts of the papers on this site.