Image-aware Evaluation of Generated Medical Reports
- URL: http://arxiv.org/abs/2410.17357v1
- Date: Tue, 22 Oct 2024 18:50:20 GMT
- Title: Image-aware Evaluation of Generated Medical Reports
- Authors: Gefen Dawidowicz, Elad Hirsch, Ayellet Tal,
- Abstract summary: The paper proposes a novel evaluation metric for automatic medical report generation from X-ray images, VLScore.
The key idea of our metric is to measure the similarity between radiology reports while considering the corresponding image.
We demonstrate the benefit of our metric through evaluation on a dataset where radiologists marked errors in pairs of reports, showing notable alignment with radiologists' judgments.
- Score: 11.190146577567548
- License:
- Abstract: The paper proposes a novel evaluation metric for automatic medical report generation from X-ray images, VLScore. It aims to overcome the limitations of existing evaluation methods, which either focus solely on textual similarities, ignoring clinical aspects, or concentrate only on a single clinical aspect, the pathology, neglecting all other factors. The key idea of our metric is to measure the similarity between radiology reports while considering the corresponding image. We demonstrate the benefit of our metric through evaluation on a dataset where radiologists marked errors in pairs of reports, showing notable alignment with radiologists' judgments. In addition, we provide a new dataset for evaluating metrics. This dataset includes well-designed perturbations that distinguish between significant modifications (e.g., removal of a diagnosis) and insignificant ones. It highlights the weaknesses in current evaluation metrics and provides a clear framework for analysis.
Related papers
- RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, as Radiological Report (Text) Evaluation (RaTEScore)
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
arXiv Detail & Related papers (2024-06-24T17:49:28Z) - Radiology-Aware Model-Based Evaluation Metric for Report Generation [5.168471027680258]
We propose a new automated evaluation metric for machine-generated radiology reports using the successful COMET architecture adapted for the radiology domain.
We train and publish four medically-oriented model checkpoints, including one trained on RadGraph, a radiology knowledge graph.
Our results show that our metric correlates moderately to high with established metrics such as BERTscore, BLEU, and CheXbert scores.
arXiv Detail & Related papers (2023-11-28T13:08:26Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report
Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-rays reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z) - Robust Detection Outcome: A Metric for Pathology Detection in Medical
Images [6.667150890634173]
Robust Detection Outcome (RoDeO) is a novel metric for evaluating algorithms for pathology detection in medical images.
RoDeO evaluates different errors directly and individually, and reflects clinical needs better than current metrics.
arXiv Detail & Related papers (2023-03-03T13:45:13Z) - Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
Mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
arXiv Detail & Related papers (2022-11-22T11:35:14Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using a conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation [3.3978173451092437]
Radiology report generation aims at generating descriptive text from radiology images automatically.
A typical setting consists of training encoder-decoder models on image-report pairs with a cross entropy loss.
We propose a novel weakly supervised contrastive loss for medical report generation.
arXiv Detail & Related papers (2021-09-25T00:06:23Z) - Variational Knowledge Distillation for Disease Classification in Chest
X-Rays [102.04931207504173]
We propose itvariational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.