Radiology-Aware Model-Based Evaluation Metric for Report Generation
- URL: http://arxiv.org/abs/2311.16764v1
- Date: Tue, 28 Nov 2023 13:08:26 GMT
- Title: Radiology-Aware Model-Based Evaluation Metric for Report Generation
- Authors: Amos Calamida, Farhad Nooralahzadeh, Morteza Rohanian, Koji Fujimoto,
Mizuho Nishio, Michael Krauthammer
- Abstract summary: We propose a new automated evaluation metric for machine-generated radiology reports using the successful COMET architecture adapted for the radiology domain.
We train and publish four medically-oriented model checkpoints, including one trained on RadGraph, a radiology knowledge graph.
Our results show that our metric correlates moderately to strongly with established metrics such as BERTscore, BLEU, and CheXbert scores.
- Score: 5.168471027680258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new automated evaluation metric for machine-generated radiology
reports using the successful COMET architecture adapted for the radiology
domain. We train and publish four medically-oriented model checkpoints,
including one trained on RadGraph, a radiology knowledge graph. Our results
show that our metric correlates moderately to strongly with established metrics
such as BERTscore, BLEU, and CheXbert scores. Furthermore, we demonstrate that
one of our checkpoints exhibits a high correlation with human judgment, as
assessed using the publicly available annotations of six board-certified
radiologists, using a set of 200 reports. We also performed our own analysis
gathering annotations with two radiologists on a collection of 100 reports. The
results indicate the potential effectiveness of our method as a
radiology-specific evaluation metric. The code, data, and model checkpoints to
reproduce our findings will be publicly available.
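The abstract reports correlation between the proposed model-based metric and established metrics such as BLEU. As a rough illustration of that kind of analysis (the scores below are hypothetical, not the paper's data), per-report scores from two metrics can be compared with a rank correlation such as Kendall's tau:

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall rank correlation between two equal-length score lists."""
    assert len(a) == len(b)
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical per-report scores from a COMET-style metric and from BLEU
metric_scores = [0.82, 0.45, 0.91, 0.30, 0.67]
bleu_scores = [0.40, 0.21, 0.55, 0.18, 0.33]

# Both lists rank the reports identically, so tau is 1.0 here
print(kendall_tau(metric_scores, bleu_scores))  # → 1.0
```

In practice one would use a library implementation (e.g. `scipy.stats.kendalltau`) and also report Pearson or Spearman correlation, but the hand-rolled version above shows what "correlates moderately to strongly" is quantifying.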
Related papers
- Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs [18.025481751074214]
We introduce a system, named ReXKG, which extracts structured information from processed reports to construct a radiology knowledge graph.
We conduct an in-depth comparative analysis of AI-generated and human-written radiology reports, assessing the performance of both specialist and generalist models.
arXiv Detail & Related papers (2024-08-26T16:28:56Z) - RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric named Radiological Report (Text) Evaluation (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
arXiv Detail & Related papers (2024-06-24T17:49:28Z) - LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation [37.20505633019773]
Evaluating generated radiology reports is crucial for the development of radiology AI.
This study proposes a novel evaluation framework using large language models (LLMs) to compare radiology reports for assessment.
arXiv Detail & Related papers (2024-04-01T09:02:12Z) - Radiology Report Generation Using Transformers Conditioned with
Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning.
Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z) - Act Like a Radiologist: Radiology Report Generation across Anatomical Regions [50.13206214694885]
X-RGen is a radiologist-minded report generation framework across six anatomical regions.
In X-RGen, we seek to mimic the behaviour of human radiologists, breaking it down into four principal phases.
We enhance the recognition capacity of the image encoder by analysing images and reports across various regions.
arXiv Detail & Related papers (2023-05-26T07:12:35Z) - Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z) - RadGraph: Extracting Clinical Entities and Relations from Radiology
Reports [6.419031003699479]
RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports.
Our dataset can facilitate a wide range of research in medical natural language processing, as well as computer vision and multi-modal learning when linked to chest radiographs.
arXiv Detail & Related papers (2021-06-28T08:24:23Z)
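Several of the papers above (RaTEScore, ReXKG, RadGraph) evaluate reports at the level of extracted clinical entities rather than surface n-grams. A minimal sketch of that idea, assuming entity extraction has already been done by some upstream tool (the entity strings below are illustrative, not drawn from any of the papers' data), is an exact-match entity F1:

```python
def entity_f1(pred_entities, ref_entities):
    """Exact-match F1 between entity sets extracted from a generated
    report and its reference report."""
    pred, ref = set(pred_entities), set(ref_entities)
    tp = len(pred & ref)  # entities present in both reports
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical entities extracted from a generated vs. reference report
pred = {"cardiomegaly", "pleural effusion", "no pneumothorax"}
ref = {"cardiomegaly", "pleural effusion", "atelectasis"}
print(round(entity_f1(pred, ref), 3))  # → 0.667
```

Real entity-aware metrics go further, matching synonyms and handling negation (e.g. "no pneumothorax" vs. "pneumothorax"), which is precisely where exact-match scoring like this sketch falls short.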