MAIRA-1: A specialised large multimodal model for radiology report generation
- URL: http://arxiv.org/abs/2311.13668v3
- Date: Fri, 26 Apr 2024 16:29:54 GMT
- Title: MAIRA-1: A specialised large multimodal model for radiology report generation
- Authors: Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando Pérez-García, Valentina Salvatelli, Shaury Srivastav, Anja Thieme, Noel Codella, Matthew P. Lungren, Maria Teodora Wetscherek, Ozan Oktay, Javier Alvarez-Valle
- Abstract summary: We present a radiology-specific multimodal model for generating radiological reports from chest X-rays (CXRs).
Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with pre-trained vision encoders.
Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality.
- Score: 41.69727330319648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.
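As a rough sketch of the alignment idea the abstract describes (illustrative only, not MAIRA-1's released implementation; the module names and feature sizes are assumptions), a frozen CXR image encoder's patch features can be projected into the language model's token-embedding space and prepended to the text embeddings:

```python
import torch
import torch.nn as nn

class VisionLanguageAligner(nn.Module):
    """Minimal sketch of aligning a frozen vision encoder with an LLM.

    Hypothetical sizes: the CXR image encoder emits 1024-d patch
    features; a Vicuna-7B-like decoder expects 4096-d token embeddings.
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Small MLP that maps image features into the LLM embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor,
                text_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        # text_embeddings: (batch, seq_len, llm_dim)
        image_tokens = self.projector(patch_features)
        # Prepend image tokens so the decoder attends to them while
        # generating the report text.
        return torch.cat([image_tokens, text_embeddings], dim=1)

aligner = VisionLanguageAligner()
patches = torch.randn(2, 196, 1024)  # stand-in for encoder output
text = torch.randn(2, 32, 4096)      # stand-in for embedded prompt
fused = aligner(patches, text)       # shape: (2, 228, 4096)
```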
Related papers
- Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation [14.86028303006519]
We introduce a fact-aware multimodal retrieval-augmented pipeline for generating accurate radiology reports.
We first leverage RadGraph to mine factual report pairs, then integrate factual knowledge to train a universal multimodal retriever.
Experiments show that our multimodal retriever outperforms state-of-the-art retrievers on both language generation and radiology-specific metrics.
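A minimal sketch of the retrieval-augmentation pattern this entry describes (the cosine-similarity retriever and prompt format below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def retrieve_reports(query_emb: np.ndarray,
                     corpus_embs: np.ndarray,
                     corpus_reports: list[str],
                     k: int = 3) -> list[str]:
    """Return the k reports most similar to the query image embedding.

    query_emb: (d,) embedding of the input CXR from the retriever.
    corpus_embs: (n, d) embeddings of the reference reports.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q                     # cosine similarity to every entry
    top = np.argsort(-scores)[:k]
    return [corpus_reports[i] for i in top]

def build_prompt(retrieved: list[str]) -> str:
    # Retrieved factual context is prepended to ground the generator.
    context = "\n".join(f"- {r}" for r in retrieved)
    return f"Reference findings:\n{context}\n\nWrite the report:"
```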
arXiv Detail & Related papers (2024-07-21T21:04:28Z)
- D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions [8.50767187405446]
We propose D-Rax -- a domain-specific, conversational, radiologic assistance tool.
We enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting.
We observe statistically significant improvement in responses when evaluated for both open- and close-ended conversations.
arXiv Detail & Related papers (2024-07-02T18:43:10Z)
- Complex Organ Mask Guided Radiology Report Generation [13.96983438709763]
We propose the Complex Organ Mask Guided (COMG) report generation model.
We leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase.
Results on two public datasets show that COMG achieves improvements of 11.4% and 9.7% in BLEU@4 score over the SOTA model KiUT.
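For reference, BLEU@4 (the metric cited above) measures 1- to 4-gram overlap between a generated and a reference report. A minimal computation with NLTK, using toy sentences as placeholders for real reports:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "no focal consolidation pleural effusion or pneumothorax".split()
candidate = "no focal consolidation or pleural effusion identified".split()

# BLEU@4: uniform weights over 1- to 4-gram precisions; smoothing avoids
# zero scores when a higher-order n-gram has no match.
score = sentence_bleu([reference], candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU@4: {score:.3f}")
```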
arXiv Detail & Related papers (2023-11-04T05:34:24Z)
- CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images [3.0757789554622597]
This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs).
For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities.
The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists.
arXiv Detail & Related papers (2023-10-22T06:22:37Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning.
Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
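A minimal sketch of ROUGE scoring as used in evaluations like this one (the `rouge-score` package is one common choice and an assumption here, not necessarily the paper's tooling; the example strings are placeholders):

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
reference = "The heart size is normal. No pleural effusion."
generated = "Normal heart size. There is no pleural effusion."

# Each entry holds precision, recall, and F-measure for that ROUGE variant.
scores = scorer.score(reference, generated)
for name, result in scores.items():
    print(f"{name}: F1={result.fmeasure:.3f}")
```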
arXiv Detail & Related papers (2023-08-29T17:44:28Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
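As a loose illustration of how instruction-style summaries might be derived from free-text reports (the section header, prompt wording, and function are assumptions, not XrayGPT's actual pipeline):

```python
import re

def report_to_summary_pair(report: str) -> dict:
    """Sketch of turning a free-text report into an instruction pair.

    Assumes reports carry an IMPRESSION header; the instruction text
    is illustrative, not XrayGPT's actual template.
    """
    match = re.search(r"IMPRESSION:\s*(.+)", report, flags=re.S)
    impression = match.group(1).strip() if match else report.strip()
    return {
        "instruction": "Summarize the key findings of this chest X-ray.",
        "response": impression,
    }
```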
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Self adaptive global-local feature enhancement for radiology report generation [10.958641951927817]
We propose a novel framework, AGFNet, to dynamically fuse global and anatomical region features and generate multi-grained radiology reports.
First, we extract anatomy region features and global features from the input chest X-ray (CXR).
Then, with the region and global features as input, our proposed self-adaptive fusion gate module dynamically fuses multi-granularity information.
Finally, the captioning generator produces the radiology report from the multi-granularity features.
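A gated fusion of global and region features of the kind described here might look like the following sketch (the sigmoid gate and feature size are assumptions, not AGFNet's exact module):

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Sketch of a self-adaptive gate mixing global and region features.

    Illustrative gating mechanism only; the feature dimension is a
    placeholder.
    """

    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, global_feat: torch.Tensor,
                region_feat: torch.Tensor) -> torch.Tensor:
        # The gate weighs, per dimension, how much each granularity
        # contributes to the fused representation.
        g = self.gate(torch.cat([global_feat, region_feat], dim=-1))
        return g * global_feat + (1 - g) * region_feat
```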
arXiv Detail & Related papers (2022-11-21T11:50:42Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
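The two-stage pattern this describes, a visual captioner drafting findings that a generative LM then expands, can be sketched generically; both model calls below are placeholders, not the paper's actual interfaces:

```python
from typing import Callable

def generate_report(image: object,
                    captioner: Callable[[object], str],
                    refiner: Callable[[str], str]) -> str:
    """Sketch of a captioner + generative-LM pipeline.

    `captioner` stands in for a Show-Attend-Tell-style model drafting
    findings from the image; `refiner` stands in for a GPT-style model
    expanding the draft into a full, descriptive report.
    """
    draft = captioner(image)
    prompt = f"Expand these draft findings into a radiology report:\n{draft}"
    return refiner(prompt)
```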
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.