RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining
- URL: http://arxiv.org/abs/2503.04653v1
- Date: Thu, 06 Mar 2025 17:43:03 GMT
- Title: RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining
- Authors: Tengfei Zhang, Ziheng Zhao, Chaoyi Wu, Xiao Zhou, Ya Zhang, Yanfeng Wang, Weidi Xie
- Abstract summary: We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities. We construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for Chest X-rays and CTRATE-IR for CT scans. We develop two retrieval systems, RadIR-CXR and RadIR-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
- Score: 48.21287619304126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing advanced medical imaging retrieval systems is challenging due to the varying definitions of 'similar images' across different medical contexts. This challenge is compounded by the lack of large-scale, high-quality medical imaging retrieval datasets and benchmarks. In this paper, we propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities in a scalable and fully automatic manner. Using this approach, we construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for Chest X-rays and CTRATE-IR for CT scans, providing detailed image-image ranking annotations conditioned on diverse anatomical structures. Furthermore, we develop two retrieval systems, RadIR-CXR and RadIR-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks. These systems also enable flexible, effective image retrieval conditioned on specific anatomical structures described in text, achieving state-of-the-art results on 77 out of 78 metrics.
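The core idea is that two images count as similar, with respect to a given anatomical structure, to the extent that their reports describe that structure similarly. The following is a minimal Python sketch of that idea, not the RadIR pipeline itself: it filters each (made-up) report down to the sentences mentioning one structure and ranks candidate images by a simple bag-of-words cosine score. The `anatomy_sentences` helper, the scoring function, and all reports are illustrative assumptions.

```python
# Illustrative sketch of report-mined retrieval supervision: two images are
# ranked as similar, for one anatomical structure, when their reports describe
# that structure similarly. NOT the RadIR pipeline; all names and reports here
# are placeholder assumptions.
from collections import Counter
import math

def anatomy_sentences(report: str, anatomy: str) -> str:
    """Keep only the report sentences that mention the given structure."""
    return " ".join(s for s in report.split(".") if anatomy.lower() in s.lower())

def cosine_overlap(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two text snippets."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_anatomy(query_report: str, candidate_reports: dict, anatomy: str):
    """Order candidate images by report similarity restricted to one anatomy."""
    q = anatomy_sentences(query_report, anatomy)
    scores = {img: cosine_overlap(q, anatomy_sentences(rep, anatomy))
              for img, rep in candidate_reports.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with made-up reports.
query = "The heart size is normal. There is a small left pleural effusion."
candidates = {
    "img_001": "Cardiac silhouette is enlarged. Lungs are clear.",
    "img_002": "Heart size is normal. No pleural effusion.",
}
print(rank_by_anatomy(query, candidates, anatomy="heart"))
```

Because the ordering is derived from the report text for each structure separately, the same pair of images can receive different ranks for different anatomical structures, which is what makes the resulting annotations multi-grained.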
Related papers
- MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation [4.760537994346813]
Medical image reporting aims to generate structured clinical descriptions from radiological images.
We propose MicarVLMoE, a vision-language mixture-of-experts model with gated cross-aligned fusion.
We extend MIR to CT scans, retinal imaging, MRI scans, and gross pathology images, reporting state-of-the-art results.
arXiv Detail & Related papers (2025-04-29T01:26:02Z)
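As a rough illustration of the gated mixture-of-experts idea mentioned in the MicarVLMoE summary above (this is not MicarVLMoE's actual architecture), the sketch below routes a single fused vision-language token through a few experts whose outputs are mixed by softmax gates. The dimensions, the four-expert setup, and the random weights are assumptions for demonstration only.

```python
# Rough sketch of a gated mixture-of-experts layer (not MicarVLMoE's code):
# a router produces softmax gates over a few experts, and their outputs are
# mixed accordingly for one fused vision-language token. All dimensions and
# weights are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 64
experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(4)]   # expert weight matrices
router = rng.normal(size=(d, 4)) * 0.1                        # gating projection

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(fused_token: np.ndarray) -> np.ndarray:
    """Route one fused token through gated experts and mix their outputs."""
    gates = softmax(fused_token @ router)                   # (4,) gate weights
    outputs = np.stack([fused_token @ w for w in experts])  # (4, d) expert outputs
    return gates @ outputs                                   # gate-weighted mixture

fused = rng.normal(size=d)          # e.g., a cross-aligned image+text feature
print(moe_layer(fused).shape)       # (64,)
```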
- SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI [6.714491893348051]
We propose to develop a foundation model for multi-modal head MRI by using contrastive learning on the images and the corresponding radiology findings.
Our proposed similarity enhanced contrastive language image pretraining (SeLIP) is able to effectively extract more useful features.
arXiv Detail & Related papers (2025-03-25T16:09:45Z)
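For context on the contrastive pretraining that SeLIP builds on, here is a minimal NumPy sketch of the standard symmetric InfoNCE (CLIP-style) objective pairing image embeddings with embeddings of their radiology findings. SeLIP's similarity-enhancement term is not shown, and the embeddings below are random placeholders.

```python
# Minimal sketch of the symmetric InfoNCE (CLIP-style) contrastive objective
# between image and report embeddings. Only the standard loss is shown, not
# SeLIP's similarity-enhanced variant; embeddings are random toy data.
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07) -> float:
    """Symmetric InfoNCE: matched image/report pairs lie on the diagonal."""
    img, txt = l2_normalize(img_emb), l2_normalize(txt_emb)
    logits = img @ txt.T / temperature                     # (N, N) similarities
    n = len(logits)
    # log-softmax over rows: image -> report and report -> image directions
    i2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    t2i = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    idx = np.arange(n)
    return float(-(i2t[idx, idx].mean() + t2i[idx, idx].mean()) / 2)

rng = np.random.default_rng(0)
image_features = rng.normal(size=(8, 128))    # toy batch of MRI embeddings
report_features = rng.normal(size=(8, 128))   # toy batch of findings embeddings
print(clip_contrastive_loss(image_features, report_features))
```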
- MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data.
This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset [4.382166835379353]
This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions.
It is an updated version of the ROCO dataset published in 2018, adding 35,705 new images that have appeared in PMC since 2018.
The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023.
arXiv Detail & Related papers (2024-05-16T11:44:35Z)
- VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics [0.0]
Visual attribution in medical imaging seeks to make evident the diagnostically-relevant components of a medical image.
We here present a novel generative visual attribution technique, one that leverages latent diffusion models in combination with domain-specific large language models.
The resulting system also exhibits a range of latent capabilities including zero-shot localized disease induction.
arXiv Detail & Related papers (2024-01-02T19:51:49Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
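A hypothetical sketch of the fusion step summarized in the entry above: visual feature tokens extracted from a chest X-ray are concatenated with an embedding of patient demographics, and a transformer encoder-decoder would then attend over the combined sequence to generate the report. The backbone, the `embed_demographics` helper, and all shapes are stand-ins, not the paper's implementation.

```python
# Hypothetical sketch of fusing CXR visual features with non-imaging data
# before report decoding. The feature extractors and shapes are stand-ins.
import numpy as np

def cnn_features(image: np.ndarray) -> np.ndarray:
    """Stand-in for a CNN backbone: pools the image into a grid of tokens."""
    patches = image.reshape(7, 32, 7, 32).mean(axis=(1, 3))   # (7, 7) grid
    return np.repeat(patches.reshape(49, 1), 256, axis=1)      # (49, 256) tokens

def embed_demographics(age: int, sex: str) -> np.ndarray:
    """Toy semantic embedding of non-imaging data as one extra token."""
    vec = np.zeros(256)
    vec[0] = age / 100.0
    vec[1] = 1.0 if sex == "F" else 0.0
    return vec.reshape(1, 256)                                  # (1, 256)

rng = np.random.default_rng(0)
image = rng.random((224, 224))                                  # fake chest X-ray
tokens = np.concatenate([cnn_features(image),
                         embed_demographics(age=63, sex="F")], axis=0)
print(tokens.shape)  # (50, 256): 49 visual tokens + 1 demographic token
# A transformer encoder-decoder would attend over these 50 tokens to
# generate the report text autoregressively.
```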
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment [71.27193056354741]
The problem of how to assess cross-modality medical image synthesis has been largely unexplored.
We propose a new metric K-CROSS to spur progress on this challenging problem.
K-CROSS uses a pre-trained multi-modality segmentation network to predict the lesion location.
arXiv Detail & Related papers (2023-07-10T01:26:48Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Structurally aware bidirectional unpaired image to image translation between CT and MR [0.14788776577018314]
Deep learning techniques allow us to perform image-to-image translation between multiple imaging modalities.
These techniques will help to conduct surgical planning under CT with the feedback of MRI information.
arXiv Detail & Related papers (2020-06-05T11:21:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.