Visual Grounding of Whole Radiology Reports for 3D CT Images
- URL: http://arxiv.org/abs/2312.04794v1
- Date: Fri, 8 Dec 2023 02:09:17 GMT
- Title: Visual Grounding of Whole Radiology Reports for 3D CT Images
- Authors: Akimichi Ichinose, Taro Hatsutani, Keigo Nakamura, Yoshiro Kitamura,
Satoshi Iizuka, Edgar Simo-Serra, Shoji Kido, Noriyuki Tomiyama
- Abstract summary: We present the first visual grounding framework designed for CT image and report pairs covering various body parts and diverse anomaly types.
Our framework combines two components: 1) anatomical segmentation of images, and 2) report structuring.
We constructed a large-scale dataset with region-description correspondence annotations for 10,410 studies of 7,321 unique patients.
- Score: 12.071135670684013
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Building a large-scale training dataset is an essential problem in the
development of medical image recognition systems. Visual grounding techniques,
which automatically associate objects in images with corresponding
descriptions, can facilitate the labeling of large numbers of images. However,
visual grounding of radiology reports for CT images remains challenging,
because many kinds of anomalies are detectable via CT imaging and the
resulting report descriptions are long and complex. In this paper, we present the first
visual grounding framework designed for CT image and report pairs covering
various body parts and diverse anomaly types. Our framework combines two
components: 1) anatomical segmentation of images, and 2) report structuring.
The anatomical segmentation provides multiple organ masks for a given CT image
and helps the grounding model recognize detailed anatomy. The report
structuring helps accurately extract the presence, location, and type of each
anomaly described in the corresponding report. Given these two additional
image/report features, the grounding model can achieve better localization.
For evaluation, we constructed a large-scale dataset with region-description
correspondence annotations for 10,410 studies of 7,321 unique patients. We
evaluated our framework using grounding accuracy, the percentage of correctly
localized anomalies, as a metric, and demonstrated that combining the
anatomical segmentation and the report structuring improves performance by a
large margin over the baseline model (66.0% for the baseline vs. 77.8% for
ours). Comparison with prior techniques also showed the higher performance of
our method.
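The abstract specifies the data flow (CT volume, per-organ masks, and structured report entries feeding a grounding model) and the evaluation metric, but no implementation. Below is a minimal, hypothetical Python sketch of the structured-finding record and the grounding-accuracy computation; the field names, the IoU matching criterion, and the 0.5 threshold are assumptions for illustration, not the authors' definitions.

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class StructuredFinding:
        """One structured report entry. The paper extracts the presence,
        location, and type of each described anomaly; these field names
        are illustrative, not the authors' schema."""
        present: bool
        location: str      # e.g. "left lower lobe"
        anomaly_type: str  # e.g. "nodule"

    def grounding_accuracy(pred_masks, gt_masks, iou_thresh=0.5):
        """Percentage of correctly localized anomalies. The abstract does
        not state the match criterion, so an IoU threshold over binary
        3D masks is assumed here."""
        def iou(a, b):
            union = np.logical_or(a, b).sum()
            return np.logical_and(a, b).sum() / union if union else 0.0
        hits = sum(iou(p, g) >= iou_thresh for p, g in zip(pred_masks, gt_masks))
        return 100.0 * hits / len(gt_masks)

    # Toy check: one exact and one missed localization -> 50.0
    a = np.zeros((4, 4, 4), dtype=bool); a[1:3, 1:3, 1:3] = True
    b = np.zeros_like(a)
    print(grounding_accuracy([a, b], [a, a]))

Under this reading, the organ masks and structured findings act as extra conditioning inputs to the grounding model, and a function like the one above is what the quoted 66.0% vs. 77.8% figures measure.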
Related papers
- CTARR: A fast and robust method for identifying anatomical regions on CT images via atlas registration [0.09130220606101362]
We introduce CTARR, a novel generic method for CT Anatomical Region Recognition.
The method serves as a pre-processing step for any deep learning-based CT image analysis pipeline.
Our proposed method is based on atlas registration and provides a fast and robust way to crop any anatomical region, encoded as one or more bounding boxes, from any unlabeled CT scan (a minimal cropping sketch appears after this list).
arXiv Detail & Related papers (2024-10-03T08:52:21Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer [4.672688418357066]
We propose a novel Diffusion Transformer Segmentation (DTS) model for robust segmentation in the presence of noise.
Our model, which analyzes the morphological representation of images, shows better results than the previous models in various medical imaging modalities.
arXiv Detail & Related papers (2024-08-01T07:35:54Z)
- RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original dataset.
arXiv Detail & Related papers (2024-04-25T17:11:37Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models (a bottleneck-adapter sketch appears after this list).
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- Unsupervised correspondence with combined geometric learning and imaging for radiotherapy applications [0.0]
The aim of this study was to develop a model to accurately identify corresponding points between organ segmentations of different patients for radiotherapy applications.
A model for simultaneous correspondence and estimation in 3D shapes was trained with head and neck organ segmentations from planning CT scans.
We then extended the original model to incorporate imaging information using two approaches.
arXiv Detail & Related papers (2023-09-25T16:29:18Z)
- Multi-View Vertebra Localization and Identification from CT Images [57.56509107412658]
We propose a multi-view approach to vertebra localization and identification from CT images.
We convert the 3D problem into a 2D localization and identification task on different views.
Our method naturally learns multi-view global information.
arXiv Detail & Related papers (2023-07-24T14:43:07Z)
- AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)
- Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation [84.7571086566595]
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape.
The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset, which contains images captured with different acquisition procedures.
arXiv Detail & Related papers (2020-03-31T11:50:43Z)
- Separation of target anatomical structure and occlusions in chest radiographs [2.0478628221188497]
We propose a fully convolutional network that suppresses visual structures which are undesired for a specific task in radiographs.
The proposed algorithm creates reconstructed radiographs and ground-truth data from high-resolution CT scans.
arXiv Detail & Related papers (2020-02-03T14:01:06Z)
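As referenced in the CTARR entry above: once atlas registration has mapped a region's bounding box into the voxel frame of a new scan, the crop itself reduces to an array slice. A minimal sketch assuming the box is already expressed in voxel indices; the registration that produces it is CTARR's actual contribution and is not reproduced here.

    import numpy as np

    def crop_region(volume, box):
        """Crop a voxel-space box (z0, z1, y0, y1, x0, x1) from a CT
        volume, clamped to the volume bounds. The box is assumed to come
        from an atlas-to-scan registration, which is omitted here."""
        z0, z1, y0, y1, x0, x1 = box
        z0, y0, x0 = max(z0, 0), max(y0, 0), max(x0, 0)
        z1 = min(z1, volume.shape[0])
        y1 = min(y1, volume.shape[1])
        x1 = min(x1, volume.shape[2])
        return volume[z0:z1, y0:y1, x0:x1]

    # Example with a hypothetical abdominal box on a dummy scan.
    scan = np.zeros((200, 512, 512), dtype=np.int16)
    print(crop_region(scan, (40, 120, 100, 400, 80, 440)).shape)  # (80, 300, 360)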
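Also as referenced in the CLIP-adaptation entry: the summary mentions residual adapters inserted at multiple levels of a frozen pre-trained visual encoder, but not their design. A common bottleneck form (down-project, nonlinearity, up-project, residual add) is sketched below in PyTorch; the dimensions, zero-initialization, and placement are illustrative assumptions rather than the paper's specification.

    import torch
    import torch.nn as nn

    class ResidualAdapter(nn.Module):
        """Bottleneck adapter: frozen features pass through unchanged,
        plus a small learned residual; only the adapter is trained."""
        def __init__(self, dim, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.act = nn.GELU()
            self.up = nn.Linear(bottleneck, dim)
            nn.init.zeros_(self.up.weight)  # start as an identity mapping
            nn.init.zeros_(self.up.bias)

        def forward(self, x):
            return x + self.up(self.act(self.down(x)))

    # One adapter per (illustrative) encoder level; the frozen encoder
    # that would produce these token features is not shown.
    dims = (256, 512, 768)
    adapters = nn.ModuleList([ResidualAdapter(d) for d in dims])
    feats = [torch.randn(2, 196, d) for d in dims]
    adapted = [a(f) for a, f in zip(adapters, feats)]
    print([t.shape for t in adapted])

Zero-initializing the up-projection makes each adapter start as an identity, so training begins from the unmodified pre-trained features; this is a common choice for residual adapters, not something stated in the summary.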