A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities
- URL: http://arxiv.org/abs/2403.17834v1
- Date: Tue, 26 Mar 2024 16:19:56 GMT
- Title: A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities
- Authors: Ibrahim Ethem Hamamci, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Irem Dogan, Muhammed Furkan Dasdelen, Bastian Wittmann, Enis Simsar, Mehmet Simsar, Emine Bensu Erdemir, Abdullah Alanbay, Anjany Sekuboyina, Berkan Lafci, Mehmet K. Ozdemir, Bjoern Menze
- Abstract summary: A major challenge in computational research in 3D medical imaging is the lack of comprehensive datasets.
CT-RATE is the first 3D medical imaging dataset that pairs images with textual reports.
We developed CT-CLIP, a CT-focused contrastive language-image pre-training framework.
- Score: 1.8953268281326607
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A major challenge in computational research in 3D medical imaging is the lack of comprehensive datasets. Addressing this issue, our study introduces CT-RATE, the first 3D medical imaging dataset that pairs images with textual reports. CT-RATE consists of 25,692 non-contrast chest CT volumes, expanded to 50,188 through various reconstructions, from 21,304 unique patients, along with corresponding radiology text reports. Leveraging CT-RATE, we developed CT-CLIP, a CT-focused contrastive language-image pre-training framework. As a versatile, self-supervised model, CT-CLIP is designed for broad application and does not require task-specific training. Remarkably, CT-CLIP outperforms state-of-the-art, fully supervised methods in multi-abnormality detection across all key metrics, thus eliminating the need for manual annotation. We also demonstrate its utility in case retrieval, whether using imagery or textual queries, thereby advancing knowledge dissemination. The open-source release of CT-RATE and CT-CLIP marks a significant advancement in medical AI, enhancing 3D imaging analysis and fostering innovation in healthcare.
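CT-CLIP's training objective is a standard contrastive language-image loss over paired volume and report embeddings. The following is a minimal PyTorch sketch of such a symmetric InfoNCE objective; the embedding dimension, temperature, and batch size are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(volume_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired CT-volume and
    report embeddings, as in CLIP-style pre-training.

    volume_emb, text_emb: (batch, dim) outputs of the two encoders.
    """
    # L2-normalize so the dot product is a cosine similarity.
    volume_emb = F.normalize(volume_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are true pairs.
    logits = volume_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: volume->text and text->volume.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.t(), targets)
    return (loss_v2t + loss_t2v) / 2

# Example with random stand-in embeddings (batch of 8, 512-d).
if __name__ == "__main__":
    v = torch.randn(8, 512)
    t = torch.randn(8, 512)
    print(clip_contrastive_loss(v, t).item())
```

Under this kind of setup, zero-shot abnormality detection can be framed as comparing a volume embedding against text embeddings of paired prompts (e.g., "there is cardiomegaly" vs. "there is no cardiomegaly") and choosing the closer one; the paper's exact prompting scheme may differ.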
Related papers
- CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling [12.457017701871273]
We present the first publicly available eye gaze dataset on CT, called CT-ScanGaze.
We then introduce CT-Searcher, a novel 3D scanpath predictor designed specifically to process CT volumes and generate radiologist-like 3D fixation sequences.
arXiv Detail & Related papers (2025-07-16T19:21:05Z)
- Recurrent Visual Feature Extraction and Stereo Attentions for CT Report Generation [18.113659670915474]
We propose a large language model (LLM) based CTRG method with recurrent visual feature extraction and stereo attentions for hierarchical feature modeling.
Specifically, we use a vision Transformer to recurrently process each slice in a CT volume, and employ a set of attentions over the encoded slices from different perspectives to obtain important visual information.
Experimental results and further analysis on the benchmark M3D-Cap dataset show that our method outperforms strong baseline models.
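As a rough illustration of recurrent slice-wise feature extraction followed by attention over the encoded slices, here is a hedged PyTorch sketch; the module names, pooling scheme, and dimensions are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SliceWiseEncoder(nn.Module):
    """Encode a CT volume slice by slice with a shared 2D encoder,
    then pool slice features with attention. A generic sketch of
    recurrent slice processing, not the paper's exact model."""

    def __init__(self, slice_encoder: nn.Module, dim: int = 768, heads: int = 8):
        super().__init__()
        self.slice_encoder = slice_encoder      # any 2D ViT mapping (B, C, H, W) -> (B, dim)
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (batch, slices, channels, H, W)
        b, s = volume.shape[:2]
        feats = torch.stack(
            [self.slice_encoder(volume[:, i]) for i in range(s)], dim=1
        )                                       # (batch, slices, dim)
        q = self.query.expand(b, -1, -1)        # one learned query per volume
        pooled, _ = self.attn(q, feats, feats)  # attend over slice features
        return pooled.squeeze(1)                # (batch, dim)
```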
arXiv Detail & Related papers (2025-06-24T14:29:06Z)
- Towards Scalable Language-Image Pre-training for 3D Medical Imaging [49.18894445671976]
We introduce Hierarchical attention for Language-Image Pre-training (HLIP), a scalable pre-training framework for 3D medical imaging.
HLIP adopts a lightweight hierarchical attention mechanism inspired by the natural hierarchy of radiology data: slice, scan, and study.
Trained on 220K patients with 3.13 million scans for brain MRI and 240K patients with 1.44 million scans for head CT, HLIP achieves state-of-the-art performance.
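The slice-scan-study hierarchy can be approximated with staged attention pooling. The sketch below is a schematic of that idea under assumed tensor shapes, not HLIP's actual mechanism.

```python
import torch
import torch.nn as nn

class HierarchicalPool(nn.Module):
    """Two-stage attention pooling mirroring a slice -> scan -> study
    hierarchy. Dimensions and structure are illustrative assumptions."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.scan_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.study_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scan_q = nn.Parameter(torch.randn(1, 1, dim))
        self.study_q = nn.Parameter(torch.randn(1, 1, dim))

    def forward(self, slice_feats: torch.Tensor) -> torch.Tensor:
        # slice_feats: (study, scans, slices, dim) -- slice tokens per scan.
        st, sc, sl, d = slice_feats.shape
        x = slice_feats.reshape(st * sc, sl, d)
        q = self.scan_q.expand(st * sc, -1, -1)
        scan_feats, _ = self.scan_attn(q, x, x)         # pool slices -> scan
        scan_feats = scan_feats.reshape(st, sc, d)
        q = self.study_q.expand(st, -1, -1)
        study_feat, _ = self.study_attn(q, scan_feats, scan_feats)
        return study_feat.squeeze(1)                    # (study, dim)
```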
arXiv Detail & Related papers (2025-05-28T01:16:34Z)
- CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering [23.158482226185217]
A visual question answering (VQA) system that can answer radiologists' questions about anatomical regions in a CT scan is urgently needed.
Existing VQA systems cannot adequately handle the CT radiology question answering (CTQA) task because: (1) anatomic complexity makes CT images difficult to understand; (2) spatial relationships across hundreds of slices are difficult to capture.
This paper proposes CT-Agent, a multimodal agentic framework for CTQA.
arXiv Detail & Related papers (2025-05-22T04:59:20Z)
- A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT [67.34586036959793]
There is no fully annotated CT dataset with all anatomies delineated for training.
We propose a novel continual learning-driven CT model that can segment complete anatomies.
Our single unified CT segmentation model, CL-Net, can segment a clinically comprehensive set of 235 fine-grained whole-body anatomies with high accuracy.
arXiv Detail & Related papers (2025-03-16T23:55:02Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z)
- CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates that it can identify organs and abnormalities in a zero-shot manner using natural language.
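To make the organ-level pairing concrete, here is a loose sketch of how organ-level image-text pairs might be assembled from a segmented volume; the function, mask format, and report mapping are hypothetical, and CT-GLIP's actual pipeline is more involved.

```python
import torch

def build_organ_pairs(volume, organ_masks, organ_reports):
    """Construct organ-level image-text pairs from one CT volume.

    volume:        (D, H, W) tensor.
    organ_masks:   dict mapping organ name -> boolean (D, H, W) mask.
    organ_reports: dict mapping organ name -> report sentence(s).
    Returns a list of (organ crop, text) pairs for contrastive training.
    """
    pairs = []
    for organ, mask in organ_masks.items():
        if organ not in organ_reports or not mask.any():
            continue
        # Bounding box of the organ mask, used to crop the volume.
        idx = mask.nonzero()
        lo = idx.min(dim=0).values.tolist()
        hi = (idx.max(dim=0).values + 1).tolist()
        crop = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        pairs.append((crop, organ_reports[organ]))
    return pairs
```

The resulting pairs can then be fed to the same symmetric contrastive loss sketched for CT-CLIP above, only at organ level rather than volume level.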
arXiv Detail & Related papers (2024-04-23T17:59:01Z)
- Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models [17.75505740079875]
We explore the feasibility of leveraging language as a natural source of high-quality supervision for chest CT imaging.
We bootstrap the understanding of 3D chest CT images by distilling chest-related diagnostic knowledge from an extensively pre-trained 2D X-ray expert model.
We train our model with over 12,000 pairs of chest CT images and radiology reports.
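One common way to distill a 2D expert into a 3D student is to align feature embeddings on a projected view. The sketch below shows that generic pattern; the projection function and the loss are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(ct_feats: torch.Tensor,
                      xray_feats: torch.Tensor) -> torch.Tensor:
    """Feature-level distillation: align a 3D CT student's embedding
    with a frozen 2D X-ray teacher's embedding of a projected view.
    A generic sketch; the paper's objective may differ."""
    ct = F.normalize(ct_feats, dim=-1)
    xr = F.normalize(xray_feats, dim=-1)
    # Cosine alignment: pull student and teacher embeddings together.
    return 1 - (ct * xr).sum(dim=-1).mean()

def project_to_xray_like(volume: torch.Tensor) -> torch.Tensor:
    """Mean-intensity projection along the anterior-posterior axis,
    giving a crude 2D 'X-ray-like' view the teacher can encode.
    volume: (batch, D, H, W) -> (batch, H, W)."""
    return volume.mean(dim=1)
```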
arXiv Detail & Related papers (2024-04-07T12:17:40Z)
- CT2Rep: Automated Radiology Report Generation for 3D Medical Imaging [0.20754235913398283]
We introduce the first method to generate radiology reports for 3D medical imaging, specifically targeting chest CT.
Given the absence of comparable methods, we establish a baseline using an advanced 3D vision encoder in medical imaging to demonstrate our method's effectiveness.
We augment CT2Rep with a cross-attention-based multi-modal fusion module and hierarchical memory, enabling the incorporation of longitudinal multimodal data.
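A cross-attention fusion module of this kind can be sketched as follows; the residual layout and shapes are assumptions standing in for CT2Rep's actual fusion and hierarchical memory design.

```python
import torch
import torch.nn as nn

class LongitudinalFusion(nn.Module):
    """Cross-attention fusion of current-scan features with features
    from a prior scan (a simple stand-in for hierarchical memory).
    Shapes and structure are assumptions, not CT2Rep's exact design."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, current: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # current: (batch, tokens, dim) features of the new CT scan
        # prior:   (batch, tokens, dim) memory from an earlier scan
        fused, _ = self.cross_attn(query=current, key=prior, value=prior)
        return self.norm(current + fused)   # residual connection
```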
arXiv Detail & Related papers (2024-03-11T15:17:45Z)
- Multi-View Vertebra Localization and Identification from CT Images [57.56509107412658]
We propose a multi-view vertebra localization and identification method for CT images.
We convert the 3D problem into a 2D localization and identification task on different views.
Our method learns global multi-view information naturally.
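As a crude stand-in for rendering multiple 2D views from a CT volume, the sketch below rotates the volume about its longitudinal axis and takes mean-intensity projections; the paper's actual view generation (e.g., DRR-style rendering) is more sophisticated, and the function here is hypothetical.

```python
import torch
import torchvision.transforms.functional as TF

def multi_view_projections(volume: torch.Tensor, n_views: int = 4) -> list:
    """Render 2D views of a CT volume by rotating it and taking
    mean-intensity projections, roughly emulating multi-view rendering.

    volume: (D, H, W); returns n_views tensors of shape (D, H)."""
    views = []
    for k in range(n_views):
        angle = 180.0 * k / n_views
        # Rotate in-plane; TF.rotate treats (D, H, W) as (C, H, W).
        rotated = TF.rotate(volume, angle)
        views.append(rotated.mean(dim=2))   # project along W -> (D, H)
    return views
```

Each 2D view can then be handled by an ordinary 2D localization and identification network, with per-view predictions fused back into 3D.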
arXiv Detail & Related papers (2023-07-24T14:43:07Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
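LVM-Med's objective builds on graph matching between feature sets. As a simplified illustration, the first-order part (matching node features by maximizing similarity) can be written with the Hungarian algorithm; the second-order pairwise-consistency terms that give the method its name are omitted here, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_features(feats_a: np.ndarray, feats_b: np.ndarray):
    """First-order feature matching between two views: maximize total
    cosine similarity via the Hungarian algorithm. LVM-Med's
    second-order objective adds pairwise (edge) consistency terms on
    top of this; they are not modeled in this sketch.

    feats_a, feats_b: (n, d) feature sets from two augmented views."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    cost = -a @ b.T                     # negative similarity = cost
    rows, cols = linear_sum_assignment(cost)
    return rows, cols                   # matched index pairs
```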
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Self-supervised 3D anatomy segmentation using self-distilled masked image transformer (SMIT) [2.7298989068857487]
Self-supervised learning has demonstrated success in medical image segmentation using convolutional networks.
We show our approach is more accurate and requires fewer fine-tuning datasets than other pretext tasks.
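Masked self-distillation pretext tasks generally have a student reconstruct a teacher's features at masked positions. Here is a schematic sketch under assumed encoder interfaces; it is not SMIT's exact formulation.

```python
import torch
import torch.nn.functional as F

def masked_self_distillation_step(student, teacher, volume_tokens,
                                  mask_ratio: float = 0.6):
    """One step of masked-image self-distillation: the student sees
    tokens with a random subset masked out and must match the teacher's
    output on the full input. In practice the teacher's weights are an
    exponential moving average of the student's, updated separately.

    student, teacher: token encoders mapping (B, N, D) -> (B, N, D).
    volume_tokens:    (B, N, D) patch/token embeddings of a 3D scan."""
    b, n, d = volume_tokens.shape
    mask = torch.rand(b, n, device=volume_tokens.device) < mask_ratio
    masked = volume_tokens.masked_fill(mask.unsqueeze(-1), 0.0)

    with torch.no_grad():               # teacher is not trained directly
        target = teacher(volume_tokens)
    pred = student(masked)

    # Distillation loss only on the masked positions.
    diff = F.mse_loss(pred, target, reduction="none").mean(dim=-1)
    return (diff * mask.float()).sum() / mask.float().sum().clamp(min=1)
```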
arXiv Detail & Related papers (2022-05-20T17:55:14Z)
- Fed-Sim: Federated Simulation for Medical Imaging [131.56325440976207]
We introduce a physics-driven generative approach that consists of two learnable neural modules.
We show that our data synthesis framework improves the downstream segmentation performance on several datasets.
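Fed-Sim trains across sites without pooling patient data. As general background, below is a minimal FedAvg-style parameter-averaging sketch; this is generic federated-learning scaffolding, not Fed-Sim's actual physics-driven simulation protocol, and the function is hypothetical.

```python
import torch

def fedavg(state_dicts, weights=None):
    """Average model parameters from several sites (FedAvg-style),
    so raw patient data never leaves a site.

    state_dicts: list of model.state_dict() from participating sites.
    weights:     optional per-site weights (e.g., local dataset sizes)."""
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n
    total = sum(weights)
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(w * sd[key].float()
                       for w, sd in zip(weights, state_dicts)) / total
    return avg
```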
arXiv Detail & Related papers (2020-09-01T19:17:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and accepts no responsibility for any consequences arising from its use.