RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis
- URL: http://arxiv.org/abs/2404.16754v1
- Date: Thu, 25 Apr 2024 17:11:37 GMT
- Title: RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis
- Authors: Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie
- Abstract summary: RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
- Score: 56.57177181778517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the requirement for developing open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models to extend the original datasets (over 25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) in the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate visual clues for interpretation; (ii) 665K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of the CT volume in the form of a segmentation mask; (iii) 1.3M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by training them to generate text based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.
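The abstract describes three kinds of region-grounded annotations: organ-level masks, sentence-level grounded reports, and grounded VQA pairs. As a minimal illustration of how such records might be represented and paired for region-guided training, the sketch below defines placeholder Python structures; the field names, mask storage convention, and helper function are assumptions for illustration, not the dataset's actual release format.

```python
# Illustrative only: hypothetical record layouts for one grounded-report
# sentence and one grounded VQA pair, assuming masks are stored as label
# volumes on disk. Field names are placeholders, not the RadGenome-Chest CT
# schema.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GroundedSentence:
    volume_id: str      # ID of the 3D chest CT volume (CT-RATE style)
    sentence: str       # one report sentence describing a finding
    region_label: str   # anatomical region, one of ~197 organ categories
    mask_path: str      # path to the segmentation mask grounding the sentence

@dataclass
class GroundedVQAPair:
    volume_id: str
    question: str
    answer: str
    mask_paths: List[str]  # reference masks linking Q/A to visual evidence

def build_training_examples(
    sentences: List[GroundedSentence],
) -> List[Tuple[Tuple[str, str], str]]:
    """Pair each (volume, region mask) with its target sentence,
    i.e. the region-conditioned text-generation setup the abstract describes."""
    return [((s.volume_id, s.mask_path), s.sentence) for s in sentences]
```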
Related papers
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z) - MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine.
It covers over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases.
Unlike existing approaches, which are limited by the availability of image-text pairs, we have developed the first automated pipeline.
arXiv Detail & Related papers (2024-08-06T02:09:35Z) - Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography [1.8424705673580284]
We introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports.
We develop CT-CLIP, a CT-focused contrastive language-image pretraining framework.
We create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes.
arXiv Detail & Related papers (2024-03-26T16:19:56Z) - GuideGen: A Text-Guided Framework for Full-torso Anatomy and CT Volume Generation [1.138481191622247]
GuideGen is a controllable framework that generates anatomical masks and corresponding CT volumes for the entire torso, from chest to pelvis, based on free-form text prompts.
Our approach includes three core components: a text-conditional semantic synthesizer for creating realistic full-torso anatomies; a contrast-aware autoencoder for detailed, high-fidelity feature extraction across varying contrast levels; and a latent feature generator that ensures alignment between CT images, anatomical semantics and input prompts.
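To make the three-component flow above concrete, here is a minimal, hypothetical sketch of the data flow (text prompt -> anatomy mask -> aligned latent features -> CT volume); all class and method names are placeholders and do not reflect GuideGen's actual implementation or API.

```python
# A minimal sketch of the described pipeline, under assumed interfaces.
import numpy as np
from typing import Tuple

class SemanticSynthesizer:
    """Text-conditional stage: produces a full-torso anatomical label map."""
    def synthesize(self, prompt: str) -> np.ndarray:
        # Placeholder: a real model would condition a generator on the prompt.
        return np.zeros((128, 128, 256), dtype=np.int16)

class LatentFeatureGenerator:
    """Generates latent CT features aligned with the anatomy mask and prompt."""
    def generate(self, mask: np.ndarray, prompt: str) -> np.ndarray:
        return np.zeros(mask.shape, dtype=np.float32)

class ContrastAwareAutoencoder:
    """Decodes latent features into a CT volume across contrast levels."""
    def decode(self, latent: np.ndarray) -> np.ndarray:
        return latent  # placeholder decode

def generate_ct(prompt: str) -> Tuple[np.ndarray, np.ndarray]:
    mask = SemanticSynthesizer().synthesize(prompt)
    latent = LatentFeatureGenerator().generate(mask, prompt)
    ct_volume = ContrastAwareAutoencoder().decode(latent)
    return mask, ct_volume
```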
arXiv Detail & Related papers (2024-03-12T02:09:39Z) - Visual Grounding of Whole Radiology Reports for 3D CT Images [12.071135670684013]
We present the first visual grounding framework designed for CT image and report pairs covering various body parts and diverse anomaly types.
Our framework combines two components: (1) anatomical segmentation of images and (2) report structuring.
We constructed a large-scale dataset with region-description correspondence annotations for 10,410 studies of 7,321 unique patients.
arXiv Detail & Related papers (2023-12-08T02:09:17Z) - Complex Organ Mask Guided Radiology Report Generation [13.96983438709763]
We propose the Complex Organ Mask Guided (COMG) report generation model.
We leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase.
Results on two public datasets show that COMG achieves 11.4% and 9.7% improvements in BLEU@4 score over the SOTA model KiUT.
arXiv Detail & Related papers (2023-11-04T05:34:24Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation framework with a domain transfer network (C^2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data [66.9359934608229]
This study aims to initiate the development of a Radiology Foundation Model, termed RadFM.
To the best of our knowledge, this is the first large-scale, high-quality, medical visual-language dataset, with both 2D and 3D scans.
We propose a new evaluation benchmark, RadBench, that comprises five tasks, including modality recognition, disease diagnosis, visual question answering, report generation and rationale diagnosis.
arXiv Detail & Related papers (2023-08-04T17:00:38Z) - AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)