Related papers: RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

URL: http://arxiv.org/abs/2404.16754v1
Date: Thu, 25 Apr 2024 17:11:37 GMT
Title: RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis
Authors: Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Jiayu Lei, Ya Zhang, Yanfeng Wang, Weidi Xie,
Abstract summary: RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
Score: 56.57177181778517
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Developing generalist foundation model has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the requirements on developing open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models, to extend the original datasets (over 25,692 non-contrast 3D chest CT volume and reports from 20,000 patients) from the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate reasoning visual clues for interpretation; (ii) 665 K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of CT volume in the form of a segmentation mask; (iii) 1.3 M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.

Related papers

Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach [57.86418347491272]
We propose a comprehensive hierarchical classification system, with 404 representative abnormal findings across all body regions.<n>We contribute a dataset containing over 14.5K CT images from multiple planes and all human body regions, and meticulously provide grounding annotations for over 19K abnormalities.<n>We propose OminiAbnorm-CT, which can automatically ground and describe abnormal findings on multi-plane and whole-body CT images based on text queries.
arXiv Detail & Related papers (2025-06-03T17:57:34Z)
HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation [11.180283823428711]
HWA-UNETR is a novel 3D segmentation framework that employs an original HWA block with learnable window aggregation layers.<n>Our framework surpasses existing methods by up to 1.68% in the Dice score while maintaining solid robustness.
arXiv Detail & Related papers (2025-05-15T16:18:00Z)
Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge [55.252714550918824]
AortaSeg24 MICCAI Challenge introduced the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms.
arXiv Detail & Related papers (2025-02-07T21:09:05Z)
3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans. Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine. It covers over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases. Unlike existing approach which is limited by the availability of image-text pairs, we have developed the first automated pipeline.
arXiv Detail & Related papers (2024-08-06T02:09:35Z)
Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography [1.8424705673580284]
We introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports. We develop CT-CLIP, a CT-focused contrastive language-image pretraining framework. We create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes.
arXiv Detail & Related papers (2024-03-26T16:19:56Z)
GuideGen: A Text-Guided Framework for Full-torso Anatomy and CT Volume Generation [1.138481191622247]
GuideGen is a controllable framework that generates anatomical masks and corresponding CT volumes for the entire torso-from chest to pelvis-based on free-form text prompts. Our approach includes three core components: a text-conditional semantic synthesizer for creating realistic full-torso anatomies; a contrast-aware autoencoder for detailed, high-fidelity feature extraction across varying contrast levels; and a latent feature generator that ensures alignment between CT images, anatomical semantics and input prompts.
arXiv Detail & Related papers (2024-03-12T02:09:39Z)
Visual Grounding of Whole Radiology Reports for 3D CT Images [12.071135670684013]
We present the first visual grounding framework designed for CT image and report pairs covering various body parts and diverse anomaly types. Our framework combines two components of 1) anatomical segmentation of images, and 2) report structuring. We constructed a large-scale dataset with region-description correspondence annotations for 10,410 studies of 7,321 unique patients.
arXiv Detail & Related papers (2023-12-08T02:09:17Z)
Complex Organ Mask Guided Radiology Report Generation [13.96983438709763]
We propose the Complex Organ Mask Guided (termed as COMG) report generation model. We leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase. Results on two public datasets show that COMG achieves a 11.4% and 9.7% improvement in terms of BLEU@4 scores over the SOTA model KiUT.
arXiv Detail & Related papers (2023-11-04T05:34:24Z)
C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation with a domain transfer network (C2M-DoT) C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z)
Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data [66.9359934608229]
This study aims to initiate the development of Radiology Foundation Model, termed as RadFM. To the best of our knowledge, this is the first large-scale, high-quality, medical visual-language dataset, with both 2D and 3D scans. We propose a new evaluation benchmark, RadBench, that comprises five tasks, including modality recognition, disease diagnosis, visual question answering, report generation and rationale diagnosis.
arXiv Detail & Related papers (2023-08-04T17:00:38Z)
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules. Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.