CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering
- URL: http://arxiv.org/abs/2505.16229v1
- Date: Thu, 22 May 2025 04:59:20 GMT
- Title: CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering
- Authors: Yuren Mao, Wenyi Xu, Yuyang Qin, Yunjun Gao
- Abstract summary: A visual question answering (VQA) system that can answer radiologists' questions about anatomical regions on a CT scan is urgently needed. Existing VQA systems cannot adequately handle the CT radiology question answering (CTQA) task because (1) anatomic complexity makes CT images difficult to understand, and (2) spatial relationships across hundreds of slices are difficult to capture. This paper proposes CT-Agent, a multimodal agentic framework for CTQA.
- Score: 23.158482226185217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A Computed Tomography (CT) scan produces 3D volumetric medical data that can be viewed as hundreds of cross-sectional images (a.k.a. slices), providing detailed anatomical information for diagnosis. For radiologists, creating CT radiology reports is time-consuming and error-prone. A visual question answering (VQA) system that can answer radiologists' questions about anatomical regions on a CT scan, and even automatically generate a radiology report, is urgently needed. However, existing VQA systems cannot adequately handle the CT radiology question answering (CTQA) task because (1) anatomic complexity makes CT images difficult to understand, and (2) spatial relationships across hundreds of slices are difficult to capture. To address these issues, this paper proposes CT-Agent, a multimodal agentic framework for CTQA. CT-Agent adopts anatomically independent tools to break down the anatomic complexity; furthermore, it efficiently captures across-slice spatial relationships with a global-local token compression strategy. Experimental results on two 3D chest CT datasets, CT-RATE and RadGenome-ChestCT, verify the superior performance of CT-Agent.
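The abstract does not spell out the global-local token compression mechanism, so the sketch below illustrates one plausible form of it: a coarse global pathway pooled across slices plus a small per-slice local pathway. The function name, tensor shapes, and pooling choices are illustrative assumptions, not CT-Agent's actual implementation.

```python
# Hypothetical sketch of a global-local token compression step for 3D CT
# volumes. Assumes each slice has already been encoded into a sequence of
# visual tokens by a 2D vision encoder; all shapes here are illustrative.
import torch
import torch.nn.functional as F

def compress_tokens(slice_tokens: torch.Tensor,
                    n_global: int = 32,
                    n_local: int = 8) -> torch.Tensor:
    """Compress per-slice tokens into a short global + local sequence.

    slice_tokens: (num_slices, tokens_per_slice, dim) visual tokens.
    Returns a (n_global + num_slices * n_local, dim) token sequence.
    """
    s, t, d = slice_tokens.shape

    # Global pathway: pool every slice down to one token, then adaptively
    # resample the resulting across-slice profile to a fixed number of
    # tokens, keeping coarse spatial structure along the slice axis.
    per_slice = slice_tokens.mean(dim=1)                          # (s, d)
    global_tokens = F.adaptive_avg_pool1d(
        per_slice.T.unsqueeze(0), n_global).squeeze(0).T          # (n_global, d)

    # Local pathway: keep a few tokens per slice (here, simple average
    # pooling over token groups) so fine slice-level detail survives.
    local_tokens = F.adaptive_avg_pool1d(
        slice_tokens.transpose(1, 2), n_local).transpose(1, 2)    # (s, n_local, d)

    return torch.cat([global_tokens, local_tokens.reshape(-1, d)], dim=0)

if __name__ == "__main__":
    x = torch.randn(300, 256, 768)       # a 300-slice scan, 256 tokens/slice
    print(compress_tokens(x).shape)      # torch.Size([2432, 768])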
Related papers
- CT-GRAPH: Hierarchical Graph Attention Network for Anatomy-Guided CT Report Generation [4.376648893167674]
We propose CT-GRAPH, a hierarchical graph attention network that explicitly models radiological knowledge. Our method leverages pretrained 3D medical feature encoders to obtain global and organ-level features. We show that our method achieves a substantial absolute improvement of 7.9% in F1 score over current state-of-the-art methods.
arXiv Detail & Related papers (2025-08-07T13:18:03Z) - CT-ScanGaze: A Dataset and Baselines for 3D Volumetric Scanpath Modeling [12.457017701871273]
We present the first publicly available eye-gaze dataset on CT, called CT-ScanGaze. We then introduce CT-Searcher, a novel 3D scanpath predictor designed specifically to process CT volumes and generate radiologist-like 3D fixation sequences.
arXiv Detail & Related papers (2025-07-16T19:21:05Z) - Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach [57.86418347491272]
We propose a comprehensive hierarchical classification system with 404 representative abnormal findings across all body regions. We contribute a dataset containing over 14.5K CT images from multiple planes and all human body regions, and meticulously provide grounding annotations for over 19K abnormalities. We propose OminiAbnorm-CT, which can automatically ground and describe abnormal findings on multi-plane and whole-body CT images based on text queries.
arXiv Detail & Related papers (2025-06-03T17:57:34Z) - Imitating Radiological Scrolling: A Global-Local Attention Model for 3D Chest CT Volumes Multi-Label Anomaly Classification [0.0]
Multi-label classification of 3D CT scans is a challenging task due to the volumetric nature of the data and the variety of anomalies to be detected. Existing deep learning methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies effectively. We present CT-Scroll, a novel global-local attention model specifically designed to emulate the scrolling behavior of radiologists during the analysis of 3D CT scans.
arXiv Detail & Related papers (2025-03-26T15:47:50Z) - RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining [64.66825253356869]
We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities. We construct two comprehensive medical image retrieval datasets: MIMIC-IR for chest X-rays and CTRATE-IR for CT scans. We develop two retrieval systems, RadIR-CXR and RadIR-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
arXiv Detail & Related papers (2025-03-06T17:43:03Z) - 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z) - RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z) - Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography [10.110878689623961]
We introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports. We develop CT-CLIP, a CT-focused contrastive language-image pretraining framework; a minimal sketch of this style of contrastive objective appears after this list. We create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes.
arXiv Detail & Related papers (2024-03-26T16:19:56Z) - Perspective Projection-Based 3D CT Reconstruction from Biplanar X-rays [32.98966469644061]
We propose PerX2CT, a novel framework for CT reconstruction from X-rays.
Our proposed method provides a different combination of features for each coordinate, which implicitly allows the model to obtain information about the 3D location.
arXiv Detail & Related papers (2023-03-09T14:45:25Z) - COVIDx CT-3: A Large-scale, Multinational, Open-Source Benchmark Dataset for Computer-aided COVID-19 Screening from Chest CT Images [82.74877848011798]
We introduce COVIDx CT-3, a large-scale benchmark dataset for detection of COVID-19 cases from chest CT images.
COVIDx CT-3 includes 431,205 CT slices from 6,068 patients across at least 17 countries.
We examine the data diversity and potential biases of the COVIDx CT-3 dataset, finding significant geographic and class imbalances.
arXiv Detail & Related papers (2022-06-07T06:35:48Z) - A unified 3D framework for Organs at Risk Localization and Segmentation for Radiation Therapy Planning [56.52933974838905]
Current medical workflows require manual delineation of organs-at-risk (OAR).
In this work, we aim to introduce a unified 3D pipeline for OAR localization-segmentation.
Our proposed framework fully enables the exploitation of 3D context information inherent in medical imaging.
arXiv Detail & Related papers (2022-03-01T17:08:41Z) - XraySyn: Realistic View Synthesis From a Single Radiograph Through CT Priors [118.27130593216096]
A radiograph visualizes the internal anatomy of a patient through the use of X-ray, which projects 3D information onto a 2D plane.
To the best of our knowledge, this is the first work on radiograph view synthesis.
We show that by gaining an understanding of radiography in 3D space, our method can be applied to radiograph bone extraction and suppression without ground-truth bone labels.
arXiv Detail & Related papers (2020-12-04T05:08:53Z) - Deep Reinforcement Learning for Organ Localization in CT [59.23083161858951]
We propose a deep reinforcement learning approach for organ localization in CT.
In this work, an artificial agent is actively self-taught to localize organs in CT by learning from its successes and mistakes.
Our method can be used as a plug-and-play module for localizing any organ of interest.
arXiv Detail & Related papers (2020-05-11T10:06:13Z)
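As noted in the CT-RATE entry above, CT-CLIP is described as a CT-focused contrastive language-image pretraining framework. The sketch below shows the generic symmetric contrastive (CLIP-style) objective such frameworks build on, applied to batched (CT volume, report) embedding pairs; the encoders, dimensions, and temperature value are placeholder assumptions rather than the paper's training code.

```python
# Minimal sketch of a CLIP-style contrastive objective over paired
# (CT volume, report) embeddings. Placeholder assumptions throughout;
# this is not CT-CLIP's actual implementation.
import torch
import torch.nn.functional as F

def clip_loss(volume_emb: torch.Tensor,
              report_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss; matching rows are positive pairs.

    volume_emb, report_emb: (batch, dim) outputs of a 3D image encoder
    and a text encoder, respectively.
    """
    v = F.normalize(volume_emb, dim=-1)
    r = F.normalize(report_emb, dim=-1)
    logits = v @ r.T / temperature              # (batch, batch) similarities
    targets = torch.arange(len(v))              # diagonal = positive pairs
    # Average the volume-to-report and report-to-volume directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

if __name__ == "__main__":
    v = torch.randn(16, 512)                    # dummy volume embeddings
    t = torch.randn(16, 512)                    # dummy report embeddings
    print(clip_loss(v, t).item())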
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.