Related papers: OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

URL: http://arxiv.org/abs/2602.16110v1
Date: Wed, 18 Feb 2026 00:42:41 GMT
Title: OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Authors: Tianwei Lin, Zhongwei Qiu, Wenqiao Zhang, Jiang Liu, Yihan Xie, Mingjian Gao, Zhenxuan Fan, Zhaocheng Li, Sijing Li, Zhongle Xie, Peng LU, Yueting Zhuang, Yingda Xia, Ling Zhang, Beng Chin Ooi,
Abstract summary: Clinical interpretation relies on both slice-driven local features and volume-driven spatial representations.<n>Existing Large Vision-Language Models (LVLMs) remain fragmented in CT slice versus volumetric understanding.<n>We present OmniCT, a powerful unified slice-volume LVLM for CT scenarios.
Score: 53.01523944168442
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Computed Tomography (CT) is one of the most widely used and diagnostically information-dense imaging modalities, covering critical organs such as the heart, lungs, liver, and colon. Clinical interpretation relies on both slice-driven local features (e.g., sub-centimeter nodules, lesion boundaries) and volume-driven spatial representations (e.g., tumor infiltration, inter-organ anatomical relations). However, existing Large Vision-Language Models (LVLMs) remain fragmented in CT slice versus volumetric understanding: slice-driven LVLMs show strong generalization but lack cross-slice spatial consistency, while volume-driven LVLMs explicitly capture volumetric semantics but suffer from coarse granularity and poor compatibility with slice inputs. The absence of a unified modeling paradigm constitutes a major bottleneck for the clinical translation of medical LVLMs. We present OmniCT, a powerful unified slice-volume LVLM for CT scenarios, which makes three contributions: (i) Spatial Consistency Enhancement (SCE): volumetric slice composition combined with tri-axial positional embedding that introduces volumetric consistency, and an MoE hybrid projection enables efficient slice-volume adaptation; (ii) Organ-level Semantic Enhancement (OSE): segmentation and ROI localization explicitly align anatomical regions, emphasizing lesion- and organ-level semantics; (iii) MedEval-CT: the largest slice-volume CT dataset and hybrid benchmark integrates comprehensive metrics for unified evaluation. OmniCT consistently outperforms existing methods with a substantial margin across diverse clinical tasks and satisfies both micro-level detail sensitivity and macro-level spatial reasoning. More importantly, it establishes a new paradigm for cross-modal medical imaging understanding.

Related papers

Deep-Learning Atlas Registration for Melanoma Brain Metastases: Preserving Pathology While Enabling Cohort-Level Analyses [0.7969462887653364]
Melanoma brain metastases (MBM) are common and spatially heterogeneous lesions.<n>We propose a deformable registration framework that aligns individual pathological brains to a common atlas.
arXiv Detail & Related papers (2026-02-13T13:43:57Z)
Zero-shot System for Automatic Body Region Detection for Volumetric CT and MR Images [0.0]
We investigate whether body region detection in CT and MR images can be achieved in a fully zero-shot manner by using knowledge embedded in large pre-trained foundation models.<n>We propose and systematically evaluate three training-free pipelines: (1) a segmentation-driven rule-based system, (2) a Multimodal Large Language Model (MLLM) guided by radiologist-defined rules, and (3) a segmentation-aware MLLM that combines visual input with explicit anatomical evidence.<n>All methods are evaluated on 887 heterogeneous CT and MR scans with manually verified anatomical region labels.
arXiv Detail & Related papers (2026-02-09T14:26:24Z)
Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features.<n>MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z)
TGC-Net: A Structure-Aware and Semantically-Aligned Framework for Text-Guided Medical Image Segmentation [56.09179939570486]
We propose TGC-Net, a CLIP-based framework focusing on parameter-efficient, task-specific adaptations.<n>TGC-Net achieves state-of-the-art performance with substantially fewer trainable parameters, including notable Dice gains on challenging benchmarks.
arXiv Detail & Related papers (2025-12-24T12:06:26Z)
Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation [8.695435245976482]
This paper proposes a three-tier fusion architecture that achieves precise brain tumor segmentation.<n>The method processes information progressively at the pixel, feature, and semantic levels.<n>We validated the method on the Brain Tumor (BraTS) 2020, 2021 and 2023 datasets.
arXiv Detail & Related papers (2025-07-14T06:32:59Z)
MOSAIC: A Multi-View 2.5D Organ Slice Selector with Cross-Attentional Reasoning for Anatomically-Aware CT Localization in Medical Organ Segmentation [0.8747606955991707]
Existing 3D segmentation approaches are computationally and memory intensive, often processing entire volumes that contain many anatomically irrelevant slices.<n>We propose a novel, anatomically-aware slice selector pipeline that reduces input volume prior to segmentation.<n>Our proposed model acts as an "expert" in anatomical localization, reasoning over multi-view representations to selectively retain slices with high structural relevance.
arXiv Detail & Related papers (2025-05-15T19:32:28Z)
Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation [48.107348956719775]
We introduce Mask-Enhanced SAM (M-SAM), an innovative architecture tailored for 3D tumor lesion segmentation. We propose a novel Mask-Enhanced Adapter (MEA) within M-SAM that enriches the semantic information of medical images with positional data from coarse segmentation masks. Our M-SAM achieves high segmentation accuracy and also exhibits robust generalization.
arXiv Detail & Related papers (2024-03-09T13:37:02Z)
Large-Kernel Attention for 3D Medical Image Segmentation [14.76728117630242]
In this paper, a novel large- kernel (LK) attention module is proposed to achieve accurate multi-organ segmentation and tumor segmentation. The advantages of convolution and self-attention are combined in the proposed LK attention module, including local contextual information, long-range dependence, and channel adaptation. The module also decomposes the LK convolution to optimize the computational cost and can be easily incorporated into FCNs such as U-Net.
arXiv Detail & Related papers (2022-07-19T16:32:55Z)
Superficial White Matter Analysis: An Efficient Point-cloud-based Deep Learning Framework with Supervised Contrastive Learning for Consistent Tractography Parcellation across Populations and dMRI Acquisitions [68.41088365582831]
White matter parcellation classifies tractography streamlines into clusters or anatomically meaningful tracts. Most parcellation methods focus on the deep white matter (DWM), whereas fewer methods address the superficial white matter (SWM) due to its complexity. We propose a novel two-stage deep-learning-based framework, Superficial White Matter Analysis (SupWMA), that performs an efficient parcellation of 198 SWM clusters from whole-brain tractography.
arXiv Detail & Related papers (2022-07-18T23:07:53Z)
Decoupled Pyramid Correlation Network for Liver Tumor Segmentation from CT images [22.128902125820193]
We propose a Decoupled Pyramid Correlation Network (DPC-Net) It exploits attention mechanisms to fully leverage both low and high-level features embedded in FCN to segment liver tumor. It achieves a competitive results with a DSC of 96.2% and an ASSD of 1.636 mm for liver segmentation.
arXiv Detail & Related papers (2022-05-26T07:31:29Z)
Incremental Cross-view Mutual Distillation for Self-supervised Medical CT Synthesis [88.39466012709205]
This paper builds a novel medical slice to increase the between-slice resolution. Considering that the ground-truth intermediate medical slices are always absent in clinical practice, we introduce the incremental cross-view mutual distillation strategy. Our method outperforms state-of-the-art algorithms by clear margins.
arXiv Detail & Related papers (2021-12-20T03:38:37Z)
Symmetry-Enhanced Attention Network for Acute Ischemic Infarct Segmentation with Non-Contrast CT Images [50.55978219682419]
We propose a symmetry enhanced attention network (SEAN) for acute ischemic infarct segmentation. Our proposed network automatically transforms an input CT image into the standard space where the brain tissue is bilaterally symmetric. The proposed SEAN outperforms some symmetry-based state-of-the-art methods in terms of both dice coefficient and infarct localization.
arXiv Detail & Related papers (2021-10-11T07:13:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.