Context-Gated Cross-Modal Perception with Visual Mamba for PET-CT Lung Tumor Segmentation
- URL: http://arxiv.org/abs/2510.27508v1
- Date: Fri, 31 Oct 2025 14:29:52 GMT
- Title: Context-Gated Cross-Modal Perception with Visual Mamba for PET-CT Lung Tumor Segmentation
- Authors: Elena Mulero Ayllón, Linlin Shen, Pierangelo Veltri, Fabrizia Gelardi, Arturo Chiti, Paolo Soda, Matteo Tortora
- Abstract summary: vMambaX is a lightweight framework integrating PET and CT scan images through a Context-Gated Cross-Modal Perception Module. Evaluated on the PCLT20K dataset, the model outperforms baseline models while maintaining lower computational complexity.
- Score: 37.40806731129113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate lung tumor segmentation is vital for improving diagnosis and treatment planning, and effectively combining anatomical and functional information from PET and CT remains a major challenge. In this study, we propose vMambaX, a lightweight multimodal framework integrating PET and CT scan images through a Context-Gated Cross-Modal Perception Module (CGM). Built on the Visual Mamba architecture, vMambaX adaptively enhances inter-modality feature interaction, emphasizing informative regions while suppressing noise. Evaluated on the PCLT20K dataset, the model outperforms baseline models while maintaining lower computational complexity. These results highlight the effectiveness of adaptive cross-modal gating for multimodal tumor segmentation and demonstrate the potential of vMambaX as an efficient and scalable framework for advanced lung cancer analysis. The code is available at https://github.com/arco-group/vMambaX.
Related papers
- DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights [54.87947751720332]
Accurate brain tumor segmentation is significant for clinical diagnosis and treatment.
Mamba-based State Space Models have demonstrated promising performance.
We propose a dual-resolution bi-directional Mamba that captures multi-scale long-range dependencies with minimal computational overhead.
arXiv Detail & Related papers (2025-10-16T07:31:21Z)
- Multimodal Slice Interaction Network Enhanced by Transfer Learning for Precise Segmentation of Internal Gross Tumor Volume in Lung Cancer PET/CT Imaging [34.37798183254656]
Internal gross tumor volume (IGTV) in PET/CT imaging is pivotal for optimal radiation therapy in lung cancer tumors.
We present a transfer learning-based methodology utilizing a multimodal interactive perception network with MAMBA.
We introduce a slice interaction module (SIM) within a 2.5D segmentation framework to effectively model inter-slice relationships.
arXiv Detail & Related papers (2025-09-26T18:48:08Z)
- impuTMAE: Multi-modal Transformer with Masked Pre-training for Missing Modalities Imputation in Cancer Survival Prediction [75.43342771863837]
We introduce impuTMAE, a novel transformer-based end-to-end approach with an efficient multimodal pre-training strategy.
It learns inter- and intra-modal interactions while simultaneously imputing missing modalities by reconstructing masked patches.
Our model is pre-trained on heterogeneous, incomplete data and fine-tuned for glioma survival prediction using TCGA-GBM/LGG and BraTS datasets.
arXiv Detail & Related papers (2025-08-08T10:01:16Z)
- Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images [29.523577037519985]
Deep learning models are expected to address problems such as poor image quality, motion artifacts, and complex tumor morphology.
We introduce a large-scale PET-CT lung tumor segmentation dataset, termed PCLT20K, which comprises 21,930 pairs of PET-CT images from 605 patients.
We propose a cross-modal interactive perception network with Mamba (CIPA) for lung tumor segmentation in PET-CT images.
arXiv Detail & Related papers (2025-03-21T16:04:11Z)
- HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation [5.318153305245246]
We propose HC-Mamba, a new medical image segmentation model based on the modern state space model Mamba.
We introduce the technique of dilated convolution in the HC-Mamba model to capture a more extensive range of contextual information.
In addition, the HC-Mamba model employs depthwise separable convolutions, significantly reducing the number of parameters and the computational power of the model.
arXiv Detail & Related papers (2024-05-08T12:24:50Z)
- H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images [6.753315684414596]
Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis.
Traditional multi-modal segmentation solutions rely on concatenation operations for modality fusion.
We propose a Hierarchical Adaptive Interaction and Weighting Network termed H2ASeg to explore intrinsic cross-modal correlations.
arXiv Detail & Related papers (2024-03-27T08:28:14Z)
- Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading [47.50733518140625]
Brain tumor represents one of the most fatal cancers around the world, and is very common in children and the elderly.
We propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading.
arXiv Detail & Related papers (2024-01-17T07:54:49Z)
- SwinCross: Cross-modal Swin Transformer for Head-and-Neck Tumor Segmentation in PET/CT Images [6.936329289469511]
Cross-Modal Swin Transformer (SwinCross) with cross-modal attention (CMA) module incorporated cross-modal feature extraction at multiple resolutions.
The proposed method is experimentally shown to outperform state-of-the-art transformer-based methods.
arXiv Detail & Related papers (2023-02-08T03:36:57Z)
- Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
- Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z)