CMSA-Net: Causal Multi-scale Aggregation with Adaptive Multi-source Reference for Video Polyp Segmentation
- URL: http://arxiv.org/abs/2602.22821v1
- Date: Thu, 26 Feb 2026 10:03:31 GMT
- Title: CMSA-Net: Causal Multi-scale Aggregation with Adaptive Multi-source Reference for Video Polyp Segmentation
- Authors: Tong Wang, Yaolei Qi, Siwen Wang, Imran Razzak, Guanyu Yang, Yutong Xie,
- Abstract summary: Video polyp segmentation (VPS) is an important task in computer-aided colonoscopy, as it helps doctors accurately locate and track polyps during examinations.<n>VPS remains challenging because polyps often look similar to surrounding mucosa, leading to weak semantic discrimination.<n>We propose a robust framework named CMSA-Net to address these challenges.<n>We show that CMSA-Net achieves state-of-the-art performance, offering a favorable balance between segmentation accuracy and real-time clinical applicability.
- Score: 21.421639001011993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video polyp segmentation (VPS) is an important task in computer-aided colonoscopy, as it helps doctors accurately locate and track polyps during examinations. However, VPS remains challenging because polyps often look similar to surrounding mucosa, leading to weak semantic discrimination. In addition, large changes in polyp position and scale across video frames make stable and accurate segmentation difficult. To address these challenges, we propose a robust VPS framework named CMSA-Net. The proposed network introduces a Causal Multi-scale Aggregation (CMA) module to effectively gather semantic information from multiple historical frames at different scales. By using causal attention, CMA ensures that temporal feature propagation follows strict time order, which helps reduce noise and improve feature reliability. Furthermore, we design a Dynamic Multi-source Reference (DMR) strategy that adaptively selects informative and reliable reference frames based on semantic separability and prediction confidence. This strategy provides strong multi-frame guidance while keeping the model efficient for real-time inference. Extensive experiments on the SUN-SEG dataset demonstrate that CMSA-Net achieves state-of-the-art performance, offering a favorable balance between segmentation accuracy and real-time clinical applicability.
Related papers
- FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition.<n>We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model.<n>We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z) - AVPDN: Learning Motion-Robust and Scale-Adaptive Representations for Video-Based Polyp Detection [0.0682074616451595]
We propose the Adaptive Video Polyp Detection Network (AVPDN), a robust framework for multi-scale polyp detection in colonoscopy videos.<n> AVPDN incorporates two key components: the Adaptive Feature Interaction and Augmentation (AFIA) module and the Scale-Aware Context Integration (SACI) module.<n> Experiments conducted on several challenging public benchmarks demonstrate the effectiveness and generalization ability of the proposed method.
arXiv Detail & Related papers (2025-08-05T13:59:18Z) - PSTNet: Enhanced Polyp Segmentation with Multi-scale Alignment and Frequency Domain Integration [17.1088588766663]
Polyp Network with Shunted Transformer (PSTNet) is a novel approach that integrates both RGB and frequency domain cues present in the images.
PSTNet comprises three key modules: the Frequency characterization Attention Module (FCAM) for extracting frequency cues and capturing polyp characteristics, the Feature Supplementary Alignment Module (FSAM) for aligning semantic information and reducing noise, and the Cross Perception localization Module (CPM) for synergizing frequency cues with high-level semantics to achieve efficient polyp segmentation.
arXiv Detail & Related papers (2024-09-13T02:52:25Z) - Diff-VPS: Video Polyp Segmentation via a Multi-task Diffusion Network with Adversarial Temporal Reasoning [12.37208687991656]
We present a novel diffusion-based network for video polyp segmentation task, dubbed as Diff-VPS.
We incorporate multi-task supervision into diffusion models to promote the discrimination of diffusion models on pixel-by-pixel segmentation.
To explore the temporal dependency, Temporal Reasoning Module (TRM) is devised via reasoning and reconstructing the target frame from the previous frames.
arXiv Detail & Related papers (2024-09-11T12:51:41Z) - Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment [59.75420353684495]
Machine learning applications on signals such as computer vision or biomedical data often face challenges due to the variability that exists across hardware devices or session recordings.
In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities.
We show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings.
arXiv Detail & Related papers (2024-07-19T13:33:38Z) - ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
arXiv Detail & Related papers (2024-06-30T14:55:32Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - RetSeg: Retention-based Colorectal Polyps Segmentation Network [0.0]
Vision Transformers (ViTs) have revolutionized medical imaging analysis.
ViTs exhibit contextual awareness in processing visual data, culminating in robust and precise predictions.
We introduce RetSeg, an encoder-decoder network featuring multi-head retention blocks.
arXiv Detail & Related papers (2023-10-09T06:43:38Z) - Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It is a traditional u-shape encoder-decoder structure incorporated with a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z) - Adaptive Context Selection for Polyp Segmentation [99.9959901908053]
We propose an adaptive context selection based encoder-decoder framework which is composed of Local Context Attention (LCA) module, Global Context Module (GCM) and Adaptive Selection Module (ASM)
LCA modules deliver local context features from encoder layers to decoder layers, enhancing the attention to the hard region which is determined by the prediction map of previous layer.
GCM aims to further explore the global context features and send to the decoder layers. ASM is used for adaptive selection and aggregation of context features through channel-wise attention.
arXiv Detail & Related papers (2023-01-12T04:06:44Z) - Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers [124.01928050651466]
We propose a new type of polyp segmentation method, named Polyp-PVT.
The proposed model, named Polyp-PVT, effectively suppresses noises in the features and significantly improves their expressive capabilities.
arXiv Detail & Related papers (2021-08-16T07:09:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.