AVPDN: Learning Motion-Robust and Scale-Adaptive Representations for Video-Based Polyp Detection
- URL: http://arxiv.org/abs/2508.03458v1
- Date: Tue, 05 Aug 2025 13:59:18 GMT
- Title: AVPDN: Learning Motion-Robust and Scale-Adaptive Representations for Video-Based Polyp Detection
- Authors: Zilin Chen, Shengnan Lu
- Abstract summary: We propose the Adaptive Video Polyp Detection Network (AVPDN), a robust framework for multi-scale polyp detection in colonoscopy videos. AVPDN incorporates two key components: the Adaptive Feature Interaction and Augmentation (AFIA) module and the Scale-Aware Context Integration (SACI) module. Experiments conducted on several challenging public benchmarks demonstrate the effectiveness and generalization ability of the proposed method.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate detection of polyps is of critical importance for the early and intermediate stages of colorectal cancer diagnosis. Compared to static images, dynamic colonoscopy videos provide more comprehensive visual information, which can facilitate the development of effective treatment plans. However, unlike fixed-camera recordings, colonoscopy videos often exhibit rapid camera movement, introducing substantial background noise that disrupts the structural integrity of the scene and increases the risk of false positives. To address these challenges, we propose the Adaptive Video Polyp Detection Network (AVPDN), a robust framework for multi-scale polyp detection in colonoscopy videos. AVPDN incorporates two key components: the Adaptive Feature Interaction and Augmentation (AFIA) module and the Scale-Aware Context Integration (SACI) module. The AFIA module adopts a triple-branch architecture to enhance feature representation. It employs dense self-attention for global context modeling, sparse self-attention to mitigate the influence of low query-key similarity in feature aggregation, and channel shuffle operations to facilitate inter-branch information exchange. In parallel, the SACI module is designed to strengthen multi-scale feature integration. It utilizes dilated convolutions with varying receptive fields to capture contextual information at multiple spatial scales, thereby improving the model's denoising capability. Experiments conducted on several challenging public benchmarks demonstrate the effectiveness and generalization ability of the proposed method, achieving competitive performance in video-based polyp detection tasks.
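The abstract describes AFIA and SACI only at a high level, so the PyTorch sketch below is a hypothetical rendering of that description rather than the authors' implementation: the head count, top-k value, dilation rates, the depth-wise convolutional third branch, and all module and parameter names are assumptions. Only the overall ideas come from the abstract: dense plus top-k sparse self-attention with channel shuffle for inter-branch exchange in AFIA, and parallel dilated convolutions for multi-scale context in SACI.

```python
import torch
import torch.nn as nn


def channel_shuffle(x, groups):
    """Interleave channels across groups so the branches exchange information."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)


class SpatialAttention(nn.Module):
    """Self-attention over spatial tokens; if `topk` is set, only the top-k
    query-key similarities per query are kept (the sparse branch)."""

    def __init__(self, dim, num_heads=4, topk=None):
        super().__init__()
        self.num_heads, self.topk = num_heads, topk
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, N, C), N = H*W
        qkv = self.qkv(tokens).reshape(b, -1, 3, self.num_heads, c // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each (B, heads, N, C/heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N)
        if self.topk is not None:                       # suppress low query-key similarities
            kth = attn.topk(self.topk, dim=-1).values[..., -1:]
            attn = attn.masked_fill(attn < kth, float("-inf"))
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, h * w, c)
        return self.proj(out).transpose(1, 2).reshape(b, c, h, w)


class AFIA(nn.Module):
    """Triple-branch block: dense attention, sparse (top-k) attention, and an
    assumed depth-wise convolutional branch, mixed by channel shuffle."""

    def __init__(self, dim, num_heads=4, topk=16):
        super().__init__()
        self.dense = SpatialAttention(dim, num_heads, topk=None)
        self.sparse = SpatialAttention(dim, num_heads, topk=topk)
        self.local = nn.Sequential(                     # third branch: an assumption
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
            nn.Conv2d(dim, dim, 1),
        )
        self.fuse = nn.Conv2d(dim * 3, dim, 1)

    def forward(self, x):
        y = torch.cat([self.dense(x), self.sparse(x), self.local(x)], dim=1)
        y = channel_shuffle(y, groups=3)                # inter-branch information exchange
        return x + self.fuse(y)


class SACI(nn.Module):
    """Parallel dilated convolutions with different receptive fields,
    fused to integrate multi-scale context."""

    def __init__(self, dim, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(dim, dim, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(dim),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(dim * len(dilations), dim, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)                   # a backbone feature map
    out = SACI(64)(AFIA(64)(feat))
    print(out.shape)                                    # torch.Size([2, 64, 32, 32])
```

In this reading, both modules act as residual refinements of backbone feature maps ahead of the detection head; where they actually sit in the AVPDN pipeline is not specified in the abstract.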
Related papers
- AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose a novel IRSTD framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization. AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z)
- AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent [29.891645824604684]
AgentPolyp is a novel framework integrating CLIP-based semantic guidance and dynamic image enhancement with a lightweight neural network for segmentation. The framework supports plug-and-play extensions for various enhancement algorithms and segmentation networks, meeting deployment requirements for endoscopic devices.
arXiv Detail & Related papers (2025-04-15T08:39:35Z)
- AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection [57.649223695021114]
We present a novel weakly supervised framework that leverages audio-visual collaboration for robust video anomaly detection. Our framework demonstrates superior performance across multiple benchmarks, with audio integration significantly boosting anomaly detection accuracy.
arXiv Detail & Related papers (2025-04-06T13:59:16Z)
- ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
arXiv Detail & Related papers (2024-06-30T14:55:32Z)
- SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation [4.027361638728112]
We propose a video polyp segmentation method that uses self-supervised learning as an auxiliary task and a spatial-temporal self-attention mechanism for improved representation learning.
Our experimental results demonstrate an improvement with respect to several state-of-the-art (SOTA) methods.
Our ablation study confirms that the proposed joint end-to-end training improves network accuracy by over 3% and nearly 10% on the Dice similarity coefficient and intersection-over-union, respectively.
arXiv Detail & Related papers (2024-06-14T17:33:11Z)
- RetSeg: Retention-based Colorectal Polyps Segmentation Network [0.0]
Vision Transformers (ViTs) have revolutionized medical imaging analysis.
ViTs exhibit contextual awareness in processing visual data, culminating in robust and precise predictions.
We introduce RetSeg, an encoder-decoder network featuring multi-head retention blocks.
arXiv Detail & Related papers (2023-10-09T06:43:38Z)
- YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection [80.68520401539979]
YONA (You Only Need one Adjacent Reference-frame) is an efficient end-to-end training framework for video polyp detection.
Our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
arXiv Detail & Related papers (2023-06-06T13:53:15Z)
- Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It adopts a traditional U-shaped encoder-decoder structure combined with a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z)
- Real-time automatic polyp detection in colonoscopy using feature enhancement module and spatiotemporal similarity correlation unit [34.28382404976628]
State-of-the-art methods are based on convolutional neural networks (CNNs).
Our method combines the two-dimensional (2-D) CNN-based real-time object detector network with temporal information.
Experiments demonstrate that our method improves sensitivity, precision, and specificity, and has great potential for application in clinical colonoscopy.
arXiv Detail & Related papers (2022-01-25T03:40:30Z)
- Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers [124.01928050651466]
We propose a new type of polyp segmentation method, named Polyp-PVT.
The proposed model effectively suppresses noise in the features and significantly improves their expressive capability.
arXiv Detail & Related papers (2021-08-16T07:09:06Z)