Unlocking Generalization in Polyp Segmentation with DINO Self-Attention "keys"
- URL: http://arxiv.org/abs/2512.13376v1
- Date: Mon, 15 Dec 2025 14:29:47 GMT
- Title: Unlocking Generalization in Polyp Segmentation with DINO Self-Attention "keys"
- Authors: Carla Monteiro, Valentina Corbetta, Regina Beets-Tan, Luís F. Teixeira, Wilson Silva
- Abstract summary: We present a framework that leverages the intrinsic robustness of DINO self-attention "key" features for robust segmentation. Unlike traditional methods that extract tokens from the deepest layers of the Vision Transformer (ViT), our approach feeds the key features of the self-attention module to a simple convolutional decoder to predict polyp masks. Our results, supported by a comprehensive statistical analysis, demonstrate that this pipeline achieves state-of-the-art (SOTA) performance.
- Score: 1.1309064441249301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic polyp segmentation is crucial for improving the clinical identification of colorectal cancer (CRC). While Deep Learning (DL) techniques have been extensively researched for this problem, current methods frequently struggle with generalization, particularly in data-constrained or challenging settings. Moreover, many existing polyp segmentation methods rely on complex, task-specific architectures. To address these limitations, we present a framework that leverages the intrinsic robustness of DINO self-attention "key" features for robust segmentation. Unlike traditional methods that extract tokens from the deepest layers of the Vision Transformer (ViT), our approach leverages the key features of the self-attention module with a simple convolutional decoder to predict polyp masks, resulting in enhanced performance and better generalizability. We validate our approach using a multi-center dataset under two rigorous protocols: Domain Generalization (DG) and Extreme Single Domain Generalization (ESDG). Our results, supported by a comprehensive statistical analysis, demonstrate that this pipeline achieves state-of-the-art (SOTA) performance, significantly enhancing generalization, particularly in data-scarce and challenging scenarios. While avoiding a polyp-specific architecture, we surpass well-established models like nnU-Net and UM-Net. Additionally, we provide a systematic benchmark of the DINO framework's evolution, quantifying the specific impact of architectural advancements on downstream polyp segmentation performance.
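To make the term "key" features concrete, here is a minimal sketch of how the keys of one self-attention block can be computed from ViT tokens and laid out as a spatial feature map that a convolutional decoder could consume. It is not the authors' implementation: the abstract does not state which DINO variant, layer, or decoder is used, and the names `key_feature_map`, `w_k`, `b_k`, the leading CLS token, and the toy sizes are all assumptions made for illustration. The tokens are assumed to be already layer-normalized, as is usual before the attention projections.

```python
import random


def matmul(a, b):
    """Multiply an (n x d) matrix by a (d x m) matrix, both given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def key_feature_map(tokens, w_k, b_k, grid_h, grid_w):
    """Project ViT tokens to self-attention keys and arrange the patch keys as a grid.

    tokens: (1 + grid_h * grid_w) x d nested list; row 0 is assumed to be the CLS token.
    w_k, b_k: key projection weight (d x d) and bias (d) of one attention block (hypothetical).
    Returns a grid_h x grid_w x d nested list, i.e. one key vector per image patch.
    """
    if len(tokens) != 1 + grid_h * grid_w:
        raise ValueError("token count does not match the patch grid")
    # K = X W_k + b_k: the same projection the attention block uses for its keys.
    keys = [[v + b for v, b in zip(row, b_k)] for row in matmul(tokens, w_k)]
    patch_keys = keys[1:]  # drop the CLS key, keep one key per patch
    return [patch_keys[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy usage: a 2 x 3 patch grid with 4-dimensional tokens.
rng = random.Random(0)
d, gh, gw = 4, 2, 3
tokens = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(1 + gh * gw)]
w_k = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # identity, for checking
b_k = [0.0] * d
fmap = key_feature_map(tokens, w_k, b_k, gh, gw)
print(len(fmap), len(fmap[0]), len(fmap[0][0]))  # prints: 2 3 4
```

In a real pipeline the projection weights would come from a pretrained DINO checkpoint, and the resulting grid would be passed to the decoder in place of the block's output tokens.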
Related papers
- Towards A Unified PAC-Bayesian Framework for Norm-based Generalization Bounds [63.47271262149291]
We propose a unified framework for PAC-Bayesian norm-based generalization.
The key to our approach is a sensitivity matrix that quantifies the network outputs with respect to structured weight perturbations.
We derive a family of generalization bounds that recover several existing PAC-Bayesian results as special cases.
arXiv Detail & Related papers (2026-01-13T00:42:22Z)
- Enhancing Polyp Segmentation via Encoder Attention and Dynamic Kernel Update [0.0]
Polyp segmentation is a critical step in colorectal cancer detection, yet it remains challenging due to the diverse shapes, sizes, and low contrast boundaries of polyps.
We propose a novel framework that improves segmentation accuracy and efficiency by integrating a Dynamic Kernel (DK) mechanism with a global Attention module.
arXiv Detail & Related papers (2025-09-27T21:16:09Z)
- An Entropy-Guided Curriculum Learning Strategy for Data-Efficient Acoustic Scene Classification under Domain Shift [12.42019711058722]
Acoustic Scene Classification (ASC) faces challenges in generalizing across recording devices.
The DCASE 2024 Challenge Task 1 highlights this issue by requiring models to learn from small labeled subsets recorded on a few devices.
We propose an entropy-guided curriculum learning strategy to address the domain shift problem in data-efficient ASC.
arXiv Detail & Related papers (2025-09-14T09:01:52Z)
- Deepfake Detection that Generalizes Across Benchmarks [48.85953407706351]
The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment.
This work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of one of the foundational pre-trained vision encoders.
The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC.
arXiv Detail & Related papers (2025-08-08T12:03:56Z)
- Frequency Prior Guided Matching: A Data Augmentation Approach for Generalizable Semi-Supervised Polyp Segmentation [5.951218651336557]
Polyp edges exhibit a remarkably consistent frequency signature across diverse datasets.
FPGM learns a domain-invariant frequency prior from the edge regions of labeled polyps.
It performs principled spectral perturbations on unlabeled images, aligning their amplitude spectra with this learned prior.
It demonstrates exceptional zero-shot generalization capabilities, achieving over 10% absolute gain in Dice score in data-scarce scenarios.
arXiv Detail & Related papers (2025-07-30T16:08:40Z)
- Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff.
By integrating pseudo-target domain data with source domain data, we diversify the training dataset.
Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z)
- PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model [77.00221501105788]
Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification.
We present the first work that studies the generalizability of state space models (SSMs) in DG PCC.
We propose a novel framework, PointDGMamba, that excels in strong generalizability toward unseen domains.
arXiv Detail & Related papers (2024-08-24T12:53:48Z)
- ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
arXiv Detail & Related papers (2024-06-30T14:55:32Z)
- Edge-aware Feature Aggregation Network for Polyp Segmentation [38.11584888416297]
In this study, we present a novel Edge-aware Feature Aggregation Network (EFA-Net) for polyp segmentation.
EFA-Net can fully make use of cross-level and multi-scale features to enhance the performance of polyp segmentation.
Experimental results on five widely adopted colonoscopy datasets show that our EFA-Net outperforms state-of-the-art polyp segmentation methods in terms of generalization and effectiveness.
arXiv Detail & Related papers (2023-09-19T11:09:38Z)
- NormAUG: Normalization-guided Augmentation for Domain Generalization [60.159546669021346]
We propose a simple yet effective method called NormAUG (Normalization-guided Augmentation) for deep learning.
Our method introduces diverse information at the feature level and improves the generalization of the main path.
In the test stage, we leverage an ensemble strategy to combine the predictions from the auxiliary path of our model, further boosting performance.
arXiv Detail & Related papers (2023-07-25T13:35:45Z)
- Towards Lightweight Cross-domain Sequential Recommendation via External Attention-enhanced Graph Convolution Network [7.1102362215550725]
Cross-domain Sequential Recommendation (CSR) depicts the evolution of behavior patterns for overlapped users by modeling their interactions from multiple domains.
We introduce a lightweight external attention-enhanced GCN-based framework to solve the above challenges, namely LEA-GCN.
To further alleviate the framework structure and aggregate the user-specific sequential pattern, we devise a novel dual-channel External Attention (EA) component.
arXiv Detail & Related papers (2023-02-07T03:06:29Z)
- Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It is a traditional u-shape encoder-decoder structure incorporated with a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z)
- Shape-aware Meta-learning for Generalizing Prostate MRI Segmentation to Unseen Domains [68.73614619875814]
We present a novel shape-aware meta-learning scheme to improve the model generalization in prostate MRI segmentation.
Experimental results show that our approach outperforms many state-of-the-art generalization methods consistently across all six settings of unseen domains.
arXiv Detail & Related papers (2020-07-04T07:56:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.