From SAM to DINOv2: Towards Distilling Foundation Models to Lightweight Baselines for Generalized Polyp Segmentation
- URL: http://arxiv.org/abs/2512.09307v1
- Date: Wed, 10 Dec 2025 04:25:01 GMT
- Title: From SAM to DINOv2: Towards Distilling Foundation Models to Lightweight Baselines for Generalized Polyp Segmentation
- Authors: Shivanshu Agnihotri, Snehashis Majhi, Deepak Ranjan Nayak, Debesh Jha
- Abstract summary: Polyp-DiFoM is a framework that transfers the rich representations of foundation models into lightweight segmentation baselines. It consistently and significantly outperforms the respective baseline models, as well as the state-of-the-art model, with nearly 9 times lower overhead.
- Score: 9.452022523459886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate polyp segmentation during colonoscopy is critical for the early detection of colorectal cancer, yet it remains challenging due to significant variations in polyp size, shape, and color, and the camouflaged nature of polyps. While lightweight baseline models such as U-Net, U-Net++, and PraNet offer advantages in terms of easy deployment and low computational cost, they struggle to deal with these issues, leading to limited segmentation performance. In contrast, large-scale vision foundation models such as SAM, DINOv2, OneFormer, and Mask2Former have exhibited impressive generalization across natural image domains. However, their direct transfer to medical imaging tasks (e.g., colonoscopic polyp segmentation) is not straightforward, primarily due to the scarcity of large-scale datasets and the lack of domain-specific knowledge. To bridge this gap, we propose a novel distillation framework, Polyp-DiFoM, that transfers the rich representations of foundation models into lightweight segmentation baselines, allowing efficient and accurate deployment in clinical settings. In particular, we infuse semantic priors from the foundation models into canonical architectures such as U-Net and U-Net++ and further perform frequency-domain encoding for enhanced distillation, corroborating their generalization capability. Extensive experiments are performed across five benchmark datasets: Kvasir-SEG, CVC-ClinicDB, ETIS, ColonDB, and CVC-300. Notably, Polyp-DiFoM consistently and significantly outperforms the respective baseline models, as well as the state-of-the-art model, with nearly 9 times lower computational overhead. The code is available at https://github.com/lostinrepo/PolypDiFoM.
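To make the distillation idea above concrete, below is a minimal sketch (in PyTorch) of how frozen foundation-model features could be matched against a lightweight U-Net-style encoder, with an extra frequency-domain term computed via a 2-D FFT. The module names, the 1x1 projection, and the loss weights are illustrative assumptions; the paper's actual Polyp-DiFoM formulation may differ.

```python
# Illustrative sketch only -- not the authors' Polyp-DiFoM code.
# Assumes a frozen foundation-model encoder ("teacher") and a lightweight
# U-Net-style encoder ("student"); names and weights are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """1x1 conv that maps student features to the teacher's channel width."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

def distillation_loss(student_feat: torch.Tensor,
                      teacher_feat: torch.Tensor,
                      proj: ProjectionHead,
                      w_spatial: float = 1.0,
                      w_freq: float = 0.5) -> torch.Tensor:
    """Spatial + frequency-domain feature matching between student and teacher."""
    s = proj(student_feat)                                  # (B, C_t, h, w)
    t = F.interpolate(teacher_feat, size=s.shape[-2:],      # align spatial sizes
                      mode="bilinear", align_corners=False)
    loss_spatial = F.mse_loss(s, t)
    # Frequency-domain encoding: compare amplitude spectra of both feature maps.
    s_amp = torch.fft.fft2(s, norm="ortho").abs()
    t_amp = torch.fft.fft2(t, norm="ortho").abs()
    loss_freq = F.mse_loss(s_amp, t_amp)
    return w_spatial * loss_spatial + w_freq * loss_freq
```

In a setup like this, the teacher stays frozen and only the student encoder and the projection head receive gradients, which is what keeps the deployed model lightweight.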
Related papers
- RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation [51.37553739930992]
RPCANet++ is a sparse object segmentation framework that fuses the interpretability of RPCA with efficient deep architectures.
Our approach unfolds a relaxed RPCA model into a structured network comprising a Background Approximation Module (BAM), an Object Extraction Module (OEM), and an Image Restoration Module (IRM).
Experiments on diverse datasets demonstrate that RPCANet++ achieves state-of-the-art performance under various imaging scenarios.
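The unfolding idea can be illustrated with a short sketch: each stage alternately re-estimates the low-rank background and the sparse objects from the residual, mirroring classical RPCA iterations. The stand-in convolutional blocks and the number of stages below are assumptions, not the RPCANet++ design.

```python
# Generic deep-unfolded RPCA sketch, assuming an image D decomposes into a
# background B plus sparse objects O; the BAM/OEM/IRM blocks here are stand-ins,
# not the paper's actual modules.
import torch
import torch.nn as nn

def conv_block(ch: int) -> nn.Module:
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                         nn.Conv2d(ch, ch, 3, padding=1))

class UnfoldedRPCANet(nn.Module):
    def __init__(self, num_stages: int = 4, ch: int = 1):
        super().__init__()
        self.bams = nn.ModuleList(conv_block(ch) for _ in range(num_stages))  # background updates
        self.oems = nn.ModuleList(conv_block(ch) for _ in range(num_stages))  # object updates
        self.irm = conv_block(ch)                                             # restoration head

    def forward(self, d: torch.Tensor):
        o = torch.zeros_like(d)
        b = torch.zeros_like(d)
        for bam, oem in zip(self.bams, self.oems):
            b = bam(d - o)          # approximate background from the current residual
            o = oem(d - b)          # extract sparse objects from the background-free residual
        return self.irm(b + o), o   # restored image and final object (segmentation) map
```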
arXiv Detail & Related papers (2025-08-06T08:19:37Z)
- AI-Assisted Colonoscopy: Polyp Detection and Segmentation using Foundation Models [0.10037949839020764]
In colonoscopy, 80% of the missed polyps could be detected with the help of Deep Learning models.
In the search for algorithms capable of addressing this challenge, foundation models emerge as promising candidates.
Their zero-shot or few-shot learning capabilities facilitate generalization to new data or tasks without extensive fine-tuning.
A comprehensive evaluation of foundation models for polyp segmentation was conducted, assessing both detection and delimitation.
arXiv Detail & Related papers (2025-03-31T14:20:53Z)
- SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation [3.075778955462259]
Polyp segmentation in colonoscopy is crucial for detecting colorectal cancer.
Traditional segmentation models based on Convolutional Neural Networks (CNNs) struggle to capture detailed patterns and global context.
We propose the Mamba-guided Segment Anything Model (SAM-Mamba) for efficient polyp segmentation.
arXiv Detail & Related papers (2024-12-11T15:47:54Z)
- ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the injection of domain-specific knowledge.
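A hedged sketch of that cross-branch idea is given below: a frozen ViT-style encoder supplies global features, a small trainable CNN branch contributes domain-specific detail, and the two are fused. The module names, the concatenation-based fusion, and the assumed (B, C, H, W) output of the ViT wrapper are illustrative assumptions, not the ASPS implementation.

```python
# Illustrative dual-branch encoder: frozen ViT features + trainable CNN branch.
# The ViT wrapper is assumed to return a (B, feat_dim, H', W') feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchEncoder(nn.Module):
    def __init__(self, vit: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.vit = vit
        for p in self.vit.parameters():          # keep the foundation encoder frozen
            p.requires_grad = False
        self.cnn = nn.Sequential(                # trainable CNN branch (stand-in)
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            vit_feat = self.vit(x)               # frozen global features
        cnn_feat = self.cnn(x)                   # trainable, domain-specific features
        vit_feat = F.interpolate(vit_feat, size=cnn_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([vit_feat, cnn_feat], dim=1))
```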
arXiv Detail & Related papers (2024-06-30T14:55:32Z)
- ProMamba: Prompt-Mamba for polyp segmentation [12.008624337064521]
We propose a segmentation model based on Prompt-Mamba, which incorporates the latest Vision-Mamba and prompt technologies.
We are the first to apply the Vision-Mamba architecture to polyp segmentation and the first to utilize prompt technology in a polyp segmentation model.
Our model efficiently accomplishes segmentation tasks, surpassing previous state-of-the-art methods by an average of 5% across six datasets.
arXiv Detail & Related papers (2024-03-20T15:08:57Z)
- ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection [88.4359020192429]
Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases.
In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training & end-to-end inference framework.
We apply Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps.
In the fine-tuning stage, IoU-guided Sample Re-weighting is introduced.
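The foreground/background contrast described above can be illustrated with a generic masked-pooling objective; the pooling, normalization, and margin below are assumptions, and this is not ECC-PolypDet's exact BCL loss.

```python
# Generic sketch of a foreground/background contrastive term via masked pooling:
# pull polyp (foreground) embeddings together and push them away from background
# embeddings. Pooling, normalization, and the margin are assumptions.
import torch
import torch.nn.functional as F

def fg_bg_contrast(feat: torch.Tensor, mask: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    """feat: (B, C, H, W) backbone features; mask: (B, 1, H, W) binary polyp mask."""
    mask = F.interpolate(mask.float(), size=feat.shape[-2:], mode="nearest")
    fg = (feat * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1.0)            # (B, C)
    bg = (feat * (1 - mask)).sum(dim=(2, 3)) / (1 - mask).sum(dim=(2, 3)).clamp(min=1.0)
    fg, bg = F.normalize(fg, dim=1), F.normalize(bg, dim=1)
    # Intra-class: foreground embeddings should agree with their batch prototype.
    proto = F.normalize(fg.mean(dim=0, keepdim=True), dim=1)
    intra = 1.0 - (fg * proto).sum(dim=1)                     # cosine distance to prototype
    # Inter-class: foreground should stay at least `margin` away from background.
    inter = F.relu((fg * bg).sum(dim=1) - margin)             # penalize high fg-bg similarity
    return (intra + inter).mean()
```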
arXiv Detail & Related papers (2024-01-10T07:03:41Z)
- Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It adopts a traditional U-shaped encoder-decoder structure augmented with a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
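As a rough illustration of the dynamic-kernel idea, the sketch below generates one depthwise kernel per channel from a global lesion descriptor and applies it with a grouped convolution; the generation head and shapes are assumptions rather than LDNet's actual scheme.

```python
# Hypothetical dynamic-kernel head: kernels are predicted per sample from a
# global descriptor and applied via grouped convolution. Not LDNet's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicKernelHead(nn.Module):
    def __init__(self, ch: int, k: int = 3):
        super().__init__()
        self.ch, self.k = ch, k
        self.gen = nn.Linear(ch, ch * k * k)   # produces one depthwise kernel per channel

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        desc = feat.mean(dim=(2, 3))                          # global lesion descriptor (B, C)
        kernels = self.gen(desc).view(b * c, 1, self.k, self.k)
        # Grouped conv applies each sample's own kernels (batch folded into groups).
        out = F.conv2d(feat.reshape(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)
```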
arXiv Detail & Related papers (2023-01-12T09:53:57Z)
- BoxPolyp: Boost Generalized Polyp Segmentation Using Extra Coarse Bounding Box Annotations [79.17754846553866]
We propose a boosted BoxPolyp model to make full use of both accurate mask and extra coarse box annotations.
In practice, box annotations are applied to alleviate the over-fitting issue of previous polyp segmentation models.
Our proposed model outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-12-07T07:45:50Z)
- Stepwise Feature Fusion: Local Guides Global [14.394421688712052]
We propose a new state-of-the-art model for medical image segmentation, the SSFormer, which uses a pyramid Transformer encoder to improve the generalization ability of models.
Our proposed Progressive Locality Decoder can be adapted to the pyramid Transformer backbone to emphasize local features and restrict attention dispersion.
arXiv Detail & Related papers (2022-03-07T10:36:38Z)
- PraNet: Parallel Reverse Attention Network for Polyp Segmentation [155.93344756264824]
We propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images.
We first aggregate the features in high-level layers using a parallel partial decoder (PPD).
In addition, we mine the boundary cues using a reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues.
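A simplified sketch of such a reverse-attention step is shown below: the sigmoid of a coarse, deeper-stage prediction is inverted and used to re-weight side features so refinement focuses on currently missed regions, and the refined residual is added back to the upsampled prediction. The convolutional refinement block is a stand-in, not PraNet's exact RA module.

```python
# Simplified reverse-attention step in the spirit of PraNet's RA module;
# the refinement convolutions are stand-ins, not the original architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttention(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, side_feat: torch.Tensor, coarse_pred: torch.Tensor) -> torch.Tensor:
        # coarse_pred: (B, 1, h, w) logits from a deeper stage
        pred = F.interpolate(coarse_pred, size=side_feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        reverse = 1.0 - torch.sigmoid(pred)          # highlight regions not yet covered
        residual = self.refine(side_feat * reverse)  # refine within those regions
        return pred + residual                       # updated prediction logits
```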
arXiv Detail & Related papers (2020-06-13T08:13:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.