CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation
- URL: http://arxiv.org/abs/2504.05049v1
- Date: Mon, 07 Apr 2025 13:19:16 GMT
- Title: CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation
- Authors: Shuai Chen, Fanman Meng, Haoran Wei, Chenhao Wu, Qingbo Wu, Linfeng Xu, Hongliang Li
- Abstract summary: Few-shot segmentation (FSS) aims to segment new classes using few annotated images. Recent FSS methods have shown considerable improvements by leveraging Segment Anything Model (SAM). We propose CMaP-SAM, a novel framework that introduces contraction mapping theory to optimize position priors for SAM-driven FSS.
- Score: 21.466035540502226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot segmentation (FSS) aims to segment new classes using few annotated images. While recent FSS methods have shown considerable improvements by leveraging Segment Anything Model (SAM), they face two critical limitations: insufficient utilization of structural correlations in query images, and significant information loss when converting continuous position priors to discrete point prompts. To address these challenges, we propose CMaP-SAM, a novel framework that introduces contraction mapping theory to optimize position priors for SAM-driven few-shot segmentation. CMaP-SAM consists of three key components: (1) a contraction mapping module that formulates position prior optimization as a Banach contraction mapping with convergence guarantees. This module iteratively refines position priors through pixel-wise structural similarity, generating a converged prior that preserves both semantic guidance from reference images and structural correlations in query images; (2) an adaptive distribution alignment module bridging continuous priors with SAM's binary mask prompt encoder; and (3) a foreground-background decoupled refinement architecture producing accurate final segmentation masks. Extensive experiments demonstrate CMaP-SAM's effectiveness, achieving state-of-the-art performance with 71.1 mIoU on PASCAL-$5^i$ and 56.1 on COCO-$20^i$ datasets.
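The convergence guarantee mentioned in the abstract can be illustrated with a small sketch. The update rule below, P <- alpha * A P + (1 - alpha) * P0, is an assumption chosen for illustration and is not taken from the paper: A is a row-stochastic affinity matrix built from pixel-wise similarity within the query, P0 is the initial prior derived from the reference image, and alpha lies in [0, 1). Under those assumptions the update is a contraction in the max norm with factor alpha, so by the Banach fixed-point theorem it converges to a unique fixed point from any starting prior. The names `row_softmax` and `refine_prior` and the toy similarity values are hypothetical.

```python
import math


def row_softmax(scores, temperature=1.0):
    """Turn a square similarity matrix into a row-stochastic affinity matrix."""
    affinity = []
    for row in scores:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((s - peak) / temperature) for s in row]
        total = sum(exps)
        affinity.append([e / total for e in exps])
    return affinity


def refine_prior(initial_prior, affinity, alpha=0.5, tol=1e-9, max_iters=1000):
    """Iterate P <- alpha * A P + (1 - alpha) * P0 until the update is below tol.

    Assumed update rule (illustrative, not the paper's formula). With A
    row-stochastic and 0 <= alpha < 1, the map is a contraction in the max
    norm with factor alpha, so the iteration has a unique fixed point.
    Returns the refined prior and the number of iterations performed.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) for the map to be a contraction")
    prior = list(initial_prior)
    for step in range(1, max_iters + 1):
        updated = [
            alpha * sum(a * p for a, p in zip(row, prior)) + (1.0 - alpha) * p0
            for row, p0 in zip(affinity, initial_prior)
        ]
        change = max(abs(u - p) for u, p in zip(updated, prior))
        prior = updated
        if change < tol:
            return prior, step
    return prior, max_iters


# Toy query with four "pixels": 0 and 1 look alike, 2 and 3 look alike.
similarity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
# Hypothetical reference-derived prior: confident only about pixel 0.
initial_prior = [0.9, 0.2, 0.1, 0.0]

affinity = row_softmax(similarity, temperature=0.1)
refined, steps = refine_prior(initial_prior, affinity, alpha=0.5)
print("refined prior:", [round(p, 4) for p in refined])
print("iterations:", steps)
```

In this toy setup, pixel 1 starts with a low prior but is structurally similar to the confident pixel 0, so its refined value rises, while the (1 - alpha) * P0 term keeps the result anchored to the reference-derived guidance. The refined prior stays continuous; how it is then aligned with SAM's mask prompt encoder is a separate step that this sketch does not cover.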
Related papers
- DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency [91.30252180093333]
We propose the Dual Consistency SAM (DC-SAM) method based on prompt tuning to adapt SAM and SAM2 for in-context segmentation.
Our key insight is to enhance the features of SAM's prompt encoder in segmentation by providing high-quality visual prompts.
Although the proposed DC-SAM is primarily designed for images, it can be seamlessly extended to the video domain with the support of SAM2.
arXiv Detail & Related papers (2025-04-16T13:41:59Z)
- Effective SAM Combination for Open-Vocabulary Semantic Segmentation [24.126307031048203]
Open-vocabulary semantic segmentation aims to assign pixel-level labels to images across an unlimited range of classes. ESC-Net is a novel one-stage open-vocabulary segmentation model that leverages the SAM decoder blocks for class-agnostic segmentation. ESC-Net achieves superior performance on standard benchmarks, including ADE20K, PASCAL-VOC, and PASCAL-Context.
arXiv Detail & Related papers (2024-11-22T04:36:12Z)
- FCC: Fully Connected Correlation for Few-Shot Segmentation [11.277022867553658]
Few-shot segmentation (FSS) aims to segment the target object in a query image using only a small set of support images and masks.
Previous methods have tried to obtain prior information by creating correlation maps from pixel-level correlation on final-layer or same-layer features.
We introduce FCC (Fully Connected Correlation) to integrate pixel-level correlations between support and query features.
arXiv Detail & Related papers (2024-11-18T03:32:02Z)
- Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation [26.786890883280062]
We introduce Trident, a training-free framework that first splices features extracted by CLIP and DINO from sub-images, then leverages SAM's encoder to create a correlation matrix for global aggregation.
Trident achieves a significant improvement in mIoU across eight benchmarks compared with the current SOTA, increasing from 44.4 to 48.6.
arXiv Detail & Related papers (2024-11-14T06:31:20Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- SAMConvex: Fast Discrete Optimization for CT Registration using Self-supervised Anatomical Embedding and Correlation Pyramid [32.424451941998484]
Estimating displacement vector field via a cost volume computed in the feature space has shown great success in image registration.
Existing feature descriptors only extract local features incapable of representing the global semantic information.
We propose SAMConvex, a fast coarse-to-fine discrete optimization method for CT registration.
arXiv Detail & Related papers (2023-07-19T02:28:41Z)
- Dense Affinity Matching for Few-Shot Segmentation [83.65203917246745]
Few-Shot Segmentation (FSS) aims to segment images of novel classes given only a few annotated samples.
We propose a dense affinity matching framework to exploit the support-query interaction.
We show that our framework performs very competitively under different settings with only 0.68M parameters.
arXiv Detail & Related papers (2023-07-17T12:27:15Z)
- Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims at performing segmentation in query images with a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z)
- Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation [33.93192093090601]
A key challenge for few-shot semantic segmentation (FSS) is how to tailor a desirable interaction among support and query features.
We propose a dynamic prototype convolution network (DPCN) to fully capture the intrinsic details for accurate FSS.
Our DPCN is also flexible and efficient under the k-shot FSS setting.
arXiv Detail & Related papers (2022-04-22T11:12:37Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- Kullback-Leibler Divergence-Based Fuzzy $C$-Means Clustering Incorporating Morphological Reconstruction and Wavelet Frames for Image Segmentation [152.609322951917]
We come up with a Kullback-Leibler (KL) divergence-based Fuzzy C-Means (FCM) algorithm by incorporating a tight wavelet frame transform and a morphological reconstruction operation.
The proposed algorithm works well and comes with better segmentation performance than other comparative algorithms.
arXiv Detail & Related papers (2020-02-21T05:19:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.