Unlocking the Power of SAM 2 for Few-Shot Segmentation
- URL: http://arxiv.org/abs/2505.14100v2
- Date: Wed, 21 May 2025 01:47:37 GMT
- Title: Unlocking the Power of SAM 2 for Few-Shot Segmentation
- Authors: Qianxiong Xu, Lanyun Zhu, Xuanyi Liu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao
- Abstract summary: Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation on few classes to segment arbitrary classes, but at the risk of overfitting. Recently, SAM 2 has extended SAM by supporting video segmentation, whose class-agnostic matching ability is useful to FSS. We design a Pseudo Prompt Generator to encode pseudo query memory, matching with query features in a compatible way. We further design Iterative Memory Refinement to fuse more query FG features into the memory, and devise a Support-Calibrated Memory Attention to suppress the unexpected query BG features in memory.
- Score: 54.562050590453225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation on few classes to segment arbitrary classes, but at the risk of overfitting. To address this, some methods use the well-learned knowledge of foundation models (e.g., SAM) to simplify the learning process. Recently, SAM 2 has extended SAM by supporting video segmentation, whose class-agnostic matching ability is useful to FSS. A simple idea is to encode support foreground (FG) features as memory, with which query FG features are matched and fused. Unfortunately, the FG objects in different frames of SAM 2's video data are always the same identity, while those in FSS are different identities, i.e., the matching step is incompatible. Therefore, we design a Pseudo Prompt Generator to encode pseudo query memory, matching with query features in a compatible way. However, the memories can never be as accurate as the real ones, i.e., they are likely to contain incomplete query FG, and some unexpected query background (BG) features, leading to wrong segmentation. Hence, we further design Iterative Memory Refinement to fuse more query FG features into the memory, and devise a Support-Calibrated Memory Attention to suppress the unexpected query BG features in memory. Extensive experiments have been conducted on PASCAL-5$^i$ and COCO-20$^i$ to validate the effectiveness of our design, e.g., the 1-shot mIoU can be 4.2% better than the best baseline.
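The abstract names three components (Pseudo Prompt Generator, Iterative Memory Refinement, Support-Calibrated Memory Attention) without implementation details. The sketch below illustrates the underlying matching idea in generic PyTorch: support FG features "vote" for query tokens to build a memory in the query's own feature space, which query features then read via cross-attention. Every function name, shape, and the calibration/refinement heuristic here is our own assumption for illustration, not the authors' code.

```python
# Minimal sketch of the memory-matching idea from the abstract.
# All names and shapes are illustrative assumptions; tokens are
# C-channel features over a flattened H*W grid.
import torch
import torch.nn.functional as F

def pseudo_query_memory(sup_feat, sup_mask, qry_feat, tau=0.1):
    """Build a pseudo *query* memory from support FG features.

    Matching query tokens against raw support-FG memory is incompatible
    (support/query objects in FSS are different instances), so each
    support FG token instead picks the query tokens it resembles,
    yielding memory entries that live in the query's feature space.

    sup_feat: (N_s, C); sup_mask: (N_s,) FG mask in {0,1}; qry_feat: (N_q, C)
    """
    fg = sup_feat[sup_mask.bool()]                                    # (N_fg, C)
    sim = F.normalize(fg, dim=-1) @ F.normalize(qry_feat, dim=-1).T   # (N_fg, N_q)
    attn = (sim / tau).softmax(dim=-1)
    return attn @ qry_feat                                            # (N_fg, C)

def memory_attention(qry_feat, memory, calib=None):
    """Query tokens read from the (pseudo) memory via cross-attention.

    `calib` is an optional per-memory-entry weight in [0, 1]; a
    support-calibrated score that down-weights entries resembling query
    BG is one plausible reading of the abstract's Support-Calibrated
    Memory Attention (our assumption).
    """
    scores = qry_feat @ memory.T / memory.shape[-1] ** 0.5            # (N_q, N_m)
    if calib is not None:
        scores = scores + torch.log(calib.clamp_min(1e-6))            # suppress suspect entries
    return scores.softmax(dim=-1) @ memory                            # (N_q, C)

def refine_memory(qry_feat, memory, steps=2):
    """Loosely mirrors Iterative Memory Refinement: repeatedly let memory
    entries retrieve query features and blend them in, so more query FG
    ends up in the memory."""
    for _ in range(steps):
        memory = 0.5 * memory + 0.5 * memory_attention(memory, qry_feat)
    return memory

# Toy 1-shot episode with 64-d tokens on an 8x8 grid.
torch.manual_seed(0)
C, N = 64, 8 * 8
sup, qry = torch.randn(N, C), torch.randn(N, C)
mask = (torch.rand(N) > 0.7).float()
mem = refine_memory(qry, pseudo_query_memory(sup, mask, qry))
fused = memory_attention(qry, mem)   # fused query features for decoding
print(fused.shape)                   # torch.Size([64, 64])
```

The convex blend in `refine_memory` and the log-space calibration are stand-ins for whatever fusion the paper actually uses; they are only meant to make the data flow of the three components concrete.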
Related papers
- SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation [4.4700130387278225]
Few-shot segmentation aims to segment unseen object categories from just a handful of annotated examples. We propose SANSA (Semantically AligNed Segment Anything 2), a framework that makes this latent structure explicit.
arXiv Detail & Related papers (2025-05-27T21:51:28Z)
- DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency [91.30252180093333]
We propose the Dual Consistency SAM (DC-SAM) method based on prompt tuning to adapt SAM and SAM2 for in-context segmentation. Our key insight is to enhance the features of SAM's prompt encoder in segmentation by providing high-quality visual prompts. Although the proposed DC-SAM is primarily designed for images, it can be seamlessly extended to the video domain with the support of SAM2.
arXiv Detail & Related papers (2025-04-16T13:41:59Z)
- MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation [22.482211353379927]
The large vision model Segment Anything Model 2 (SAM2) has shown strong zero-shot segmentation performance on both images and videos. Inspired by cross-frame correlation in videos, we propose to treat multi-modal data as a sequence of frames representing the same scene. Our key idea is to "memorize" the modality-agnostic information and "memorize" the semantics related to the targeted scene.
arXiv Detail & Related papers (2025-03-09T17:33:15Z)
- EdgeTAM: On-Device Track Anything Model [65.10032957471824]
Segment Anything Model (SAM) 2 further extends its capability from image to video inputs through a memory bank mechanism. We aim at making SAM 2 much more efficient so that it even runs on mobile devices while maintaining comparable performance. We propose EdgeTAM, which leverages a novel 2D Spatial Perceiver to reduce the computational cost.
arXiv Detail & Related papers (2025-01-13T12:11:07Z)
- SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree [79.26409013413003]
We introduce SAM2Long, an improved training-free video object segmentation strategy. It considers the segmentation uncertainty within each frame and chooses the video-level optimal results from multiple segmentation pathways. SAM2Long achieves an average improvement of 3.0 points across all 24 head-to-head comparisons.
arXiv Detail & Related papers (2024-10-21T17:59:19Z)
- Eliminating Feature Ambiguity for Few-Shot Segmentation [95.9916573435427]
Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features.
This paper presents a novel plug-in termed ambiguity elimination network (AENet), which can be plugged into any existing cross-attention-based FSS method.
arXiv Detail & Related papers (2024-07-13T10:33:03Z)
- I2CANSAY: Inter-Class Analogical Augmentation and Intra-Class Significance Analysis for Non-Exemplar Online Task-Free Continual Learning [42.608860809847236]
Online task-free continual learning (OTFCL) is a more challenging variant of continual learning.
Existing methods rely on a memory buffer composed of old samples to prevent forgetting.
We propose a novel framework called I2CANSAY that gets rid of the dependence on memory buffers and efficiently learns the knowledge of new data from one-shot samples.
arXiv Detail & Related papers (2024-04-21T08:28:52Z)
- Self-Calibrated Cross Attention Network for Few-Shot Segmentation [65.20559109791756]
We design a self-calibrated cross attention (SCCA) block for efficient patch-based attention.
SCCA groups the patches from the same query image and the aligned patches from the support image as K&V.
In this way, the query BG features are fused with matched BG features rather than support FG features, and thus the aforementioned issues are mitigated (a minimal sketch of this K&V grouping follows the list).
arXiv Detail & Related papers (2023-08-18T04:41:50Z)
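As a companion to the SCCA snippet above, here is a minimal, hedged sketch of grouping the query image's own patches together with aligned support patches as keys and values. The alignment step itself is abstracted away, and all names and shapes are illustrative assumptions, not the paper's implementation.

```python
# Sketch of SCCA-style K&V grouping: queries attend over a concatenation
# of their own image's patches and aligned support patches, so query BG
# tokens can match query-side features instead of being forced onto
# support FG. Names/shapes are illustrative assumptions.
import torch

def scca_style_attention(qry_patches, aligned_sup_patches):
    """qry_patches: (N_q, C); aligned_sup_patches: (N_s, C), assumed to be
    support patches already aligned to the query (alignment not shown)."""
    kv = torch.cat([qry_patches, aligned_sup_patches], dim=0)   # (N_q + N_s, C)
    scores = qry_patches @ kv.T / kv.shape[-1] ** 0.5           # (N_q, N_q + N_s)
    return scores.softmax(dim=-1) @ kv                          # (N_q, C)

# Toy check with 64-d patch tokens.
out = scca_style_attention(torch.randn(16, 64), torch.randn(16, 64))
print(out.shape)  # torch.Size([16, 64])
```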