SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation
- URL: http://arxiv.org/abs/2505.21795v1
- Date: Tue, 27 May 2025 21:51:28 GMT
- Title: SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation
- Authors: Claudia Cuttano, Gabriele Trivigno, Giuseppe Averta, Carlo Masone
- Abstract summary: Few-shot segmentation aims to segment unseen object categories from just a handful of annotated examples. We propose SANSA (Semantically AligNed Segment Anything 2), a framework that makes this latent structure explicit.
- Score: 4.4700130387278225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot segmentation aims to segment unseen object categories from just a handful of annotated examples. This requires mechanisms that can both identify semantically related objects across images and accurately produce segmentation masks. We note that Segment Anything 2 (SAM2), with its prompt-and-propagate mechanism, offers both strong segmentation capabilities and a built-in feature matching process. However, we show that its representations are entangled with task-specific cues optimized for object tracking, which impairs its use for tasks requiring higher-level semantic understanding. Our key insight is that, despite its class-agnostic pretraining, SAM2 already encodes rich semantic structure in its features. We propose SANSA (Semantically AligNed Segment Anything 2), a framework that makes this latent structure explicit, and repurposes SAM2 for few-shot segmentation through minimal task-specific modifications. SANSA achieves state-of-the-art performance on few-shot segmentation benchmarks specifically designed to assess generalization, outperforms generalist methods in the popular in-context setting, supports flexible interaction via various prompts such as points, boxes, or scribbles, and remains significantly faster and more compact than prior approaches. Code is available at https://github.com/ClaudiaCuttano/SANSA.
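To make the prompt-and-propagate idea concrete, the sketch below treats the annotated support image and the unlabeled query image as a two-frame "video" and lets an off-the-shelf SAM2 video predictor propagate the support mask to the query frame. This is only a minimal baseline illustration of the mechanism the abstract refers to, not SANSA's implementation; the config/checkpoint paths, frame directory, and mask file are assumptions.

```python
# Minimal sketch: few-shot segmentation as two-frame mask propagation with SAM2.
# All paths and file names are placeholders/assumptions, not from the paper.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",  # model config (assumed path)
    "checkpoints/sam2.1_hiera_large.pt",   # checkpoint (assumed path)
)

with torch.inference_mode():
    # "frames" holds two JPEGs: 00000.jpg = support image, 00001.jpg = query image.
    state = predictor.init_state(video_path="frames")

    # Prompt frame 0 with the annotated support mask (H x W boolean array).
    support_mask = torch.from_numpy(np.load("support_mask.npy")).bool()
    predictor.add_new_mask(state, frame_idx=0, obj_id=1, mask=support_mask)

    # Propagation runs SAM2's built-in feature matching, carrying the prompted
    # object over to the query frame.
    query_mask = None
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        if frame_idx == 1:
            query_mask = (mask_logits[0, 0] > 0).cpu().numpy()
```

As the abstract notes, this vanilla propagation inherits tracking-specific cues in SAM2's representations; SANSA's contribution is to expose the semantic structure in those features so that matching works across different images rather than across frames of the same video.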
Related papers
- LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance [56.474856189865946]
Large multi-modal models (LMMs) struggle with inaccurate segmentation and hallucinated comprehension. We propose LIRA, a framework that capitalizes on the complementary relationship between visual comprehension and segmentation. LIRA achieves state-of-the-art performance in both segmentation and comprehension tasks.
arXiv Detail & Related papers (2025-07-08T07:46:26Z) - ViRefSAM: Visual Reference-Guided Segment Anything Model for Remote Sensing Segmentation [21.953205396218767]
ViRefSAM is a novel framework that guides SAM using only a few annotated reference images. It enables automatic segmentation of class-consistent objects across RS images. It consistently outperforms existing few-shot segmentation methods across diverse datasets.
arXiv Detail & Related papers (2025-07-03T04:06:04Z) - Talk2SAM: Text-Guided Semantic Enhancement for Complex-Shaped Object Segmentation [0.0]
We propose Talk2SAM, a novel approach that integrates textual guidance to improve object segmentation. The method uses CLIP-based embeddings derived from user-provided text prompts to identify relevant semantic regions. Talk2SAM consistently outperforms SAM-HQ, achieving up to +5.9% IoU and +8.3% boundary IoU improvements.
arXiv Detail & Related papers (2025-06-03T19:53:10Z) - DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency [91.30252180093333]
We propose the Dual Consistency SAM (DC-SAM) method based on prompt tuning to adapt SAM and SAM2 for in-context segmentation. Our key insight is to enhance the features of SAM's prompt encoder in segmentation by providing high-quality visual prompts. Although the proposed DC-SAM is primarily designed for images, it can be seamlessly extended to the video domain with the support of SAM2.
arXiv Detail & Related papers (2025-04-16T13:41:59Z) - Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation [2.5524809198548137]
Segment Anything Model (SAM) has demonstrated powerful zero-shot segmentation performance in natural scenes.
Recently released Segment Anything Model 2 (SAM2) has further heightened researchers' expectations towards image segmentation capabilities.
This technical report can drive the emergence of SAM2-based adapters, aiming to enhance the performance ceiling of large vision models on class-agnostic instance segmentation tasks.
arXiv Detail & Related papers (2024-09-04T09:35:09Z) - SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models.
We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z) - SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation [88.80792308991867]
Segment Anything Model (SAM) has shown the ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges. This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation. Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains.
arXiv Detail & Related papers (2024-07-23T17:47:25Z) - Semantic-aware SAM for Point-Prompted Instance Segmentation [29.286913777078116]
In this paper, we introduce a cost-effective category-specific segmenter using Segment Anything (SAM).
To tackle this challenge, we have devised a Semantic-Aware Instance Network (SAPNet) that integrates Multiple Instance Learning (MIL) with matching capability and SAM with point prompts.
SAPNet strategically selects the most representative mask proposals generated by SAM to supervise segmentation, with a specific focus on object category information.
arXiv Detail & Related papers (2023-12-26T05:56:44Z) - Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z) - Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions [109.2706837177222]
DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation.
As a byproduct, DR1Mask is 10% faster and 1 mAP point more accurate than the previous state-of-the-art instance segmentation network BlendMask.
arXiv Detail & Related papers (2020-11-19T12:42:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.