Learning to Prompt Segment Anything Models
- URL: http://arxiv.org/abs/2401.04651v1
- Date: Tue, 9 Jan 2024 16:24:25 GMT
- Title: Learning to Prompt Segment Anything Models
- Authors: Jiaxing Huang, Kai Jiang, Jingyi Zhang, Han Qiu, Lewei Lu, Shijian Lu and Eric Xing
- Abstract summary: Segment Anything Models (SAMs) have demonstrated great potential in learning to segment anything.
SAMs work with two types of prompts, namely spatial prompts (e.g., points) and semantic prompts (e.g., texts).
We propose spatial-semantic prompt learning (SSPrompt) that learns effective semantic and spatial prompts for better SAMs.
- Score: 55.805816693815835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segment Anything Models (SAMs) like SEEM and SAM have demonstrated great
potential in learning to segment anything. The core design of SAMs lies in
Promptable Segmentation, which takes a handcrafted prompt as input and returns
the expected segmentation mask. SAMs work with two types of prompts, namely
spatial prompts (e.g., points) and semantic prompts (e.g., texts), which work
together to prompt SAMs to segment anything on downstream datasets. Despite the
important role of prompts, how to acquire suitable prompts for SAMs is largely
under-explored. In this work, we examine the architecture of SAMs and identify
two challenges in learning effective prompts for SAMs. To address these
challenges, we propose spatial-semantic prompt learning (SSPrompt), which learns
effective semantic and spatial prompts for better SAMs. Specifically, SSPrompt introduces spatial
prompt learning and semantic prompt learning, which optimize spatial prompts
and semantic prompts directly over the embedding space and selectively leverage
the knowledge encoded in pre-trained prompt encoders. Extensive experiments
show that SSPrompt achieves superior image segmentation performance
consistently across multiple widely adopted datasets.
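To make the idea concrete, below is a minimal, purely illustrative PyTorch-style sketch of prompt learning in the embedding space with gated reuse of pre-trained prompt encoders; all module names, shapes, and the gating scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpatialSemanticPrompts(nn.Module):
    """Illustrative sketch (not the authors' code): learnable spatial and
    semantic prompt embeddings, blended with the outputs of frozen
    pre-trained prompt encoders via learnable gates."""

    def __init__(self, num_prompts, embed_dim, spatial_encoder, text_encoder):
        super().__init__()
        # Prompts optimized directly in the embedding space.
        self.spatial_prompts = nn.Parameter(0.02 * torch.randn(num_prompts, embed_dim))
        self.semantic_prompts = nn.Parameter(0.02 * torch.randn(num_prompts, embed_dim))
        # Gates deciding how much pre-trained encoder knowledge to keep.
        self.spatial_gate = nn.Parameter(torch.zeros(1))
        self.semantic_gate = nn.Parameter(torch.zeros(1))
        # Frozen prompt encoders taken from a pre-trained SAM/SEEM checkpoint.
        self.spatial_encoder = spatial_encoder.requires_grad_(False)
        self.text_encoder = text_encoder.requires_grad_(False)

    def forward(self, points, text_tokens):
        # Pre-trained encoder outputs carry prior knowledge about the raw prompts.
        enc_spatial = self.spatial_encoder(points)       # assumed (num_prompts, embed_dim)
        enc_semantic = self.text_encoder(text_tokens)    # assumed (num_prompts, embed_dim)
        w_sp = torch.sigmoid(self.spatial_gate)
        w_se = torch.sigmoid(self.semantic_gate)
        # Selectively combine encoder knowledge with directly learned embeddings.
        spatial = w_sp * enc_spatial + (1.0 - w_sp) * self.spatial_prompts
        semantic = w_se * enc_semantic + (1.0 - w_se) * self.semantic_prompts
        return spatial, semantic  # passed on to the (frozen) SAM mask decoder
```

In such a setup only the prompt embeddings and gates would be trained, with the SAM image encoder and mask decoder kept frozen, which is one plausible reading of the prompt-learning setting described in the abstract.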
Related papers
- SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation [88.80792308991867]
The Segment Anything Model (SAM) has shown the ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges.
This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation.
Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains.
arXiv Detail & Related papers (2024-07-23T17:47:25Z)
- EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model [41.29719405544942]
We introduce the Early Vision-language Fusion-based SAM (EVF-SAM).
EVF-SAM is a simple yet effective referring segmentation method that exploits multimodal prompts (i.e., image and text).
Experiments show that the proposed EVF-SAM based on BEIT-3 can obtain state-of-the-art performance on RefCOCO/+/g for referring expression segmentation.
arXiv Detail & Related papers (2024-06-28T17:38:18Z)
- AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting that aligns SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z)
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively [69.97238935096094]
The Open-Vocabulary SAM is a SAM-inspired model designed for simultaneous interactive segmentation and recognition.
Our method can segment and recognize approximately 22,000 classes.
arXiv Detail & Related papers (2024-01-05T18:59:22Z)
- Stable Segment Anything Model [79.9005670886038]
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) it improves SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)
- SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation.
Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes.
In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
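For intuition only, the prototype-based class prompt encoder and contrastive prototype learning could be sketched roughly as below; the interfaces, dimensions, and the InfoNCE-style loss form are assumptions rather than SurgicalSAM's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeClassPromptEncoder(nn.Module):
    """Rough sketch (assumed, not SurgicalSAM's code): one learnable prototype
    per instrument class, mapped to a prompt embedding for the SAM decoder."""

    def __init__(self, num_classes, feat_dim, prompt_dim):
        super().__init__()
        self.prototypes = nn.Parameter(0.02 * torch.randn(num_classes, feat_dim))
        self.to_prompt = nn.Linear(feat_dim, prompt_dim)  # lightweight head

    def forward(self, class_ids):
        # Prompt embeddings generated directly from the selected class prototypes.
        return self.to_prompt(self.prototypes[class_ids])


def contrastive_prototype_loss(feats, labels, prototypes, temperature=0.1):
    """InfoNCE-style loss (assumed form) pulling image features toward their
    class prototype and away from others, to counter low inter-class variance."""
    feats = F.normalize(feats, dim=-1)          # (batch, feat_dim)
    protos = F.normalize(prototypes, dim=-1)    # (num_classes, feat_dim)
    logits = feats @ protos.t() / temperature   # (batch, num_classes)
    return F.cross_entropy(logits, labels)
```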
arXiv Detail & Related papers (2023-08-17T02:51:01Z)
- RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model [29.42043345787285]
We propose a method to learn the generation of appropriate prompts for the Segment Anything Model (SAM).
This enables SAM to produce semantically discernible segmentation results for remote sensing images.
We also propose several ongoing derivatives for instance segmentation tasks, drawing on recent advancements within the SAM community, and compare their performance with RSPrompter.
arXiv Detail & Related papers (2023-06-28T14:51:34Z)