Learning to Prompt Segment Anything Models
- URL: http://arxiv.org/abs/2401.04651v1
- Date: Tue, 9 Jan 2024 16:24:25 GMT
- Title: Learning to Prompt Segment Anything Models
- Authors: Jiaxing Huang, Kai Jiang, Jingyi Zhang, Han Qiu, Lewei Lu, Shijian Lu and Eric Xing
- Abstract summary: Segment Anything Models (SAMs) have demonstrated great potential in learning to segment anything.
SAMs work with two types of prompts, namely spatial prompts (e.g., points) and semantic prompts (e.g., texts).
We propose spatial-semantic prompt learning (SSPrompt) that learns effective semantic and spatial prompts for better SAMs.
- Score: 55.805816693815835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segment Anything Models (SAMs) like SEEM and SAM have demonstrated great
potential in learning to segment anything. The core design of SAMs lies in
Promptable Segmentation, which takes a handcrafted prompt as input and returns
the expected segmentation mask. SAMs work with two types of prompts, namely
spatial prompts (e.g., points) and semantic prompts (e.g., texts), which work
together to prompt SAMs to segment anything on downstream datasets. Despite the
important role of prompts, how to acquire suitable prompts for SAMs is largely
under-explored. In this work, we examine the architecture of SAMs and identify
two challenges in learning effective prompts for SAMs. To address these
challenges, we propose spatial-semantic prompt learning (SSPrompt), which learns
effective semantic and spatial prompts for better SAMs. Specifically, SSPrompt introduces spatial
prompt learning and semantic prompt learning, which optimize spatial prompts
and semantic prompts directly over the embedding space and selectively leverage
the knowledge encoded in pre-trained prompt encoders. Extensive experiments
show that SSPrompt achieves superior image segmentation performance
consistently across multiple widely adopted datasets.
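To make the idea concrete, below is a minimal, purely illustrative PyTorch-style sketch of prompt learning in the embedding space with gated reuse of pre-trained prompt encoders; all module names, shapes, and the gating scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpatialSemanticPrompts(nn.Module):
    """Illustrative sketch (not the authors' code): learnable spatial and
    semantic prompt embeddings, blended with the outputs of frozen
    pre-trained prompt encoders via learnable gates."""

    def __init__(self, num_prompts, embed_dim, spatial_encoder, text_encoder):
        super().__init__()
        # Prompts optimized directly in the embedding space.
        self.spatial_prompts = nn.Parameter(0.02 * torch.randn(num_prompts, embed_dim))
        self.semantic_prompts = nn.Parameter(0.02 * torch.randn(num_prompts, embed_dim))
        # Gates deciding how much pre-trained encoder knowledge to keep.
        self.spatial_gate = nn.Parameter(torch.zeros(1))
        self.semantic_gate = nn.Parameter(torch.zeros(1))
        # Frozen prompt encoders taken from a pre-trained SAM/SEEM checkpoint.
        self.spatial_encoder = spatial_encoder.requires_grad_(False)
        self.text_encoder = text_encoder.requires_grad_(False)

    def forward(self, points, text_tokens):
        # Pre-trained encoder outputs carry prior knowledge about the raw prompts.
        enc_spatial = self.spatial_encoder(points)       # assumed (num_prompts, embed_dim)
        enc_semantic = self.text_encoder(text_tokens)    # assumed (num_prompts, embed_dim)
        w_sp = torch.sigmoid(self.spatial_gate)
        w_se = torch.sigmoid(self.semantic_gate)
        # Selectively combine encoder knowledge with directly learned embeddings.
        spatial = w_sp * enc_spatial + (1.0 - w_sp) * self.spatial_prompts
        semantic = w_se * enc_semantic + (1.0 - w_se) * self.semantic_prompts
        return spatial, semantic  # passed on to the (frozen) SAM mask decoder
```

In such a setup only the prompt embeddings and gates would be trained, with the SAM image encoder and mask decoder kept frozen, which is one plausible reading of the prompt-learning setting described in the abstract.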
Related papers
- SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation [88.80792308991867]
The Segment Anything Model (SAM) has shown the ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges.
This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation.
Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains.
arXiv Detail & Related papers (2024-07-23T17:47:25Z)
- EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model [41.29719405544942]
We introduce the Early Vision-language Fusion-based SAM (EVF-SAM).
EVF-SAM is a simple yet effective referring segmentation method that exploits multimodal prompts (i.e., image and text).
Experiments show that the proposed EVF-SAM based on BEIT-3 can obtain state-of-the-art performance on RefCOCO/+/g for referring expression segmentation.
arXiv Detail & Related papers (2024-06-28T17:38:18Z)
- AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting that aligns SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z)
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively [69.97238935096094]
The Open-Vocabulary SAM is a SAM-inspired model designed for simultaneous interactive segmentation and recognition.
Our method can segment and recognize approximately 22,000 classes.
arXiv Detail & Related papers (2024-01-05T18:59:22Z)
- Stable Segment Anything Model [79.9005670886038]
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) it improves SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)
- SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation.
Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes.
In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
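For intuition only, the prototype-based class prompt encoder and contrastive prototype learning could be sketched roughly as below; the interfaces, dimensions, and the InfoNCE-style loss form are assumptions rather than SurgicalSAM's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeClassPromptEncoder(nn.Module):
    """Rough sketch (assumed, not SurgicalSAM's code): one learnable prototype
    per instrument class, mapped to a prompt embedding for the SAM decoder."""

    def __init__(self, num_classes, feat_dim, prompt_dim):
        super().__init__()
        self.prototypes = nn.Parameter(0.02 * torch.randn(num_classes, feat_dim))
        self.to_prompt = nn.Linear(feat_dim, prompt_dim)  # lightweight head

    def forward(self, class_ids):
        # Prompt embeddings generated directly from the selected class prototypes.
        return self.to_prompt(self.prototypes[class_ids])


def contrastive_prototype_loss(feats, labels, prototypes, temperature=0.1):
    """InfoNCE-style loss (assumed form) pulling image features toward their
    class prototype and away from others, to counter low inter-class variance."""
    feats = F.normalize(feats, dim=-1)          # (batch, feat_dim)
    protos = F.normalize(prototypes, dim=-1)    # (num_classes, feat_dim)
    logits = feats @ protos.t() / temperature   # (batch, num_classes)
    return F.cross_entropy(logits, labels)
```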
arXiv Detail & Related papers (2023-08-17T02:51:01Z)
- RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model [29.42043345787285]
We propose a method to learn the generation of appropriate prompts for the Segment Anything Model (SAM).
This enables SAM to produce semantically discernible segmentation results for remote sensing images.
We also propose several ongoing derivatives for instance segmentation tasks, drawing on recent advancements within the SAM community, and compare their performance with RSPrompter.
arXiv Detail & Related papers (2023-06-28T14:51:34Z)