Stable Segment Anything Model
- URL: http://arxiv.org/abs/2311.15776v2
- Date: Tue, 5 Dec 2023 15:57:17 GMT
- Title: Stable Segment Anything Model
- Authors: Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan,
Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang
- Abstract summary: The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) improved segmentation stability across a wide range of prompt qualities, and 2) retained promptable segmentation efficiency and generality.
- Score: 79.9005670886038
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Segment Anything Model (SAM) achieves remarkable promptable segmentation
given high-quality prompts which, however, often require good skills to
specify. To make SAM robust to casual prompts, this paper presents the first
comprehensive analysis on SAM's segmentation stability across a diverse
spectrum of prompt qualities, notably imprecise bounding boxes and insufficient
points. Our key finding reveals that given such low-quality prompts, SAM's mask
decoder tends to activate image features that are biased towards the background
or confined to specific object parts. To mitigate this issue, our key idea is to
calibrate only SAM's mask attention by adjusting the sampling locations and
amplitudes of image features, while the original SAM model architecture and
weights remain unchanged. Consequently, our deformable
sampling plugin (DSP) enables SAM to adaptively shift attention to the prompted
target regions in a data-driven manner, facilitated by our effective robust
training strategy (RTS). During inference, a dynamic routing plugin (DRP)
toggles SAM between the deformable and regular grid sampling modes, conditioned
on the input prompt quality. Thus, our solution, termed Stable-SAM, offers
several advantages: 1) improved segmentation stability across a wide range of
prompt qualities; 2) retained promptable segmentation efficiency and
generality; and 3) minimal learnable parameters (0.08 M) and fast adaptation
(one training epoch). Extensive experiments across multiple datasets validate
the effectiveness and advantages of our approach, underscoring Stable-SAM as a
more robust solution for segmenting anything. Code will be released upon
acceptance.
https://github.com/fanq15/Stable-SAM
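To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the deformable sampling idea: a small learnable head predicts per-location sampling offsets and amplitudes over SAM's frozen image features, and a routing score derived from the prompt embedding blends deformable and regular grid sampling. All module names, shapes, and the routing heuristic are illustrative assumptions, not the released implementation.

```python
# Illustrative sketch only; not the official Stable-SAM code.
# Assumed shapes: image features (B, C, H, W) from SAM's encoder,
# a prompt embedding (B, C) from SAM's prompt encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSamplingPlugin(nn.Module):
    """Resamples frozen SAM image features at learned offsets with
    learned amplitudes (the DSP idea); SAM itself stays unchanged."""
    def __init__(self, channels: int):
        super().__init__()
        # Tiny heads keep the learnable-parameter budget small.
        self.offset_head = nn.Conv2d(channels, 2, kernel_size=1)     # (dx, dy)
        self.amplitude_head = nn.Conv2d(channels, 1, kernel_size=1)  # per-location gain

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feats.shape
        # Regular sampling grid in [-1, 1], as expected by grid_sample.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feats.device),
            torch.linspace(-1, 1, w, device=feats.device),
            indexing="ij",
        )
        base_grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        # Predict small offsets that shift sampling toward the target.
        offsets = self.offset_head(feats).permute(0, 2, 3, 1).tanh() * 0.1
        sampled = F.grid_sample(feats, base_grid + offsets, align_corners=True)
        # Amplitude modulation re-weights the resampled features.
        amp = torch.sigmoid(self.amplitude_head(feats))
        return sampled * amp

class DynamicRoutingPlugin(nn.Module):
    """Blends deformable and regular sampling based on a scalar
    prompt-quality score (here a learned function of the prompt
    embedding -- an assumption made for illustration)."""
    def __init__(self, channels: int):
        super().__init__()
        self.dsp = DeformableSamplingPlugin(channels)
        self.quality_head = nn.Linear(channels, 1)

    def forward(self, feats: torch.Tensor, prompt_emb: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.quality_head(prompt_emb)).view(-1, 1, 1, 1)
        # Low-quality prompts (gate -> 1) lean on deformable sampling;
        # high-quality prompts (gate -> 0) keep the regular features.
        return gate * self.dsp(feats) + (1 - gate) * feats
```

In the paper, the routing toggles between discrete sampling modes at inference; a soft blend is used here only to keep the sketch differentiable and compact.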
Related papers
- SAMPa: Sharpness-aware Minimization Parallelized [51.668052890249726]
Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks.
Each SAM update requires sequentially computing two gradients, effectively doubling the per-iteration cost (see the sketch below).
We propose a simple modification of SAM, termed SAMPa, which allows us to fully parallelize the two gradient computations.
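Note that SAM here means sharpness-aware minimization, not Segment Anything. A minimal sketch of the standard SAM step (after Foret et al.) shows why the two gradients are inherently sequential in the vanilla method; the variable names and the rho default are illustrative.

```python
# Vanilla sharpness-aware minimization: gradient 2 cannot start until
# gradient 1 has produced the weight perturbation, so a plain SAM step
# costs two sequential backward passes per update.
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    # Gradient 1: ascent direction at the current weights.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)          # climb toward the sharpest nearby point
            eps.append((p, e))
    model.zero_grad()
    # Gradient 2: only possible after the perturbation from gradient 1.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in eps:
            p.sub_(e)          # restore the original weights
    base_optimizer.step()      # descend using the perturbed-point gradient
    model.zero_grad()
```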
arXiv Detail & Related papers (2024-10-14T16:21:23Z) - SAM-SP: Self-Prompting Makes SAM Great Again [11.109389094334894]
Segment Anything Model (SAM) has demonstrated impressive capabilities in zero-shot segmentation tasks.
SAM suffers noticeable performance degradation when applied to specific domains, such as medical images.
We introduce a novel self-prompting based fine-tuning approach, called SAM-SP, tailored for extending the vanilla SAM model.
arXiv Detail & Related papers (2024-08-22T13:03:05Z) - Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection [58.241593208031816]
Segment Anything Model (SAM) has been proposed as a visual fundamental model, which gives strong segmentation and generalization capabilities.
We propose a Multi-Scale and Detail-Enhanced SAM (MDSAM) for Salient Object Detection (SOD).
Experimental results demonstrate the superior performance of our model on multiple SOD datasets.
arXiv Detail & Related papers (2024-08-08T09:09:37Z) - TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks [10.75125721857487]
There is still a significant performance gap between fine-tuned SAMs and domain-specific models.
We propose Two-Stream SAM (TS-SAM), which integrates the powerful features from SAM into side network training for comprehensive feature fusion.
Extensive experiments on ten public datasets from three tasks demonstrate that TS-SAM not only significantly outperforms the recently proposed SAM-Adapter and SSOM, but achieves competitive performance with the SOTA domain-specific models.
arXiv Detail & Related papers (2024-08-03T18:08:51Z) - Robust Box Prompt based SAM for Medical Image Segmentation [13.123657825272916]
We propose a novel Robust Box prompt based SAM (RoBox-SAM) to ensure SAM's segmentation performance under prompts of varying quality.
First, we propose a prompt refinement module to implicitly perceive the potential targets, and output the offsets to transform the low-quality box prompt into a high-quality one.
Second, we introduce a prompt enhancement module to automatically generate point prompts that assist box-promptable segmentation (a rough sketch of the refinement idea follows).
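As a toy illustration of offset-based box refinement, the hypothetical head below pools image features and regresses four coordinate offsets that nudge a noisy box prompt toward the target; the architecture is an assumption, not RoBox-SAM's actual module.

```python
import torch
import torch.nn as nn

class BoxPromptRefiner(nn.Module):
    """Toy box-refinement head: predicts (dx1, dy1, dx2, dy2) offsets
    that move a low-quality box prompt toward the target object."""
    def __init__(self, channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # pool image features globally
            nn.Flatten(),
            nn.Linear(channels, 4),    # one offset per box coordinate
        )

    def forward(self, feats: torch.Tensor, box: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W); box: (B, 4) as (x1, y1, x2, y2) in pixels.
        offsets = self.head(feats)
        return box + offsets  # refined box fed back to SAM's prompt encoder
```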
arXiv Detail & Related papers (2024-07-31T02:16:28Z) - SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation [88.80792308991867]
Segment Anything Model (SAM) has shown the ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges.
This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation.
Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains.
arXiv Detail & Related papers (2024-07-23T17:47:25Z) - AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting that aligns SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z) - WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images [8.179859593451285]
We present WSI-SAM, enhancing Segment Anything Model (SAM) with precise object segmentation capabilities for histopathology images.
To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters.
Our model outperforms SAM by 4.1 and 2.5 percentage points on the ductal carcinoma in situ (DCIS) segmentation task and the breast cancer metastasis segmentation task, respectively.
arXiv Detail & Related papers (2024-03-14T10:30:43Z) - BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM [37.1263294647351]
We introduce BLO-SAM, which finetunes the Segment Anything Model (SAM) based on bi-level optimization (BLO).
BLO-SAM reduces the risk of overfitting by training the model's weight parameters and the prompt embedding on two separate subsets of the training dataset (see the sketch below).
Results demonstrate BLO-SAM's superior performance over various state-of-the-art image semantic segmentation methods.
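A bare-bones illustration of the bi-level idea: alternate updates of the model weights on one data subset and of the prompt embedding on another. The alternation schedule and the model(x, prompt) signature are assumptions for illustration, not BLO-SAM's exact procedure.

```python
import torch

def blo_style_epoch(model, prompt_emb, loader_a, loader_b, loss_fn,
                    opt_weights, opt_prompt):
    """Toy bi-level alternation: weights learn on subset A, the prompt
    embedding learns on subset B, reducing overfitting to either split."""
    for (xa, ya), (xb, yb) in zip(loader_a, loader_b):
        # Lower level: update model weights on subset A.
        opt_weights.zero_grad()
        loss_fn(model(xa, prompt_emb.detach()), ya).backward()
        opt_weights.step()
        # Upper level: update the prompt embedding on subset B.
        opt_prompt.zero_grad()
        loss_fn(model(xb, prompt_emb), yb).backward()
        opt_prompt.step()
```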
arXiv Detail & Related papers (2024-02-26T06:36:32Z) - BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model [65.92173280096588]
We address the challenge of image resolution variation for the Segment Anything Model (SAM).
SAM, known for its zero-shot generalizability, exhibits a performance degradation when faced with datasets with varying image sizes.
We present a bias-mode attention mask that allows each token to prioritize neighboring information (see the sketch below).
arXiv Detail & Related papers (2024-01-04T15:34:44Z)
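To illustrate what a bias-mode attention mask might look like, here is a small hypothetical example that adds a distance-based penalty to the attention logits so that each token favors its neighbors; the penalty form is an assumption, not BA-SAM's exact formulation.

```python
import torch

def neighbor_biased_attention(q, k, v, positions, alpha: float = 0.1):
    """Scaled dot-product attention with an additive bias that
    penalizes attending to distant tokens (toy bias-mode mask)."""
    # q, k, v: (B, N, D); positions: (N, 2) token grid coordinates.
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / d ** 0.5   # (B, N, N)
    dist = torch.cdist(positions, positions)      # (N, N) pairwise distances
    logits = logits - alpha * dist                # nearer tokens win
    return torch.softmax(logits, dim=-1) @ v

# Usage: 16 tokens on a 4x4 grid, 8-dim features.
pos = torch.stack(torch.meshgrid(
    torch.arange(4.0), torch.arange(4.0), indexing="ij"), -1).reshape(16, 2)
q = k = v = torch.randn(1, 16, 8)
out = neighbor_biased_attention(q, k, v, pos)
```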
This list is automatically generated from the titles and abstracts of the papers on this site.