Segment Anything in High Quality
- URL: http://arxiv.org/abs/2306.01567v2
- Date: Mon, 23 Oct 2023 12:40:57 GMT
- Title: Segment Anything in High Quality
- Authors: Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai,
Chi-Keung Tang, Fisher Yu
- Abstract summary: We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM's original promptable design, efficiency, and zero-shot generalizability.
Our careful design reuses and preserves the pre-trained model weights of SAM, while only introducing minimal additional parameters and computation.
We show the efficacy of HQ-SAM in a suite of 10 diverse segmentation datasets across different downstream tasks, where 8 out of them are evaluated in a zero-shot transfer protocol.
- Score: 116.39405160133315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent Segment Anything Model (SAM) represents a big leap in scaling up
segmentation models, allowing for powerful zero-shot capabilities and flexible
prompting. Despite being trained with 1.1 billion masks, SAM's mask prediction
quality falls short in many cases, particularly when dealing with objects that
have intricate structures. We propose HQ-SAM, equipping SAM with the ability to
accurately segment any object, while maintaining SAM's original promptable
design, efficiency, and zero-shot generalizability. Our careful design reuses
and preserves the pre-trained model weights of SAM, while only introducing
minimal additional parameters and computation. We design a learnable
High-Quality Output Token, which is injected into SAM's mask decoder and is
responsible for predicting the high-quality mask. Instead of only applying it
on mask-decoder features, we first fuse them with early and final ViT features
for improved mask details. To train our introduced learnable parameters, we
compose a dataset of 44K fine-grained masks from several sources. HQ-SAM is
only trained on the introduced detaset of 44k masks, which takes only 4 hours
on 8 GPUs. We show the efficacy of HQ-SAM in a suite of 10 diverse segmentation
datasets across different downstream tasks, where 8 out of them are evaluated
in a zero-shot transfer protocol. Our code and pretrained models are at
https://github.com/SysCV/SAM-HQ.
Related papers
- MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method enables to extract richer marine information from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z) - WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images [8.179859593451285]
We present WSI-SAM, enhancing Segment Anything Model (SAM) with precise object segmentation capabilities for histopathology images.
To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters.
Our model outperforms SAM by 4.1 and 2.5 percent points on a ductal carcinoma in situ (DCIS) segmentation tasks and breast cancer metastasis segmentation task.
arXiv Detail & Related papers (2024-03-14T10:30:43Z) - PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation [19.65118388712439]
We introduce a novel prompt-driven adapter into SAM, namely Prompt Adapter Segment Anything Model (PA-SAM)
By exclusively training the prompt adapter, PA-SAM extracts detailed information from images and optimize the mask decoder feature at both sparse and dense prompt levels.
Experimental results demonstrate that our PA-SAM outperforms other SAM-based methods in high-quality, zero-shot, and open-set segmentation.
arXiv Detail & Related papers (2024-01-23T19:20:22Z) - BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model [65.92173280096588]
We address the challenge of image resolution variation for the Segment Anything Model (SAM)
SAM, known for its zero-shot generalizability, exhibits a performance degradation when faced with datasets with varying image sizes.
We present a bias-mode attention mask that allows each token to prioritize neighboring information.
arXiv Detail & Related papers (2024-01-04T15:34:44Z) - TinySAM: Pushing the Envelope for Efficient Segment Anything Model [76.21007576954035]
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining the strong zero-shot performance.
We first propose a full-stage knowledge distillation method with hard prompt sampling and hard mask weighting strategy to distill a lightweight student model.
We also adapt the post-training quantization to the promptable segmentation task and further reduce the computational cost.
arXiv Detail & Related papers (2023-12-21T12:26:11Z) - How to Efficiently Adapt Large Segmentation Model(SAM) to Medical Images [15.181219203629643]
Segment Anything (SAM) exhibits impressive capabilities in zero-shot segmentation for natural images.
However, when applied to medical images, SAM suffers from noticeable performance drop.
In this work, we propose to freeze SAM encoder and finetune a lightweight task-specific prediction head.
arXiv Detail & Related papers (2023-06-23T18:34:30Z) - Personalize Segment Anything Model with One Shot [52.54453744941516]
We propose a training-free Personalization approach for Segment Anything Model (SAM)
Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior.
PerSAM segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement.
arXiv Detail & Related papers (2023-05-04T17:59:36Z) - Improving Sharpness-Aware Minimization with Fisher Mask for Better
Generalization on Language Models [93.85178920914721]
Fine-tuning large pretrained language models on a limited training corpus usually suffers from poor computation.
We propose a novel optimization procedure, namely FSAM, which introduces a Fisher mask to improve the efficiency and performance of SAM.
We show that FSAM consistently outperforms the vanilla SAM by 0.671.98 average score among four different pretrained models.
arXiv Detail & Related papers (2022-10-11T14:53:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.