EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
- URL: http://arxiv.org/abs/2312.00863v1
- Date: Fri, 1 Dec 2023 18:31:00 GMT
- Title: EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
- Authors: Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao,
Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman
Krishnamoorthi, Vikas Chandra
- Abstract summary: Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications.
We propose EfficientSAMs, light-weight SAM models that exhibit decent performance with largely reduced complexity.
Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning.
- Score: 36.553867358541154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for
zero-shot transfer and high versatility is a super large Transformer model
trained on the extensive high-quality SA-1B dataset. While beneficial, the huge
computation cost of the SAM model has limited its adoption in wider real-world
applications. To address this limitation, we propose EfficientSAMs,
light-weight SAM models that exhibit decent performance with largely reduced
complexity. Our idea is based on leveraging masked image pretraining, SAMI,
which learns to reconstruct features from the SAM image encoder for effective
visual representation learning. Further, we take SAMI-pretrained light-weight
image encoders and a mask decoder to build EfficientSAMs, and finetune the models
on SA-1B for the segment anything task. We perform evaluations on multiple vision
tasks including image classification, object detection, instance segmentation,
and semantic segmentation, and find that our proposed pretraining method,
SAMI, consistently outperforms other masked image pretraining methods. On
segment anything tasks such as zero-shot instance segmentation, our
EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably
with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.
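To make the SAMI idea concrete, here is a minimal PyTorch sketch of the feature-reconstruction objective described above. The modules `light_encoder`, `decoder`, and the frozen `sam_encoder` are placeholders, and the masking scheme is simplified relative to the paper; this is an illustration of the technique, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def sami_step(light_encoder, decoder, sam_encoder, images, mask_ratio=0.75):
    """One SAMI-style pretraining step (simplified sketch).

    light_encoder : lightweight ViT returning (B, N, D) patch tokens
    decoder       : small head projecting tokens to the SAM feature dim
    sam_encoder   : frozen SAM image encoder, returns (B, N, D_sam) targets
    """
    with torch.no_grad():                       # the SAM encoder is the fixed target
        target = sam_encoder(images)            # (B, N, D_sam)

    B, N, _ = target.shape
    num_masked = int(mask_ratio * N)
    # pick a random subset of patches to treat as "masked"
    idx = torch.rand(B, N, device=images.device).argsort(dim=1)[:, :num_masked]

    tokens = light_encoder(images)              # (B, N, D); simplification:
    pred = decoder(tokens)                      # we mask at the loss, not the input

    def gather(t):
        return torch.gather(t, 1, idx.unsqueeze(-1).expand(-1, -1, t.shape[-1]))

    # reconstruct the SAM features only on the masked patches
    return F.mse_loss(gather(pred), gather(target))
```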
Related papers
- Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) that produces class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder (a point-prompt sketch follows this entry).
arXiv Detail & Related papers (2024-09-23T19:05:50Z)
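As a concrete (hypothetical) reading of the HPG step above: local maxima of a class-agnostic foreground heatmap can be selected as SAM point prompts. The max-pooling peak test and the thresholds below are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def heatmap_to_point_prompts(heatmap, threshold=0.5, nms_kernel=5, max_points=20):
    """Pick local maxima of a foreground heatmap as SAM point prompts.

    heatmap : (H, W) float tensor with values in [0, 1]
    returns : (K, 2) tensor of (x, y) coordinates, K <= max_points
    """
    h = heatmap[None, None]                          # (1, 1, H, W)
    pooled = F.max_pool2d(h, nms_kernel, stride=1, padding=nms_kernel // 2)
    peaks = (h == pooled) & (h > threshold)          # local maxima above threshold
    ys, xs = torch.nonzero(peaks[0, 0], as_tuple=True)
    scores = heatmap[ys, xs]
    order = scores.argsort(descending=True)[:max_points]
    return torch.stack([xs[order], ys[order]], dim=1)  # (x, y) pairs for SAM
```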
- Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation [4.6570959687411975]
The Segment Anything Model (SAM) demonstrates exceptional generalization capabilities.
SAM's lack of pretraining on massive remote sensing images and its interactive structure limit its automatic mask prediction capabilities.
A Multi-cognitive SAM-Based Instance Model (MC-SAM SEG) is introduced to apply SAM to the remote sensing domain.
The proposed MC-SAM SEG extracts high-quality features by fine-tuning the SAM-Mona encoder along with a feature aggregator.
arXiv Detail & Related papers (2024-08-16T07:23:22Z)
- Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection [58.241593208031816]
Segment Anything Model (SAM) has been proposed as a visual foundation model, which provides strong segmentation and generalization capabilities.
We propose a Multi-scale and Detail-enhanced SAM (MDSAM) for Salient Object Detection (SOD).
Experimental results demonstrate the superior performance of our model on multiple SOD datasets.
arXiv Detail & Related papers (2024-08-08T09:09:37Z)
- RobustSAM: Segment Anything Robustly on Degraded Images [19.767828436963317]
Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation.
We propose the Robust Segment Anything Model (RobustSAM), which enhances SAM's performance on low-quality images.
Our method has been shown to effectively improve the performance of SAM-based downstream tasks such as single image dehazing and deblurring.
arXiv Detail & Related papers (2024-06-13T23:33:59Z)
- MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method extracts richer marine information, from global contextual cues down to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z)
- WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images [8.179859593451285]
We present WSI-SAM, enhancing Segment Anything Model (SAM) with precise object segmentation capabilities for histopathology images.
To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters.
Our model outperforms SAM by 4.1 and 2.5 percentage points on a ductal carcinoma in situ (DCIS) segmentation task and a breast cancer metastasis segmentation task, respectively (a generic adapter sketch follows this entry).
arXiv Detail & Related papers (2024-03-14T10:30:43Z)
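The "frozen SAM plus minimal extra parameters" recipe is a standard adapter pattern; below is a generic sketch of it. The bottleneck design and the assumption that the image encoder exposes its transformer blocks as `.blocks` are ours, not details from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter: the only trainable module beside a frozen block."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.gelu(self.down(x)))   # small residual update

def freeze_and_adapt(image_encoder, dim=768):
    """Freeze all pretrained weights; return per-block adapters to train."""
    for p in image_encoder.parameters():
        p.requires_grad = False                    # pretrained SAM stays fixed
    # assumes the encoder exposes its transformer blocks as `.blocks`
    return nn.ModuleList(Adapter(dim) for _ in image_encoder.blocks)
```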
- TinySAM: Pushing the Envelope for Efficient Segment Anything Model [76.21007576954035]
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining the strong zero-shot performance.
We first propose a full-stage knowledge distillation method with hard prompt sampling and a hard mask weighting strategy to distill a lightweight student model.
We also adapt post-training quantization to the promptable segmentation task to further reduce the computational cost (a distillation sketch follows this entry).
arXiv Detail & Related papers (2023-12-21T12:26:11Z)
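A minimal sketch of teacher-student mask distillation in the spirit of TinySAM's full-stage distillation. The "hard mask weighting" here (upweighting pixels where student and teacher disagree) is our assumption about the strategy's shape, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, images, prompts):
    """Distill SAM mask logits into a lightweight student (simplified).

    `student`/`teacher` are callables returning mask logits of shape
    (B, 1, H, W); `prompts` are shared so both models answer the same query.
    """
    with torch.no_grad():
        t_logits = teacher(images, prompts)          # frozen teacher

    s_logits = student(images, prompts)

    # assumed form of "hard mask weighting": upweight pixels where the
    # student currently disagrees most with the teacher
    per_pixel = F.mse_loss(s_logits, t_logits, reduction="none")
    weights = 1.0 + (s_logits.sigmoid() - t_logits.sigmoid()).abs().detach()
    return (weights * per_pixel).mean()
```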
- Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels (a sketch of this scheme follows this entry).
arXiv Detail & Related papers (2023-07-10T17:59:40Z)
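Semantic-SAM's multi-choice learning, where one click supervises masks at several granularities, is reminiscent of the ambiguity-aware "min over heads" loss used by SAM itself; the sketch below follows that formulation as an assumption about the paper's scheme.

```python
import torch
import torch.nn.functional as F

def multi_choice_loss(pred_masks, gt_masks):
    """Match K predicted masks per click to the valid ground-truth masks.

    pred_masks : (B, K, H, W) logits, one per granularity head
    gt_masks   : (B, M, H, W) float 0/1 targets valid for the same click
    Each target is scored against every head; only the best-matching head
    is penalized (min over K), so heads can specialize to granularities.
    """
    B, K, H, W = pred_masks.shape
    M = gt_masks.shape[1]
    losses = F.binary_cross_entropy_with_logits(
        pred_masks[:, :, None].expand(B, K, M, H, W),
        gt_masks[:, None].expand(B, K, M, H, W),
        reduction="none",
    ).mean(dim=(-1, -2))                     # (B, K, M) per-pair loss
    return losses.min(dim=1).values.mean()   # best head per target
```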
- Personalize Segment Anything Model with One Shot [52.54453744941516]
We propose a training-free personalization approach for the Segment Anything Model (SAM).
Given only a single image with a reference mask, PerSAM first localizes the target concept via a location prior.
It then segments the concept in other images or videos using three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement (a localization sketch follows this entry).
arXiv Detail & Related papers (2023-05-04T17:59:36Z)
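A toy sketch of a training-free location prior in the spirit of PerSAM: average the reference features under the mask into a concept vector, correlate it with the new image's features, and use the peak as a positive point prompt for SAM. The feature extractor and shapes are placeholders, not the paper's exact pipeline.

```python
import torch
import torch.nn.functional as F

def location_prior(ref_feats, ref_mask, tgt_feats):
    """Training-free localization of a personal concept (simplified).

    ref_feats : (C, H, W) features of the reference image
    ref_mask  : (H, W) float 0/1 mask of the target concept
    tgt_feats : (C, H, W) features of the new image
    returns   : (H, W) cosine-similarity map and its peak (x, y)
    """
    # mean reference feature inside the mask -> concept vector
    concept = (ref_feats * ref_mask).flatten(1).sum(1) / ref_mask.sum()   # (C,)
    concept = F.normalize(concept, dim=0)
    tgt = F.normalize(tgt_feats.flatten(1), dim=0)        # unit vector per location
    sim = (concept[:, None] * tgt).sum(0).reshape(tgt_feats.shape[1:])    # (H, W)
    peak = sim.flatten().argmax()
    y, x = divmod(peak.item(), sim.shape[1])
    return sim, (x, y)   # the peak seeds SAM's positive point prompt
```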
This list is automatically generated from the titles and abstracts of the papers on this site.