EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
- URL: http://arxiv.org/abs/2312.00863v1
- Date: Fri, 1 Dec 2023 18:31:00 GMT
- Title: EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
- Authors: Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao,
Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman
Krishnamoorthi, Vikas Chandra
- Abstract summary: Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications.
We propose EfficientSAMs, light-weight SAM models that exhibit decent performance with largely reduced complexity.
Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning.
- Score: 36.553867358541154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for
zero-shot transfer and high versatility is a super large Transformer model
trained on the extensive high-quality SA-1B dataset. While beneficial, the huge
computation cost of the SAM model has limited its adoption in wider real-world
applications. To address this limitation, we propose EfficientSAMs,
light-weight SAM models that exhibit decent performance with largely reduced
complexity. Our idea is based on leveraging masked image pretraining, SAMI,
which learns to reconstruct features from the SAM image encoder for effective
visual representation learning. Further, we take SAMI-pretrained light-weight
image encoders and a mask decoder to build EfficientSAMs, and finetune the models
on SA-1B for the segment anything task. We perform evaluations on multiple vision
tasks including image classification, object detection, instance segmentation,
and semantic segmentation, and find that our proposed pretraining method,
SAMI, consistently outperforms other masked image pretraining methods. On
segment anything tasks such as zero-shot instance segmentation, our
EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably
with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.
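To make the SAMI idea concrete, here is a minimal PyTorch sketch of the feature-reconstruction objective described above. The modules `light_encoder`, `decoder`, and the frozen `sam_encoder` are placeholders, and the masking scheme is simplified relative to the paper; this is an illustration of the technique, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def sami_step(light_encoder, decoder, sam_encoder, images, mask_ratio=0.75):
    """One SAMI-style pretraining step (simplified sketch).

    light_encoder : lightweight ViT returning (B, N, D) patch tokens
    decoder       : small head projecting tokens to the SAM feature dim
    sam_encoder   : frozen SAM image encoder, returns (B, N, D_sam) targets
    """
    with torch.no_grad():                       # the SAM encoder is the fixed target
        target = sam_encoder(images)            # (B, N, D_sam)

    B, N, _ = target.shape
    num_masked = int(mask_ratio * N)
    # pick a random subset of patches to treat as "masked"
    idx = torch.rand(B, N, device=images.device).argsort(dim=1)[:, :num_masked]

    tokens = light_encoder(images)              # (B, N, D); simplification:
    pred = decoder(tokens)                      # we mask at the loss, not the input

    def gather(t):
        return torch.gather(t, 1, idx.unsqueeze(-1).expand(-1, -1, t.shape[-1]))

    # reconstruct the SAM features only on the masked patches
    return F.mse_loss(gather(pred), gather(target))
```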
Related papers
- Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) that produces class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder (a point-prompt sketch follows this entry).
arXiv Detail & Related papers (2024-09-23T19:05:50Z)
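As a concrete (hypothetical) reading of the HPG step above: local maxima of a class-agnostic foreground heatmap can be selected as SAM point prompts. The max-pooling peak test and the thresholds below are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def heatmap_to_point_prompts(heatmap, threshold=0.5, nms_kernel=5, max_points=20):
    """Pick local maxima of a foreground heatmap as SAM point prompts.

    heatmap : (H, W) float tensor with values in [0, 1]
    returns : (K, 2) tensor of (x, y) coordinates, K <= max_points
    """
    h = heatmap[None, None]                          # (1, 1, H, W)
    pooled = F.max_pool2d(h, nms_kernel, stride=1, padding=nms_kernel // 2)
    peaks = (h == pooled) & (h > threshold)          # local maxima above threshold
    ys, xs = torch.nonzero(peaks[0, 0], as_tuple=True)
    scores = heatmap[ys, xs]
    order = scores.argsort(descending=True)[:max_points]
    return torch.stack([xs[order], ys[order]], dim=1)  # (x, y) pairs for SAM
```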
- Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation [4.6570959687411975]
The Segment Anything Model (SAM) demonstrates exceptional generalization capabilities.
SAM's lack of pretraining on massive remote sensing images and its interactive structure limit its automatic mask prediction capabilities.
A Multi-cognitive SAM-Based Instance Model (MC-SAM SEG) is introduced to apply SAM to the remote sensing domain.
The proposed MC-SAM SEG extracts high-quality features by fine-tuning the SAM-Mona encoder along with a feature aggregator.
arXiv Detail & Related papers (2024-08-16T07:23:22Z)
- Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection [58.241593208031816]
Segment Anything Model (SAM) has been proposed as a visual foundation model, which provides strong segmentation and generalization capabilities.
We propose a Multi-scale and Detail-enhanced SAM (MDSAM) for Salient Object Detection (SOD).
Experimental results demonstrate the superior performance of our model on multiple SOD datasets.
arXiv Detail & Related papers (2024-08-08T09:09:37Z)
- RobustSAM: Segment Anything Robustly on Degraded Images [19.767828436963317]
Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation.
We propose the Robust Segment Anything Model (RobustSAM), which enhances SAM's performance on low-quality images.
Our method has been shown to effectively improve the performance of SAM-based downstream tasks such as single image dehazing and deblurring.
arXiv Detail & Related papers (2024-06-13T23:33:59Z)
- MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method extracts richer marine information, from global contextual cues down to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z)
- WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images [8.179859593451285]
We present WSI-SAM, enhancing Segment Anything Model (SAM) with precise object segmentation capabilities for histopathology images.
To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters.
Our model outperforms SAM by 4.1 and 2.5 percentage points on a ductal carcinoma in situ (DCIS) segmentation task and a breast cancer metastasis segmentation task, respectively (a generic adapter sketch follows this entry).
arXiv Detail & Related papers (2024-03-14T10:30:43Z)
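The "frozen SAM plus minimal extra parameters" recipe is a standard adapter pattern; below is a generic sketch of it. The bottleneck design and the assumption that the image encoder exposes its transformer blocks as `.blocks` are ours, not details from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter: the only trainable module beside a frozen block."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.gelu(self.down(x)))   # small residual update

def freeze_and_adapt(image_encoder, dim=768):
    """Freeze all pretrained weights; return per-block adapters to train."""
    for p in image_encoder.parameters():
        p.requires_grad = False                    # pretrained SAM stays fixed
    # assumes the encoder exposes its transformer blocks as `.blocks`
    return nn.ModuleList(Adapter(dim) for _ in image_encoder.blocks)
```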
- TinySAM: Pushing the Envelope for Efficient Segment Anything Model [76.21007576954035]
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining the strong zero-shot performance.
We first propose a full-stage knowledge distillation method with hard prompt sampling and a hard mask weighting strategy to distill a lightweight student model.
We also adapt post-training quantization to the promptable segmentation task to further reduce the computational cost (a distillation sketch follows this entry).
arXiv Detail & Related papers (2023-12-21T12:26:11Z)
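A minimal sketch of teacher-student mask distillation in the spirit of TinySAM's full-stage distillation. The "hard mask weighting" here (upweighting pixels where student and teacher disagree) is our assumption about the strategy's shape, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, images, prompts):
    """Distill SAM mask logits into a lightweight student (simplified).

    `student`/`teacher` are callables returning mask logits of shape
    (B, 1, H, W); `prompts` are shared so both models answer the same query.
    """
    with torch.no_grad():
        t_logits = teacher(images, prompts)          # frozen teacher

    s_logits = student(images, prompts)

    # assumed form of "hard mask weighting": upweight pixels where the
    # student currently disagrees most with the teacher
    per_pixel = F.mse_loss(s_logits, t_logits, reduction="none")
    weights = 1.0 + (s_logits.sigmoid() - t_logits.sigmoid()).abs().detach()
    return (weights * per_pixel).mean()
```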
- Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels (a sketch of this scheme follows this entry).
arXiv Detail & Related papers (2023-07-10T17:59:40Z)
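Semantic-SAM's multi-choice learning, where one click supervises masks at several granularities, is reminiscent of the ambiguity-aware "min over heads" loss used by SAM itself; the sketch below follows that formulation as an assumption about the paper's scheme.

```python
import torch
import torch.nn.functional as F

def multi_choice_loss(pred_masks, gt_masks):
    """Match K predicted masks per click to the valid ground-truth masks.

    pred_masks : (B, K, H, W) logits, one per granularity head
    gt_masks   : (B, M, H, W) float 0/1 targets valid for the same click
    Each target is scored against every head; only the best-matching head
    is penalized (min over K), so heads can specialize to granularities.
    """
    B, K, H, W = pred_masks.shape
    M = gt_masks.shape[1]
    losses = F.binary_cross_entropy_with_logits(
        pred_masks[:, :, None].expand(B, K, M, H, W),
        gt_masks[:, None].expand(B, K, M, H, W),
        reduction="none",
    ).mean(dim=(-1, -2))                     # (B, K, M) per-pair loss
    return losses.min(dim=1).values.mean()   # best head per target
```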
- Personalize Segment Anything Model with One Shot [52.54453744941516]
We propose a training-free personalization approach for the Segment Anything Model (SAM).
Given only a single image with a reference mask, PerSAM first localizes the target concept via a location prior.
It then segments the concept in other images or videos using three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement (a localization sketch follows this entry).
arXiv Detail & Related papers (2023-05-04T17:59:36Z)
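A toy sketch of a training-free location prior in the spirit of PerSAM: average the reference features under the mask into a concept vector, correlate it with the new image's features, and use the peak as a positive point prompt for SAM. The feature extractor and shapes are placeholders, not the paper's exact pipeline.

```python
import torch
import torch.nn.functional as F

def location_prior(ref_feats, ref_mask, tgt_feats):
    """Training-free localization of a personal concept (simplified).

    ref_feats : (C, H, W) features of the reference image
    ref_mask  : (H, W) float 0/1 mask of the target concept
    tgt_feats : (C, H, W) features of the new image
    returns   : (H, W) cosine-similarity map and its peak (x, y)
    """
    # mean reference feature inside the mask -> concept vector
    concept = (ref_feats * ref_mask).flatten(1).sum(1) / ref_mask.sum()   # (C,)
    concept = F.normalize(concept, dim=0)
    tgt = F.normalize(tgt_feats.flatten(1), dim=0)        # unit vector per location
    sim = (concept[:, None] * tgt).sum(0).reshape(tgt_feats.shape[1:])    # (H, W)
    peak = sim.flatten().argmax()
    y, x = divmod(peak.item(), sim.shape[1])
    return sim, (x, y)   # the peak seeds SAM's positive point prompt
```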
This list is automatically generated from the titles and abstracts of the papers on this site.