Boosting Segment Anything Model Towards Open-Vocabulary Learning
- URL: http://arxiv.org/abs/2312.03628v1
- Date: Wed, 6 Dec 2023 17:19:00 GMT
- Title: Boosting Segment Anything Model Towards Open-Vocabulary Learning
- Authors: Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang,
Zhenjun Han, Qi Tian
- Abstract summary: Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model.
Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics.
We present Sambor, which seamlessly integrates SAM with an open-vocabulary object detector in an end-to-end framework.
- Score: 69.42565443181017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent Segment Anything Model (SAM) has emerged as a new paradigmatic
vision foundation model, showcasing potent zero-shot generalization and
flexible prompting. Despite SAM finding applications and adaptations in various
domains, its primary limitation lies in the inability to grasp object
semantics. In this paper, we present Sambor to seamlessly integrate SAM with
an open-vocabulary object detector in an end-to-end framework. While retaining
all the remarkable capabilities inherent to SAM, we enhance it with the
capacity to detect arbitrary objects based on human inputs like category names
or reference expressions. To accomplish this, we introduce a novel SideFormer
module that extracts SAM features to facilitate zero-shot object localization
and inject comprehensive semantic information for open-vocabulary recognition.
In addition, we devise an open-set region proposal network (Open-set RPN),
enabling the detector to acquire the open-set proposals generated by SAM.
Sambor demonstrates superior zero-shot performance across benchmarks, including
COCO and LVIS, proving highly competitive against previous SoTA methods. We
aspire for this work to serve as a meaningful endeavor in endowing SAM with the
ability to recognize diverse object categories and in advancing open-vocabulary learning with
the support of vision foundation models.
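To make the described design more concrete, below is a minimal, hypothetical PyTorch sketch of the two components named in the abstract: a SideFormer-style adapter that cross-attends detector queries to frozen SAM image features, and an open-set RPN that pools class-agnostic proposals with SAM-generated ones. All module names, dimensions, and layer choices are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; not the official Sambor code. Module names,
# feature dimensions, and the simple proposal pooling are assumptions
# based on the abstract's description of SideFormer and the Open-set RPN.
import torch
import torch.nn as nn


class SideFormerSketch(nn.Module):
    """Cross-attends detector object queries to frozen SAM image features,
    injecting localization cues and semantic context (simplified)."""

    def __init__(self, sam_dim=256, det_dim=256, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(sam_dim, det_dim)          # map SAM features to detector space
        self.cross_attn = nn.MultiheadAttention(det_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(det_dim, 4 * det_dim), nn.GELU(), nn.Linear(4 * det_dim, det_dim)
        )

    def forward(self, det_queries, sam_feats):
        # det_queries: (B, Q, C) detector queries; sam_feats: (B, HW, C_sam) flattened SAM features
        kv = self.proj(sam_feats)
        attended, _ = self.cross_attn(det_queries, kv, kv)
        x = det_queries + attended
        return x + self.ffn(x)


class OpenSetRPNSketch(nn.Module):
    """Merges class-agnostic RPN boxes with SAM-generated proposals so the
    detector can score regions belonging to arbitrary, unseen categories."""

    def forward(self, rpn_boxes, sam_boxes):
        # Both inputs are (N_i, 4) boxes in xyxy format; here they are simply concatenated.
        return torch.cat([rpn_boxes, sam_boxes], dim=0)


if __name__ == "__main__":
    B, Q, HW, C = 2, 100, 64 * 64, 256
    queries = SideFormerSketch()(torch.randn(B, Q, C), torch.randn(B, HW, C))
    proposals = OpenSetRPNSketch()(torch.rand(300, 4), torch.rand(50, 4))
    print(queries.shape, proposals.shape)  # (2, 100, 256) and (350, 4)
```

The sketch keeps SAM frozen and only adapts its features toward the detector, which is one plausible reading of how the paper retains SAM's zero-shot abilities while adding open-vocabulary recognition.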
Related papers
- Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts [14.631774737903015]
Existing perception models achieve great success by learning from large amounts of labeled data, but they still struggle with open-world scenarios.
We present open-ended object detection, which discovers unseen objects without any object categories as inputs.
We show that our method surpasses the previous open-ended method on the object detection task and can provide additional instance segmentation masks.
arXiv Detail & Related papers (2024-10-08T12:15:08Z) - SAM-SP: Self-Prompting Makes SAM Great Again [11.109389094334894]
Segment Anything Model (SAM) has demonstrated impressive capabilities in zero-shot segmentation tasks.
SAM encounters noticeable performance degradation when applied to specific domains, such as medical images.
We introduce a novel self-prompting based fine-tuning approach, called SAM-SP, tailored for extending the vanilla SAM model.
arXiv Detail & Related papers (2024-08-22T13:03:05Z) - AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting to align SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z) - ASAM: Boosting Segment Anything Model with Adversarial Tuning [9.566046692165884]
This paper introduces ASAM, a novel methodology that amplifies a foundation model's performance through adversarial tuning.
We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing.
Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations.
arXiv Detail & Related papers (2024-05-01T00:13:05Z) - PosSAM: Panoptic Open-vocabulary Segment Anything [58.72494640363136]
PosSAM is an open-vocabulary panoptic segmentation model that unifies the strengths of the Segment Anything Model (SAM) with the vision-native CLIP model in an end-to-end framework.
We introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image.
arXiv Detail & Related papers (2024-03-14T17:55:03Z) - VRP-SAM: SAM with Visual Reference Prompt [73.05676082695459]
We propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation.
In essence, VRP-SAM can utilize annotated reference images to comprehend specific objects and segment them in the target image.
arXiv Detail & Related papers (2024-02-27T17:58:09Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation
based on Visual Foundation Model [29.42043345787285]
We propose a method to learn the generation of appropriate prompts for the Segment Anything Model (SAM).
This enables SAM to produce semantically discernible segmentation results for remote sensing images.
We also propose several ongoing derivatives for instance segmentation tasks, drawing on recent advancements within the SAM community, and compare their performance with RSPrompter.
arXiv Detail & Related papers (2023-06-28T14:51:34Z) - A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering [49.732628643634975]
The Segment Anything Model (SAM), developed by Meta AI Research, offers a robust framework for image and video segmentation.
This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlighting their advancements in granularity and contextual understanding.
arXiv Detail & Related papers (2023-05-12T07:21:59Z)