Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
- URL: http://arxiv.org/abs/2401.14159v1
- Date: Thu, 25 Jan 2024 13:12:09 GMT
- Title: Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
- Authors: Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao,
Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang,
Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang
- Abstract summary: We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector in combination with the Segment Anything Model (SAM).
This integration enables the detection and segmentation of any region based on arbitrary text input.
- Score: 47.646824158039664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Grounded SAM, which uses Grounding DINO as an open-set object
detector in combination with the Segment Anything Model (SAM). This integration
enables the detection and segmentation of any region based on arbitrary text
input and opens the door to connecting various vision models. As shown in Fig. 1,
a wide range of vision tasks can be achieved with the versatile Grounded
SAM pipeline. For example, an automatic annotation pipeline driven solely by
input images can be realized by incorporating models such as BLIP and Recognize
Anything. Additionally, incorporating Stable Diffusion allows for controllable
image editing, while the integration of OSX facilitates promptable 3D human
motion analysis. Grounded SAM also shows superior performance on
open-vocabulary benchmarks, achieving 48.7 mean AP on the SegInW (Segmentation in
the Wild) zero-shot benchmark with the combination of Grounding DINO-Base and
SAM-Huge models.
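
The core detect-then-segment flow is straightforward to reproduce. Below is a minimal sketch assuming the reference GroundingDINO and segment_anything Python packages; the config and checkpoint paths, the image file, the caption, and the thresholds are illustrative placeholders, and exact signatures may vary across releases.

```python
import numpy as np
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# Illustrative paths; substitute your local config and checkpoints.
dino = load_model("GroundingDINO_SwinB_cfg.py", "groundingdino_swinb.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

# 1) Grounding DINO: text prompt -> boxes (normalized cx, cy, w, h).
image_source, image = load_image("demo.jpg")  # (HxWx3 uint8 RGB, model tensor)
boxes, logits, phrases = predict(
    model=dino,
    image=image,
    caption="dog . chair .",  # arbitrary text; '.' separates phrases
    box_threshold=0.35,
    text_threshold=0.25,
)

# 2) SAM: one mask per detected box.
predictor.set_image(image_source)
h, w = image_source.shape[:2]
for box in boxes.numpy():
    cx, cy, bw, bh = box * np.array([w, h, w, h])  # to pixel coordinates
    xyxy = np.array([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2])
    masks, scores, _ = predictor.predict(box=xyxy, multimask_output=False)
    # masks[0] is a boolean HxW mask for the corresponding phrase.
```

From here the extensions in the abstract follow naturally: a tagging model such as RAM or BLIP can supply the caption so that annotation runs from the image alone, and a predicted mask can be passed (as a PIL image) to an inpainting model such as diffusers' StableDiffusionInpaintPipeline for controllable editing.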
Related papers
- Prompting DirectSAM for Semantic Contour Extraction in Remote Sensing Images [11.845626002236772]
We introduce a foundation model derived from DirectSAM, termed DirectSAM-RS, which inherits the strong segmentation capability acquired from natural images.
Its training dataset comprises over 34k image-text-contour triplets, making it at least 30 times larger than any individual existing dataset.
We evaluate DirectSAM-RS in both zero-shot and fine-tuning settings, and demonstrate that it achieves state-of-the-art performance across several downstream benchmarks.
arXiv Detail & Related papers (2024-10-08T16:55:42Z)
- Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation [4.6570959687411975]
The Segment Anything Model (SAM) demonstrates exceptional generalization capabilities.
However, SAM's lack of pretraining on massive remote sensing imagery and its interaction-dependent structure limit its automatic mask prediction capabilities.
A Multi-cognitive SAM-Based Instance Model (MC-SAM SEG) is introduced to adapt SAM to the remote sensing domain.
It extracts high-quality features by fine-tuning the SAM-Mona encoder together with a feature aggregator.
arXiv Detail & Related papers (2024-08-16T07:23:22Z)
- Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection [58.241593208031816]
The Segment Anything Model (SAM) has been proposed as a visual foundation model with strong segmentation and generalization capabilities.
We propose a Multi-scale and Detail-enhanced SAM (MDSAM) for Salient Object Detection (SOD).
Experimental results demonstrate the superior performance of our model on multiple SOD datasets.
arXiv Detail & Related papers (2024-08-08T09:09:37Z)
- AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
The Segment Anything Model (SAM) has demonstrated impressive generalization capabilities in open-world scenarios when guided by prompts.
We propose a novel framework, termed AlignSAM, designed to prompt SAM automatically and align it to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z)
- MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method extracts richer marine information, ranging from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z)
- RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation [10.37240769959699]
Segment Anything Model (SAM) provides a universal pre-training model for image segmentation tasks.
We propose RSAM-Seg, which stands for Remote Sensing SAM with Semantic segmentation, a tailored modification of SAM for the remote sensing field.
Adapter-Scale, a set of supplementary scaling modules, is inserted into the multi-head attention blocks of SAM's encoder (a generic sketch of this adapter pattern follows below).
Experiments are conducted on four distinct remote sensing scenarios, encompassing cloud detection, field monitoring, building detection, and road mapping tasks.
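
As a rough illustration of this style of parameter-efficient tuning (a generic bottleneck adapter, not the paper's exact Adapter-Scale design), the pattern typically inserted alongside transformer blocks looks like this in PyTorch:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic adapter: down-project, nonlinearity, up-project, scaled
    residual. A hypothetical stand-in for modules like Adapter-Scale;
    only these few parameters train while the SAM encoder stays frozen."""
    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.5):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim) token features from an attention block.
        return x + self.scale * self.up(self.act(self.down(x)))

# Example: adapt 1280-dim ViT-H tokens; output shape matches the input.
tokens = torch.randn(1, 196, 1280)
adapted = BottleneckAdapter(dim=1280)(tokens)
```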
arXiv Detail & Related papers (2024-02-29T09:55:46Z)
- Boosting Segment Anything Model Towards Open-Vocabulary Learning [69.42565443181017]
The Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model.
Despite SAM finding applications and adaptations in various domains, its primary limitation lies in its inability to grasp object semantics.
We present Sambor, which seamlessly integrates SAM with an open-vocabulary object detector in an end-to-end framework.
arXiv Detail & Related papers (2023-12-06T17:19:00Z)
- Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM) [8.529233820032678]
The Segment Anything Model (SAM) is the first foundation model for image segmentation.
In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups.
Our investigation centers on SAM's zero-shot learning abilities and the effectiveness of prompts such as bounding boxes or point clicks (a minimal prompting sketch follows below).
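
As a concrete reference for that prompt interface, here is a minimal point-click sketch using the segment_anything package; the image path and click coordinates are illustrative:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # illustrative path
predictor = SamPredictor(sam)

# SamPredictor expects an HxWx3 uint8 RGB array.
eye_image = cv2.cvtColor(cv2.imread("eye_frame.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(eye_image)

# One foreground click on the pupil (label 1 = foreground, 0 = background).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,  # three candidate masks; keep the best-scored one
)
best_mask = masks[np.argmax(scores)]  # boolean HxW array
```

A bounding-box prompt is the same call with box=np.array([x0, y0, x1, y1]) in place of the point arguments.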
arXiv Detail & Related papers (2023-11-14T11:05:08Z)
- The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot [6.500451285898152]
This study aims to advance the application of the Segment Anything Model (SAM) in remote sensing image analysis.
SAM is known for its exceptional generalization capabilities and zero-shot learning.
Despite the limitations encountered with lower spatial resolution images, SAM exhibits promising adaptability to remote sensing data analysis.
arXiv Detail & Related papers (2023-06-29T01:49:33Z)
- Segment and Track Anything [57.20918630166862]
This report presents a framework called Segment And Track Anything (SAM-Track).
SAM-Track allows users to precisely and effectively segment and track any object in a video.
It can be used across an array of fields, ranging from drone technology and autonomous driving to medical imaging, augmented reality, and biological analysis.
arXiv Detail & Related papers (2023-05-11T04:33:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.