A Comprehensive Survey on Segment Anything Model for Vision and Beyond
- URL: http://arxiv.org/abs/2305.08196v2
- Date: Fri, 19 May 2023 16:33:03 GMT
- Title: A Comprehensive Survey on Segment Anything Model for Vision and Beyond
- Authors: Chunhui Zhang, Li Liu, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian
Yang, Yuehong Hu
- Abstract summary: It is urgent to design a general class of models, which we term foundation models, trained on broad data.
The recently proposed segment anything model (SAM) has made significant progress in breaking the boundaries of segmentation.
This paper introduces the background and terminology for foundation models including SAM, as well as state-of-the-art methods contemporaneous with SAM.
- Score: 7.920790211915402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) is evolving towards artificial general
intelligence, which refers to the ability of an AI system to perform a wide
range of tasks and exhibit a level of intelligence similar to that of a human
being. This is in contrast to narrow or specialized AI, which is designed to
perform specific tasks with a high degree of efficiency. Therefore, it is
urgent to design a general class of models, which we term foundation models,
trained on broad data that can be adapted to various downstream tasks. The
recently proposed segment anything model (SAM) has made significant progress in
breaking the boundaries of segmentation, greatly promoting the development of
foundation models for computer vision. To fully comprehend SAM, we conduct a
survey study. As the first comprehensive review of progress on the
segment-anything task for vision and beyond, based on the SAM foundation model, this
work focuses on its applications to various tasks and data types by discussing
its historical development, recent progress, and profound impact on broad
applications. We first introduce the background and terminology for foundation
models including SAM, as well as state-of-the-art methods contemporaneous with
SAM that are significant for the segment-anything task. Then, we analyze and
summarize the advantages and limitations of SAM across various image processing
applications, including software scenes, real-world scenes, and complex scenes.
Importantly, many insights are drawn to guide future research toward more
versatile foundation models and improvements to SAM's architecture. We also
summarize numerous other applications of SAM in vision and beyond. Finally, we
maintain a continuously updated paper list and an open-source project summary
for the SAM foundation model at
https://github.com/liliu-avril/Awesome-Segment-Anything.
Related papers
- AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting for aligning SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z)
- ASAM: Boosting Segment Anything Model with Adversarial Tuning
This paper introduces ASAM, a novel methodology that amplifies a foundation model's performance through adversarial tuning.
We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing.
Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations.
arXiv Detail & Related papers (2024-05-01T00:13:05Z)
- MAS-SAM: Segment Any Marine Animal with Aggregated Features
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method extracts richer marine information, ranging from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z)
- The Revolution of Multimodal Large Language Models: A Survey
Multimodal Large Language Models (MLLMs) can seamlessly integrate visual and textual modalities.
This paper provides a review of recent visual-based MLLMs, analyzing their architectural choices, multimodal alignment strategies, and training techniques.
arXiv Detail & Related papers (2024-02-19T19:01:01Z)
- Generalizable Visual Reinforcement Learning with Segment Anything Model
We introduce the Segment Anything Model for Generalizable visual RL (SAM-G).
SAM-G is a novel framework that leverages the promptable segmentation ability of Segment Anything Model (SAM) to enhance the generalization capabilities of visual RL agents.
Evaluated across 8 DMControl tasks and 3 Adroit tasks, SAM-G significantly improves visual generalization without altering the RL agents' architecture, merely their observations.
arXiv Detail & Related papers (2023-12-28T16:53:23Z)
- General Object Foundation Model for Images and Videos at Scale
We present GLEE, an object-level foundation model for locating and identifying objects in images and videos.
GLEE accomplishes detection, segmentation, tracking, grounding, and identification of arbitrary objects in the open world scenario.
We employ an image encoder, text encoder, and visual prompter to handle multi-modal inputs, enabling it to simultaneously solve various object-centric downstream tasks.
arXiv Detail & Related papers (2023-12-14T17:26:00Z)
- Boosting Segment Anything Model Towards Open-Vocabulary Learning
Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model.
Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics.
We present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework.
arXiv Detail & Related papers (2023-12-06T17:19:00Z)
- A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Segment anything model (SAM) developed by Meta AI Research has attracted significant attention.
With the relevant papers and projects increasing exponentially, it is challenging for the readers to catch up with the development of SAM.
This work conducts the first yet comprehensive survey on SAM.
arXiv Detail & Related papers (2023-05-12T07:21:59Z)
- SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More
We propose SAM-Adapter, which incorporates domain-specific information or visual prompts into the segmentation network via simple yet effective adapters.
We can even outperform task-specific network models and achieve state-of-the-art performance in the task we tested: camouflaged object detection.
arXiv Detail & Related papers (2023-04-18T17:38:54Z)
- Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications
Recently, Meta AI Research released a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B).
We conduct a series of intriguing investigations into the performance of SAM across various applications, particularly in the fields of natural images, agriculture, manufacturing, remote sensing, and healthcare.
arXiv Detail & Related papers (2023-04-12T10:10:03Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.