Segment Anything for Videos: A Systematic Survey
- URL: http://arxiv.org/abs/2408.08315v1
- Date: Wed, 31 Jul 2024 02:24:53 GMT
- Title: Segment Anything for Videos: A Systematic Survey
- Authors: Chunhui Zhang, Yawen Cui, Weilin Lin, Guanjie Huang, Yan Rong, Li Liu, Shiguang Shan,
- Abstract summary: The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond.
The segment anything model (SAM) has sparked a passion for exploring task-agnostic visual foundation models.
This work conducts a systematic review on SAM for videos in the era of foundation models.
- Score: 52.28931543292431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various image segmentation and multi-modal segmentation (e.g., text-to-mask) tasks, but also in the video domain. Additionally, the recently released SAM 2 is once again sparking research enthusiasm in the realm of promptable visual segmentation for both images and videos. However, existing surveys mainly focus on SAM in various image processing tasks; a comprehensive and in-depth review of the video domain is notably absent. To address this gap, this work conducts a systematic review of SAM for videos in the era of foundation models. As the first work to review the progress of SAM for videos, it focuses on applications to various tasks, discussing recent advances as well as opportunities for developing foundation models for broad applications. We begin with a brief introduction to the background of SAM and video-related research domains. Subsequently, we present a systematic taxonomy that categorizes existing methods into three key areas: video understanding, video generation, and video editing, analyzing and summarizing their advantages and limitations. Furthermore, comparative results of SAM-based and current state-of-the-art methods on representative benchmarks are offered, together with insightful analysis. Finally, we discuss the challenges faced by current research and envision several future research directions in the field of SAM for videos and beyond.
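To make the promptable video paradigm concrete, the sketch below shows single-click video segmentation in the style of the official facebookresearch/sam2 repository: one point prompt on one frame is propagated into per-frame masks for the whole clip. This is a minimal, hedged illustration; the checkpoint path, config name, frame directory, and click coordinates are assumptions, not values taken from the survey.

```python
# Minimal sketch of promptable video segmentation with SAM 2, following the
# interface of the official facebookresearch/sam2 repository. Paths, config
# names, and coordinates are illustrative assumptions.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"  # assumed local path
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"    # assumed config name
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # init_state expects the video, e.g. a directory of JPEG frames.
    state = predictor.init_state(video_path="./videos/example_frames")

    # One positive click on frame 0 defines the target object (obj_id=1).
    _, object_ids, mask_logits = predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),  # (x, y) pixels
        labels=np.array([1], dtype=np.int32),              # 1 = foreground
    )

    # Propagate the prompt through the video to obtain a masklet.
    video_masks = {}
    for frame_idx, object_ids, mask_logits in predictor.propagate_in_video(state):
        video_masks[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```

This click-then-propagate loop is the promptable interface that the video understanding, generation, and editing methods surveyed here build on.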
Related papers
- Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes [63.966251473172036]
The foundational model SAM has influenced multiple fields within computer vision, and its upgraded version, SAM 2, enhances capabilities in video segmentation.
While SAMs have demonstrated excellent performance in segmenting context-independent concepts like people, cars, and roads, they overlook more challenging context-dependent (CD) concepts, such as visual saliency, camouflage, product defects, and medical lesions.
We conduct a thorough quantitative evaluation of SAMs on 11 CD concepts across 2D and 3D images and videos in various visual modalities within natural, medical, and industrial scenes.
arXiv Detail & Related papers (2024-12-02T08:03:56Z) - Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey [8.216028136706948]
Segment Anything Model (SAM) signifies a noteworthy expansion of the prompt-driven paradigm into the domain of image segmentation.
The recent introduction of SAM2 effectively extends the original SAM in a streaming fashion and demonstrates strong performance in video segmentation.
This paper presents an overview of recent efforts in applying and adapting SAM2 to biomedical images and videos.
arXiv Detail & Related papers (2024-08-23T07:51:10Z) - Boosting Segment Anything Model Towards Open-Vocabulary Learning [69.24734826209367]
Segment Anything Model (SAM) has emerged as a new paradigmatic vision foundation model.
Despite SAM finding applications and adaptations in various domains, its primary limitation lies in the inability to grasp object semantics.
We present Sambor to seamlessly integrate SAM with the open-vocabulary object detector in an end-to-end framework.
arXiv Detail & Related papers (2023-12-06T17:19:00Z) - A Survey on Video Diffusion Models [103.03565844371711]
The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision.
Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers.
This paper presents a comprehensive review of video diffusion models in the AIGC era.
arXiv Detail & Related papers (2023-10-16T17:59:28Z) - A Comprehensive Survey on Segment Anything Model for Vision and Beyond [7.920790211915402]
There is an urgent need to design a general class of models, which we term foundation models, trained on broad data.
The recently proposed segment anything model (SAM) has made significant progress in breaking the boundaries of segmentation.
This paper introduces the background and terminology for foundation models including SAM, as well as state-of-the-art methods contemporaneous with SAM.
arXiv Detail & Related papers (2023-05-14T16:23:22Z) - A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering [49.732628643634975]
The Segment Anything Model (SAM), developed by Meta AI Research, offers a robust framework for image and video segmentation.
This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlighting their advancements in granularity and contextual understanding.
arXiv Detail & Related papers (2023-05-12T07:21:59Z) - Segment anything, from space? [8.126645790463266]
"Segment Anything Model" (SAM) can segment objects in input imagery based on cheap input prompts.
SAM usually achieved recognition accuracy similar to, or sometimes exceeding, vision models that had been trained on the target tasks.
We examine whether SAM's performance extends to overhead imagery problems and help guide the community's response to its development.
arXiv Detail & Related papers (2023-04-25T17:14:36Z) - Segment Anything Is Not Always Perfect: An Investigation of SAM on
Different Real-world Applications [31.31905890353516]
Recently, Meta AI Research approaches a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B)
We conduct a series of intriguing investigations into the performance of SAM across various applications, particularly in the fields of natural images, agriculture, manufacturing, remote sensing, and healthcare.
arXiv Detail & Related papers (2023-04-12T10:10:03Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)