When SAM2 Meets Video Shadow and Mirror Detection
- URL: http://arxiv.org/abs/2412.19293v1
- Date: Thu, 26 Dec 2024 17:35:20 GMT
- Title: When SAM2 Meets Video Shadow and Mirror Detection
- Authors: Leiping Jie,
- Abstract summary: We evaluate the effectiveness of the Segment Anything Model 2 (SAM2) on three distinct video segmentation tasks.<n>Specifically, we use ground truth point or mask prompts to initialize the first frame and then predict corresponding masks for subsequent frames.<n> Experimental results show that SAM2's performance on these tasks is suboptimal, especially when point prompts are used.
- Score: 3.3993877661368757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the successor to the Segment Anything Model (SAM), the Segment Anything Model 2 (SAM2) not only improves performance in image segmentation but also extends its capabilities to video segmentation. However, its effectiveness in segmenting rare objects that seldom appear in videos remains underexplored. In this study, we evaluate SAM2 on three distinct video segmentation tasks: Video Shadow Detection (VSD) and Video Mirror Detection (VMD). Specifically, we use ground truth point or mask prompts to initialize the first frame and then predict corresponding masks for subsequent frames. Experimental results show that SAM2's performance on these tasks is suboptimal, especially when point prompts are used, both quantitatively and qualitatively. Code is available at \url{https://github.com/LeipingJie/SAM2Video}
Related papers
- DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency [91.30252180093333]
We propose the Dual Consistency SAM (DCSAM) method based on prompttuning to adapt SAM and SAM2 for in-context segmentation.
Our key insights are to enhance the features of the SAM's prompt encoder in segmentation by providing high-quality visual prompts.
Although the proposed DC-SAM is primarily designed for images, it can be seamlessly extended to the video domain with the support SAM2.
arXiv Detail & Related papers (2025-04-16T13:41:59Z) - CamSAM2: Segment Anything Accurately in Camouflaged Videos [37.0152845263844]
We propose Camouflaged SAM2 (CamSAM2) to handle camouflaged scenes without modifying SAM2's parameters.
To make full use of fine-grained and high-resolution features from the current frame and previous frames, we propose implicit object-aware fusion (IOF) and explicit object-aware fusion (EOF) modules.
While CamSAM2 only adds negligible learnable parameters to SAM2, it substantially outperforms SAM2 on three VCOS datasets.
arXiv Detail & Related papers (2025-03-25T14:58:52Z) - Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos [110.3379755761583]
Sa2VA is a unified model for grounded understanding of both images and videos.
It supports a wide range of image and video tasks, including referring segmentation and conversation.
We show that Sa2VA achieves state-of-the-art across multiple tasks, particularly in referring video object segmentation.
arXiv Detail & Related papers (2025-01-07T18:58:54Z) - SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree [79.26409013413003]
We introduce SAM2Long, an improved training-free video object segmentation strategy.<n>It considers the segmentation uncertainty within each frame and chooses the video-level optimal results from multiple segmentation pathways.<n> SAM2Long achieves an average improvement of 3.0 points across all 24 head-to-head comparisons.
arXiv Detail & Related papers (2024-10-21T17:59:19Z) - When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation [36.174458990817165]
This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS)
VCOS involves detecting objects that blend seamlessly in the surroundings for videos, due to similar colors and textures, poor light conditions, etc.
arXiv Detail & Related papers (2024-09-27T11:35:50Z) - SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models.
We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z) - Moving Object Segmentation: All You Need Is SAM (and Flow) [82.78026782967959]
We investigate two models for combining SAM with optical flow that harness the segmentation power of SAM with the ability of flow to discover and group moving objects.
In the first model, we adapt SAM to take optical flow, rather than RGB, as an input. In the second, SAM takes RGB as an input, and flow is used as a segmentation prompt.
These surprisingly simple methods, without any further modifications, outperform all previous approaches by a considerable margin in both single and multi-object benchmarks.
arXiv Detail & Related papers (2024-04-18T17:59:53Z) - Propagating Semantic Labels in Video Data [0.0]
This work presents a method for performing segmentation for objects in video.
Once an object has been found in a frame of video, the segment can then be propagated to future frames.
The method works by combining SAM with Structure from Motion.
arXiv Detail & Related papers (2023-10-01T20:32:26Z) - Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.