MirrorSAM2: Segment Mirror in Videos with Depth Perception
- URL: http://arxiv.org/abs/2509.17220v1
- Date: Sun, 21 Sep 2025 20:00:33 GMT
- Title: MirrorSAM2: Segment Mirror in Videos with Depth Perception
- Authors: Mingchen Xu, Yukun Lai, Ze Ji, Jing Wu
- Abstract summary: This paper presents MirrorSAM2, the first framework that adapts Segment Anything Model 2 (SAM2) to the task of RGB-D video mirror segmentation. MirrorSAM2 addresses key challenges in mirror detection, such as reflection ambiguity and texture confusion. Experiments on the VMD and DVMD benchmarks demonstrate that MirrorSAM2 achieves SOTA performance, even under challenging conditions such as small mirrors, weak boundaries, and strong reflections.
- Score: 48.40774412545921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents MirrorSAM2, the first framework that adapts Segment Anything Model 2 (SAM2) to the task of RGB-D video mirror segmentation. MirrorSAM2 addresses key challenges in mirror detection, such as reflection ambiguity and texture confusion, by introducing four tailored modules: a Depth Warping Module for RGB and depth alignment, a Depth-guided Multi-Scale Point Prompt Generator for automatic prompt generation, a Frequency Detail Attention Fusion Module to enhance structural boundaries, and a Mirror Mask Decoder with a learnable mirror token for refined segmentation. By fully leveraging the complementarity between RGB and depth, MirrorSAM2 extends SAM2's capabilities to the prompt-free setting. To our knowledge, this is the first work to enable SAM2 for automatic video mirror segmentation. Experiments on the VMD and DVMD benchmarks demonstrate that MirrorSAM2 achieves SOTA performance, even under challenging conditions such as small mirrors, weak boundaries, and strong reflections.
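The abstract describes the Depth-guided Multi-Scale Point Prompt Generator only at a high level. As an illustration of the general idea (depth discontinuities around a mirror frame suggesting where to place point prompts for SAM2), here is a minimal NumPy sketch; the function names, scales, and thresholds are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def depth_gradient_magnitude(depth):
    """Approximate depth discontinuities with finite differences."""
    gy, gx = np.gradient(depth.astype(np.float64))  # axis 0 = rows (y), axis 1 = cols (x)
    return np.hypot(gx, gy)

def depth_guided_point_prompts(depth, scales=(1, 2), points_per_scale=3):
    """Pick the strongest depth-edge locations at several downsampling
    scales and return them as (x, y) point prompts in full-res coords."""
    prompts = []
    for s in scales:
        d = depth[::s, ::s]                          # coarser view of the scene
        grad = depth_gradient_magnitude(d)
        top = np.argsort(grad, axis=None)[::-1][:points_per_scale]
        ys, xs = np.unravel_index(top, grad.shape)
        # map coarse-grid coordinates back to full resolution
        prompts.extend((int(x) * s, int(y) * s) for x, y in zip(xs, ys))
    return prompts
```

A mirror often appears as a planar region whose measured depth disagrees with its surround, so depth edges are a plausible cue for automatic prompting; points like these could then be fed to SAM2 as its point prompts in place of manual clicks.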
Related papers
- Evaluating SAM2 for Video Semantic Segmentation [60.157605818225186]
The Segment Anything Model 2 (SAM2) has proven to be a powerful foundation model for promptable visual object segmentation in both images and videos. This paper explores the extension of SAM2 to dense Video Semantic Segmentation (VSS). Our experiments suggest that leveraging SAM2 enhances overall performance in VSS, primarily due to its precise predictions of object boundaries.
arXiv Detail & Related papers (2025-12-01T15:15:16Z) - MirrorMamba: Towards Scalable and Robust Mirror Detection in Videos [64.87702843502889]
We propose a new effective and scalable video mirror detection method, called MirrorMamba. Our approach leverages multiple cues to adapt to diverse conditions, incorporating perceived depth, correspondence, and optical flow. Notably, this work marks the first successful application of the Mamba-based architecture in the field of mirror detection.
arXiv Detail & Related papers (2025-11-10T05:18:14Z) - SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks [50.97089872043121]
We propose SAM2-UNeXT, an advanced framework that builds upon the core principles of SAM2-UNet. We extend the representational capacity of SAM2 through the integration of an auxiliary DINOv2 encoder. Our approach enables more accurate segmentation with a simple architecture, relaxing the need for complex decoder designs.
arXiv Detail & Related papers (2025-08-05T15:36:13Z) - DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency [91.30252180093333]
We propose the Dual Consistency SAM (DC-SAM) method based on prompt tuning to adapt SAM and SAM2 for in-context segmentation. Our key insight is to enhance the features of SAM's prompt encoder in segmentation by providing high-quality visual prompts. Although the proposed DC-SAM is primarily designed for images, it can be seamlessly extended to the video domain with the support of SAM2.
arXiv Detail & Related papers (2025-04-16T13:41:59Z) - MGD-SAM2: Multi-view Guided Detail-enhanced Segment Anything Model 2 for High-Resolution Class-agnostic Segmentation [6.976534642198541]
We propose MGD-SAM2, which integrates SAM2 with multi-view feature interaction between a global image and local patches to achieve precise segmentation. We first introduce MPAdapter to adapt the SAM2 encoder for enhanced extraction of local details and global semantics in HRCS images. Then, MCEM and HMIM are proposed to further exploit local texture and global context by aggregating multi-view features within and across multiple scales.
arXiv Detail & Related papers (2025-03-31T07:02:32Z) - CamSAM2: Segment Anything Accurately in Camouflaged Videos [37.0152845263844]
We propose Camouflaged SAM2 (CamSAM2) to handle camouflaged scenes without modifying SAM2's parameters. To make full use of fine-grained and high-resolution features from the current frame and previous frames, we propose implicit object-aware fusion (IOF) and explicit object-aware fusion (EOF) modules. While CamSAM2 only adds negligible learnable parameters to SAM2, it substantially outperforms SAM2 on three VCOS datasets.
arXiv Detail & Related papers (2025-03-25T14:58:52Z) - SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models.
We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z) - Symmetry-Aware Transformer-based Mirror Detection [85.47570468668955]
We propose a dual-path Symmetry-Aware Transformer-based mirror detection Network (SATNet).
SATNet includes two novel modules: a Symmetry-Aware Attention Module (SAAM) and a Contrast and Fusion Decoder Module (CFDM).
Experimental results show that SATNet outperforms both RGB and RGB-D mirror detection methods on all available mirror detection datasets.
arXiv Detail & Related papers (2022-07-13T16:40:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.