MirrorMamba: Towards Scalable and Robust Mirror Detection in Videos
- URL: http://arxiv.org/abs/2511.06716v1
- Date: Mon, 10 Nov 2025 05:18:14 GMT
- Title: MirrorMamba: Towards Scalable and Robust Mirror Detection in Videos
- Authors: Rui Song, Jiaying Lin, Rynson W. H. Lau,
- Abstract summary: We propose a new effective and scalable video mirror detection method, called MirrorMamba.<n>Our approach leverages multiple cues to adapt to diverse conditions, incorporating perceived depth, correspondence and optical.<n> Notably, this work marks the first successful application of the Mamba-based architecture in the field of mirror detection.
- Score: 64.87702843502889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video mirror detection has received significant research attention, yet existing methods suffer from limited performance and robustness. These approaches often over-rely on single, unreliable dynamic features, and are typically built on CNNs with limited receptive fields or Transformers with quadratic computational complexity. To address these limitations, we propose a new effective and scalable video mirror detection method, called MirrorMamba. Our approach leverages multiple cues to adapt to diverse conditions, incorporating perceived depth, correspondence and optical. We also introduce an innovative Mamba-based Multidirection Correspondence Extractor, which benefits from the global receptive field and linear complexity of the emerging Mamba spatial state model to effectively capture correspondence properties. Additionally, we design a Mamba-based layer-wise boundary enforcement decoder to resolve the unclear boundary caused by the blurred depth map. Notably, this work marks the first successful application of the Mamba-based architecture in the field of mirror detection. Extensive experiments demonstrate that our method outperforms existing state-of-the-art approaches for video mirror detection on the benchmark datasets. Furthermore, on the most challenging and representative image-based mirror detection dataset, our approach achieves state-of-the-art performance, proving its robustness and generalizability.
Related papers
- MIORe & VAR-MIORe: Benchmarks to Push the Boundaries of Restoration [53.180212987726556]
We introduce MIORe and VAR-MIORe, two novel multi-task datasets that address critical limitations in current motion restoration benchmarks.<n>Our datasets capture a broad spectrum of motion scenarios, which include complex ego-camera movements, dynamic multi-subject interactions, and depth-dependent blur effects.
arXiv Detail & Related papers (2025-09-08T15:34:31Z) - Trajectory-aware Shifted State Space Models for Online Video Super-Resolution [57.87099307245989]
This paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba)<n>TS-Mamba first constructs the trajectories within a video to select the most similar tokens from the previous frames.<n>Our TS-Mamba achieves state-of-the-art performance in most cases and over 22.7% reduction complexity (in MACs)
arXiv Detail & Related papers (2025-08-14T08:42:15Z) - Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos [11.532574301455854]
We propose a highly effective strategy for multi-frame video object detection.<n>Our method improves robustness, especially for lightweight models.<n>We contribute the BOAT360 benchmark dataset to support future research in multi-frame video object detection in challenging real-world scenarios.
arXiv Detail & Related papers (2025-06-25T15:49:07Z) - Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook [46.65330450810048]
State Space Models (SSMs) have emerged as a paradigm-shifting solution, combining linear computational scaling with global context modeling.<n>This survey presents a comprehensive review of Mamba-based methodologies in remote sensing, systematically analyzing about 120 Mamba-based remote sensing studies.<n>Our contributions are structured across five dimensions: (i) foundational principles of vision Mamba architectures, (ii) micro-architectural advancements such as adaptive scan strategies and hybrid SSM formulations, (iii) macro-architectural integrations, including CNN-Transformer-Mamba hybrids and frequency-domain adaptations, (iv) rigorous benchmarking against state-of
arXiv Detail & Related papers (2025-05-01T16:07:51Z) - An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas.<n>We propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion.
arXiv Detail & Related papers (2025-04-15T08:19:12Z) - VADMamba: Exploring State Space Models for Fast Video Anomaly Detection [4.874215132369157]
VQ-Mamba Unet (VQ-MaU) framework incorporates a Vector Quantization (VQ) layer and Mamba-based Non-negative Visual State Space (NVSS) block.<n>Results validate the efficacy of the proposed VADMamba across three benchmark datasets.
arXiv Detail & Related papers (2025-03-27T05:38:12Z) - A Survey on Mamba Architecture for Vision Applications [7.216568558372857]
Mamba architecture addresses scalability challenges in visual tasks.<n>Vision Mamba and VideoMamba introduce bidirectional scanning, selective mechanisms, andtemporal processing to enhance image and video understanding.<n>These advancements position Mamba as a promising architecture in computer vision research and applications.
arXiv Detail & Related papers (2025-02-11T00:59:30Z) - Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks [49.84182981950623]
Multispectral object detection, utilizing RGB and TIR (thermal infrared) modalities, is widely recognized as a challenging task.<n>It requires not only the effective extraction of features from both modalities and robust fusion strategies, but also the ability to address issues such as spectral discrepancies.<n>We introduce an efficient and easily deployable multispectral object detection framework that can seamlessly optimize high-performing single-modality models.
arXiv Detail & Related papers (2024-11-27T12:18:39Z) - MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation [6.578088710294546]
Traditional segmentation methods struggle to address challenges such as high anatomical variability, blurred tissue boundaries, low organ contrast, and noise.
We propose MLLA-UNet (Mamba-Like Linear Attention UNet), a novel architecture that achieves linear computational complexity while maintaining high segmentation accuracy.
Experiments demonstrate that MLLA-UNet achieves state-of-the-art performance on six challenging datasets with 24 different segmentation tasks, including but not limited to FLARE22, AMOS CT, and ACDC, with an average DSC of 88.32%.
arXiv Detail & Related papers (2024-10-31T08:54:23Z) - Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth guided Adaptive Meta-Fusion Network for few-shot video recognition which is termed as AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.