Related papers: MVGD-Net: A Novel Motion-aware Video Glass Surface Detection Network

URL: http://arxiv.org/abs/2601.13715v1
Date: Tue, 20 Jan 2026 08:19:17 GMT
Title: MVGD-Net: A Novel Motion-aware Video Glass Surface Detection Network
Authors: Yiwei Lu, Hao Huang, Tao Yan,
Abstract summary: Glass surfaces, ubiquitous in both daily life and professional environments, present a potential threat to vision-based systems. We propose a novel network, named MVGD-Net, for detecting glass surfaces in videos by leveraging motion inconsistency cues. To train our network, we also propose a large-scale dataset, which comprises 312 diverse glass scenarios with a total of 19,268 frames.
Score: 7.190998786246486
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Glass surfaces, ubiquitous in both daily life and professional environments, present a potential threat to vision-based systems, such as robot and drone navigation. To address this challenge, recent studies have shown significant interest in Video Glass Surface Detection (VGSD). We observe that objects in the reflection (or transmission) layer appear farther away than the glass surface itself. Consequently, in video motion scenarios, the notable reflected (or transmitted) objects on the glass surface move more slowly than objects in non-glass regions within the same spatial plane, and this motion inconsistency can effectively reveal the presence of glass surfaces. Based on this observation, we propose a novel network, named MVGD-Net, for detecting glass surfaces in videos by leveraging motion inconsistency cues. Our MVGD-Net features three novel modules: the Cross-scale Multimodal Fusion Module (CMFM), which integrates extracted spatial features and estimated optical flow maps, and the History Guided Attention Module (HGAM) and Temporal Cross Attention Module (TCAM), both of which further enhance temporal features. A Temporal-Spatial Decoder (TSD) is also introduced to fuse the spatial and temporal features for generating the glass region mask. Furthermore, to train our network, we also propose a large-scale dataset, which comprises 312 diverse glass scenarios with a total of 19,268 frames. Extensive experiments demonstrate that our MVGD-Net outperforms relevant state-of-the-art methods.
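
Illustrative note (not from the paper): the motion inconsistency described in the abstract can be approximated with simple pinhole-camera parallax. The sketch below assumes a camera translating sideways in front of a fronto-parallel glass pane and a static scene. The function names and all numeric values are hypothetical, and this is not the MVGD-Net method, which relies on estimated optical flow and learned modules.

```python
# Minimal sketch of motion parallax under a pinhole-camera model.
# Assumptions (not from the paper): sideways camera translation, static scene,
# glass plane perpendicular to the optical axis. All names are hypothetical.


def image_shift_px(focal_px: float, camera_shift_m: float, depth_m: float) -> float:
    """Approximate image shift (pixels) of a static point at depth_m
    when the camera translates sideways by camera_shift_m."""
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_px * camera_shift_m / depth_m


def reflected_depth_m(glass_depth_m: float, object_to_glass_m: float) -> float:
    """Apparent depth of an object reflected in a planar glass surface.
    The virtual image lies as far behind the glass as the object is in front of it."""
    return glass_depth_m + object_to_glass_m


# Hypothetical values chosen only for illustration.
focal_px = 800.0          # focal length in pixels
camera_shift_m = 0.05     # sideways camera motion between two frames
glass_depth_m = 2.0       # distance from camera to the glass plane
object_to_glass_m = 3.0   # distance from the reflected object to the glass

# A point on the window frame sits at the depth of the glass plane.
frame_shift = image_shift_px(focal_px, camera_shift_m, glass_depth_m)

# A reflected object appears farther away, so it shifts less between frames.
reflection_shift = image_shift_px(
    focal_px, camera_shift_m, reflected_depth_m(glass_depth_m, object_to_glass_m)
)

print(f"window frame shift: {frame_shift:.1f} px")
print(f"reflection shift:   {reflection_shift:.1f} px")
```

Under these assumptions the reflected content moves fewer pixels per frame than the surrounding non-glass region at the same plane, which is the kind of discrepancy the abstract describes as a cue for glass.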

Related papers

Glass Segmentation with Fusion of Learned and General Visual Features [2.3821941487858935]
Glass surface segmentation from RGB images is a challenging task, since glass as a transparent material distinctly lacks visual characteristics. This paper presents a novel architecture for glass segmentation, deploying a dual-backbone producing general visual features as well as task-specific learned visual features. The architecture was evaluated on four commonly used glass segmentation datasets, achieving state-of-the-art results on several accuracy metrics.
arXiv Detail & Related papers (2026-03-04T04:40:30Z)
Glass Surface Detection: Leveraging Reflection Dynamics in Flash/No-flash Imagery [82.6332672749888]
Glass surfaces are ubiquitous in daily life, typically appearing colorless, transparent, and lacking distinctive features. We propose NFGlassNet, a novel method for glass surface detection that leverages the reflection dynamics present in flash/no-flash imagery.
arXiv Detail & Related papers (2025-11-21T02:00:17Z)
SeaDSC: A video-based unsupervised method for dynamic scene change detection in unmanned surface vehicles [3.2716252389196288]
This paper outlines our approach to detecting dynamic scene changes in Unmanned Surface Vehicles (USVs). Our objective is to identify significant changes in the dynamic scenes of maritime video data, particularly those scenes that exhibit a high degree of resemblance. In our system for dynamic scene change detection, we propose a completely unsupervised learning method.
arXiv Detail & Related papers (2023-11-20T07:34:01Z)
MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird's Eye View based Appearance and Motion Features [5.186531650935954]
We present MotionBEV, a fast and accurate framework for LiDAR moving object segmentation. Our approach converts 3D LiDAR scans into a 2D polar BEV representation to improve computational efficiency. We employ a dual-branch network bridged by the Appearance-Motion Co-attention Module (AMCM) to adaptively fuse the LiDAR-temporal information from appearance and motion features.
arXiv Detail & Related papers (2023-05-12T09:28:09Z)
Large-Field Contextual Feature Learning for Glass Detection [44.222075782263175]
We address the important problem of detecting glass surfaces from a single RGB image. To address this problem, we construct the first large-scale glass detection dataset (GDD). We propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues in a large field-of-view.
arXiv Detail & Related papers (2022-09-10T11:08:05Z)
Leveraging RGB-D Data with Cross-Modal Context Mining for Glass Surface Detection [47.87834602551456]
Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This poses substantial challenges to the operations of autonomous systems such as robots, self-driving cars, and drones. We propose a novel glass surface detection framework combining RGB and depth information.
arXiv Detail & Related papers (2022-06-22T17:56:09Z)
Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework. It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
Spatiotemporal Inconsistency Learning for DeepFake Video Detection [51.747219106855624]
We present a novel temporal modeling paradigm in TIM by exploiting the temporal difference over adjacent frames along both horizontal and vertical directions. The ISM simultaneously utilizes the spatial information from SIM and temporal information from TIM to establish a more comprehensive spatial-temporal representation.
arXiv Detail & Related papers (2021-09-04T13:05:37Z)
GlassNet: Label Decoupling-based Three-stream Neural Network for Robust Image Glass Detection [1.1825946875790057]
We exploit label decoupling to decompose the labeled ground-truth (GT) map into an interior-diffusion map and a boundary-diffusion map. The GT map in collaboration with the two newly generated maps breaks the imbalanced distribution of the object boundary, leading to improved glass detection quality. We develop an attention-based boundary-aware feature Mosaic module to integrate multi-modal information.
arXiv Detail & Related papers (2021-08-25T08:33:49Z)
Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
Full-Duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS). Our FSNet performs the cross-modal feature-passing (i.e., transmission and receiving) simultaneously before the fusion decoding stage. We show that our FSNet outperforms other state-of-the-art methods for both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
Enhanced Boundary Learning for Glass-like Object Segmentation [55.45473926510806]
This paper aims to solve the glass-like object segmentation problem via enhanced boundary learning. In particular, we first propose a novel refined differential module for generating finer boundary cues. An edge-aware point-based graph convolution network module is proposed to model the global shape representation along the boundary.
arXiv Detail & Related papers (2021-03-29T16:18:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.