Glass Surface Segmentation with an RGB-D Camera via Weighted Feature Fusion for Service Robots
- URL: http://arxiv.org/abs/2508.01639v1
- Date: Sun, 03 Aug 2025 07:58:10 GMT
- Title: Glass Surface Segmentation with an RGB-D Camera via Weighted Feature Fusion for Service Robots
- Authors: Henghong Lin, Zihan Zhu, Tao Wang, Anastasia Ioannou, Yuanshui Huang
- Abstract summary: We propose a Weighted Feature Fusion (WFF) module that combines RGB and depth features to tackle issues such as transparency and reflections. We also introduce the MJU-Glass dataset, a comprehensive RGB-D dataset collected by a service robot navigating real-world environments.
- Score: 4.58103945996187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of glass surface segmentation with an RGB-D camera, with a focus on effectively fusing RGB and depth information. To this end, we propose a Weighted Feature Fusion (WFF) module that dynamically and adaptively combines RGB and depth features to tackle issues such as transparency, reflections, and occlusions. This module can be seamlessly integrated with various deep neural network backbones as a plug-and-play solution. Additionally, we introduce the MJU-Glass dataset, a comprehensive RGB-D dataset collected by a service robot navigating real-world environments, providing a valuable benchmark for evaluating segmentation models. Experimental results show significant improvements in segmentation accuracy and robustness, with the WFF module enhancing performance in both mean Intersection over Union (mIoU) and boundary IoU (bIoU), achieving a 7.49% improvement in bIoU when integrated with PSPNet. The proposed module and dataset provide a robust framework for advancing glass surface segmentation in robotics and reducing the risk of collisions with glass objects.
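The abstract does not include code, so the following is only a minimal sketch of what a weighted RGB-D feature fusion module of this kind might look like in PyTorch. The gating design, channel sizes, and the name WeightedFeatureFusion are assumptions for illustration, not the authors' WFF implementation.

```python
# Hypothetical sketch of weighted RGB-D feature fusion (not the authors' code):
# a small gating branch predicts a per-pixel weight that decides how much to
# trust the RGB features versus the depth features at each location.
import torch
import torch.nn as nn

class WeightedFeatureFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # The gate sees both modalities and emits a weight in [0, 1] per pixel.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # w -> 1 favors RGB (useful where depth is invalid, e.g. on glass);
        # w -> 0 favors depth (useful where RGB is ambiguous, e.g. reflections).
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        return w * rgb_feat + (1.0 - w) * depth_feat

# Plug-and-play usage: fuse same-shape backbone features from the two streams.
fusion = WeightedFeatureFusion(channels=256)
fused = fusion(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 64, 64))
```

The per-pixel gate is one plausible way to make the fusion "dynamic and adaptive": where depth is missing or noisy, the weight can shift toward RGB, and vice versa.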
Related papers
- RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory [34.406308400305385]
RGB-Depth (RGB-D) Video Object Segmentation (VOS) aims to integrate the fine-grained texture information of RGB with the geometric cues of the depth modality. In this paper, we propose a novel RGB-D VOS method built on multi-store feature memory for robust segmentation. We show that the proposed method achieves state-of-the-art performance on the latest RGB-D VOS benchmark. (A generic memory-read sketch follows this entry.)
arXiv Detail & Related papers (2025-04-23T07:31:37Z)
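As a rough illustration of the feature-memory mechanism named above: memory-based VOS methods typically read stored key/value features from past frames with an attention lookup. The sketch below shows only that generic read step; the names and shapes are assumptions, and the paper's multi-store design is not reproduced here.

```python
# Generic attention-based memory read, as commonly used in memory-based VOS
# (illustrative only; not this paper's architecture).
import torch
import torch.nn.functional as F

def memory_read(query_key, mem_keys, mem_values):
    # query_key:  (B, C, H, W)   key features of the current frame
    # mem_keys:   (B, C, N)      keys stored from past frames, N = T*H*W
    # mem_values: (B, D, N)      values stored from past frames
    B, C, H, W = query_key.shape
    q = query_key.flatten(2)                              # (B, C, H*W)
    attn = torch.einsum("bck,bcn->bkn", q, mem_keys)      # (B, H*W, N)
    attn = F.softmax(attn / C ** 0.5, dim=-1)             # scaled attention
    out = torch.einsum("bkn,bdn->bdk", attn, mem_values)  # (B, D, H*W)
    return out.view(B, -1, H, W)

# Example: 4 stored frames of 30x30 features.
out = memory_read(torch.randn(2, 64, 30, 30),
                  torch.randn(2, 64, 4 * 30 * 30),
                  torch.randn(2, 128, 4 * 30 * 30))  # -> (2, 128, 30, 30)
```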
- Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention [5.518612382697244]
Multi-modal Interaction and Pooling Attention Network (MIPANet) is designed to harness the interactive synergy between RGB and depth modalities.
We introduce a Pooling Attention Module (PAM) at various stages of the encoder.
This module amplifies the features extracted by the network, and its output is integrated into the decoder. (A sketch of one plausible pooling-attention design follows this entry.)
arXiv Detail & Related papers (2023-11-19T12:25:59Z)
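The abstract does not describe the PAM internals; below is a loose, squeeze-and-excitation-style sketch of what a pooling-based attention block could look like. It is one plausible reading of "pooling attention", not MIPANet's actual module.

```python
# Hypothetical pooling-attention block: global average pooling summarizes the
# feature map, and a small bottleneck MLP turns that summary into per-channel
# weights that amplify informative channels before decoding.
import torch
import torch.nn as nn

class PoolingAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # (B, C, H, W) -> (B, C, 1, 1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight channels; spatial layout is left untouched.
        return x * self.mlp(self.pool(x))
```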
- Salient Object Detection in RGB-D Videos [11.805682025734551]
This paper makes two primary contributions: the dataset and the model.
We construct the RDVS dataset, a new RGB-D VSOD dataset with realistic depth.
We introduce DCTNet+, a three-stream network tailored for RGB-D VSOD.
arXiv Detail & Related papers (2023-10-24T03:18:07Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- Leveraging RGB-D Data with Cross-Modal Context Mining for Glass Surface Detection [47.87834602551456]
Glass surfaces are becoming increasingly ubiquitous as modern buildings make extensive use of glass panels. This poses substantial challenges to the operation of autonomous systems such as robots, self-driving cars, and drones. We propose a novel glass surface detection framework combining RGB and depth information.
arXiv Detail & Related papers (2022-06-22T17:56:09Z)
- Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z)
- Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing that it can outperform several semantic segmentation models. (A sketch of the shared-backbone idea follows this entry.)
arXiv Detail & Related papers (2020-08-26T06:01:05Z)
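To make the shared-backbone idea concrete: in a Siamese setup, one encoder with tied weights embeds both modalities in the same feature space. The resnet18 choice, the depth-channel tiling, and the additive fusion below are assumptions for the sketch, not JL-DCF's design.

```python
# Illustrative Siamese-style shared backbone: the SAME encoder (tied weights)
# runs on both the RGB image and the depth map, so the two modalities are
# embedded in one feature space before any fusion happens.
import torch
import torchvision.models as models

# Drop the classification head (avgpool + fc), keep the convolutional trunk.
backbone = torch.nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])

rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224).repeat(1, 3, 1, 1)  # tile depth to 3 channels

feat_rgb = backbone(rgb)       # (1, 512, 7, 7)
feat_depth = backbone(depth)   # same weights, hence the same feature space
fused = feat_rgb + feat_depth  # simplest possible fusion, for illustration only
```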
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
- Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection [91.43066633305662]
The main challenge in RGB-D salient object detection (SOD) is how to better integrate and utilize cross-modal fusion information.
In this paper, we explore these issues from a new perspective.
We implement a more flexible and efficient form of multi-scale cross-modal feature processing. (A generic dynamic-filtering sketch follows this entry.)
arXiv Detail & Related papers (2020-07-13T07:59:55Z)
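"Dynamic filtering" generally means predicting convolution kernels from the input rather than learning them as fixed weights. The sketch below shows only that bare mechanism, with kernels derived from a guidance feature map and applied per channel; it is a generic illustration, not HDFNet's architecture.

```python
# Bare-bones dynamic filtering: a kernel-generating branch predicts a 3x3
# kernel per channel from one feature map (the "guide"), which is then applied
# to another feature map via a grouped convolution (one kernel per
# sample/channel pair).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilter(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        self.kernel_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(k),           # (B, C, k, k) spatial summary
            nn.Conv2d(channels, channels, 1),  # refine summary into kernels
        )

    def forward(self, guide: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        kernels = self.kernel_head(guide).reshape(B * C, 1, self.k, self.k)
        # Fold the batch into groups so each sample/channel gets its own kernel.
        out = F.conv2d(x.reshape(1, B * C, H, W), kernels,
                       padding=self.k // 2, groups=B * C)
        return out.reshape(B, C, H, W)

# Example: depth-derived kernels filtering RGB features (hypothetical pairing).
y = DynamicFilter(64)(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```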
This list is automatically generated from the titles and abstracts of the papers on this site.