Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues
- URL: http://arxiv.org/abs/2512.07034v1
- Date: Sun, 07 Dec 2025 22:52:53 GMT
- Title: Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues
- Authors: Tuan-Anh Vu, Hai Nguyen-Truong, Ziqiang Zheng, Binh-Son Hua, Qing Guo, Ivor Tsang, Sai-Kit Yeung
- Abstract summary: We propose incorporating powerful visual cues via the Boundary Feature Enhancement and Reflection Feature Enhancement modules. Our proposed framework, TransCues, is a pyramidal transformer encoder-decoder architecture for segmenting transparent objects. Our method outperforms the state-of-the-art by a large margin, achieving +4.2% mIoU on Trans10K-v2, +5.6% mIoU on MSD, +10.1% mIoU on RGBD-Mirror, +13.1% mIoU on TROSD, and +8.3% mIoU on Stanford2D3D.
- Score: 35.65981887193136
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Glass is a prevalent material among solid objects in everyday life, yet segmentation methods struggle to distinguish it from opaque materials due to its transparency and reflections. While human perception is known to rely on boundary and reflective-object features to distinguish glass objects, the existing literature has not yet sufficiently captured both properties when handling transparent objects. Hence, we propose incorporating both of these powerful visual cues via the Boundary Feature Enhancement and Reflection Feature Enhancement modules in a mutually beneficial way. Our proposed framework, TransCues, is a pyramidal transformer encoder-decoder architecture for segmenting transparent objects. We empirically show that these two modules can be used together effectively, improving overall performance across various benchmark datasets, including glass object semantic segmentation, mirror object semantic segmentation, and generic segmentation datasets. Our method outperforms the state of the art by a large margin, achieving +4.2% mIoU on Trans10K-v2, +5.6% mIoU on MSD, +10.1% mIoU on RGBD-Mirror, +13.1% mIoU on TROSD, and +8.3% mIoU on Stanford2D3D, demonstrating its effectiveness on glass objects.
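The paper's own code is not reproduced here, but as a rough illustration of how boundary and reflection cues might be injected into a segmentation head, below is a minimal PyTorch sketch. The class names `BoundaryFeatureEnhancement`, `ReflectionFeatureEnhancement`, and `TransparentSegHead` mirror the abstract's terminology, yet their internals are assumptions for illustration only, not the authors' TransCues implementation.

```python
# Minimal sketch: fusing boundary and reflection cues before per-pixel
# classification. The module internals are assumptions, NOT TransCues itself.
import torch
import torch.nn as nn

class BoundaryFeatureEnhancement(nn.Module):
    """Predicts a boundary map and uses it to re-weight features."""
    def __init__(self, channels):
        super().__init__()
        self.boundary_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feats):
        boundary = torch.sigmoid(self.boundary_head(feats))  # (B,1,H,W)
        return feats * (1.0 + boundary), boundary            # emphasize edges

class ReflectionFeatureEnhancement(nn.Module):
    """Models long-range context so reflective regions can borrow cues."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats):
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)            # (B,HW,C)
        ctx, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + ctx)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class TransparentSegHead(nn.Module):
    """Combines both cues before the per-pixel classifier."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.bfe = BoundaryFeatureEnhancement(channels)
        self.rfe = ReflectionFeatureEnhancement(channels)
        self.classifier = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, feats):
        feats, boundary = self.bfe(feats)
        feats = self.rfe(feats)
        # The boundary map can receive its own auxiliary supervision.
        return self.classifier(feats), boundary

if __name__ == "__main__":
    head = TransparentSegHead(channels=64, num_classes=12)
    logits, boundary = head(torch.randn(2, 64, 32, 32))
    print(logits.shape, boundary.shape)  # (2,12,32,32) (2,1,32,32)
```

In this reading, the boundary branch sharpens glass contours while the attention branch lets reflective regions aggregate global context; how the paper actually couples the two modules "in a mutually beneficial way" is not specified by the abstract.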
Related papers
- Glass Segmentation with Fusion of Learned and General Visual Features [2.3821941487858935]
Glass surface segmentation from RGB images is a challenging task, since glass, as a transparent material, distinctly lacks visual characteristics.
This paper presents a novel architecture for glass segmentation, deploying a dual backbone that produces general visual features as well as task-specific learned visual features.
The architecture was evaluated on four commonly used glass segmentation datasets, achieving state-of-the-art results on several accuracy metrics.
arXiv Detail & Related papers (2026-03-04T04:40:30Z)
- DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects [53.83670041249326]
Reconstructing transparent objects from a set of multi-view images is a challenging task due to the complicated nature and indeterminate behavior of light propagation.
We propose a differentiable rendering framework for transparent objects, dubbed DiffTrans, which allows for efficient decomposition and reconstruction of the geometry and materials of transparent objects.
arXiv Detail & Related papers (2026-02-28T02:21:31Z)
- EGSA-PT: Edge-Guided Spatial Attention with Progressive Training for Monocular Depth Estimation and Segmentation of Transparent Objects [3.6327828943194937]
We introduce Edge-Guided Spatial Attention (EGSA), a fusion mechanism designed to mitigate destructive interactions.
On both the Syn-TODD and ClearPose benchmarks, EGSA consistently improved depth accuracy over the current state-of-the-art method.
Our second contribution is a multi-modal progressive training strategy, in which learning transitions from edges derived from RGB images to edges derived from predicted depth images.
arXiv Detail & Related papers (2025-11-18T23:29:20Z)
- Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion [9.391182087420926]
We propose a monocular framework, the first to excel at both segmentation and depth estimation of transparent objects.
Specifically, we devise a novel semantic and geometric fusion module that effectively integrates multi-scale information between the two tasks.
Experiments on two challenging synthetic and real-world datasets demonstrate that our model surpasses state-of-the-art monocular, stereo, and multi-view methods by a large margin.
arXiv Detail & Related papers (2025-02-20T14:57:01Z)
- Weak-to-Strong 3D Object Detection with X-Ray Distillation [75.47580744933724]
We propose a versatile technique that seamlessly integrates into any existing framework for 3D Object Detection.
X-Ray Distillation with Object-Complete Frames is suitable for both supervised and semi-supervised settings.
Our proposed methods surpass the state of the art in semi-supervised learning by 1-1.5 mAP.
arXiv Detail & Related papers (2024-03-31T13:09:06Z)
- Glass Segmentation with Multi Scales and Primary Prediction Guiding [2.66512000865131]
Glass-like objects can be seen everywhere in our daily life, yet they are hard for existing methods to segment.
We propose MGNet, which consists of a Fine-Rescaling and Merging module (FRM) to improve the ability to extract semantics.
We supervise the model with a novel uncertainty-aware loss function to produce high-confidence segmentation maps.
arXiv Detail & Related papers (2024-02-13T16:14:32Z)
- Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present the Adaptive Rotated Convolution (ARC) module to handle the rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Enhanced Boundary Learning for Glass-like Object Segmentation [55.45473926510806]
This paper aims to solve the glass-like object segmentation problem via enhanced boundary learning.
In particular, we first propose a novel refined differential module for generating finer boundary cues.
An edge-aware point-based graph convolution network module is proposed to model the global shape representation along the boundary.
arXiv Detail & Related papers (2021-03-29T16:18:57Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve an object's inner consistency by modeling the global context, or refine object details along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing semantic segmentation performance requires explicitly modeling the object body and edge, which correspond to the low- and high-frequency components of the image.
We show that the proposed framework, with various baselines or backbone networks, leads to better object inner consistency and more precise object boundaries (a minimal sketch of this decoupling follows the list).
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
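To make the body/edge decoupling in the last entry concrete, the sketch below builds a low-frequency "body" target and a high-frequency "edge" residual from a ground-truth mask using simple average-pool blurring, then supervises each separately. This is a generic illustration of the idea under those assumptions, not that paper's actual implementation; the function names and the blur-based decomposition are hypothetical.

```python
# Generic illustration of decoupled body/edge supervision: a binary mask is
# split into a low-frequency "body" part (via average-pool blurring) and a
# high-frequency "edge" residual, each supervised with its own loss term.
# This mirrors the idea in the entry above, not the paper's exact method.
import torch
import torch.nn.functional as F

def decouple_body_edge(mask: torch.Tensor, blur_kernel: int = 7):
    """mask: (B,1,H,W) binary ground truth -> (body, edge) soft targets."""
    pad = blur_kernel // 2
    body = F.avg_pool2d(mask, blur_kernel, stride=1, padding=pad)  # low freq
    body = body * mask                    # keep body mass inside the object
    edge = (mask - body).clamp(min=0.0)   # high-frequency residual near edges
    return body, edge

def decoupled_loss(body_logits, edge_logits, mask):
    body_t, edge_t = decouple_body_edge(mask)
    body_loss = F.binary_cross_entropy_with_logits(body_logits, body_t)
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_t)
    return body_loss + edge_loss

if __name__ == "__main__":
    mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
    body_logits = torch.randn(2, 1, 64, 64)
    edge_logits = torch.randn(2, 1, 64, 64)
    print(decoupled_loss(body_logits, edge_logits, mask).item())
```

The split gives the boundary-focused term its own gradient signal, which is the same intuition behind the boundary cues used by several of the glass segmentation papers listed above.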
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.