Learning Depth Estimation for Transparent and Mirror Surfaces
- URL: http://arxiv.org/abs/2307.15052v1
- Date: Thu, 27 Jul 2023 17:57:06 GMT
- Title: Learning Depth Estimation for Transparent and Mirror Surfaces
- Authors: Alex Costanzino, Pierluigi Zama Ramirez, Matteo Poggi, Fabio Tosi,
Stefano Mattoccia, Luigi Di Stefano
- Abstract summary: Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for sensors, algorithms, and deep networks alike.
We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks.
- Score: 46.07527228487614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inferring the depth of transparent or mirror (ToM) surfaces represents a hard
challenge for sensors, algorithms, and deep networks alike. We propose a simple
pipeline for learning to estimate depth properly for such surfaces with neural
networks, without requiring any ground-truth annotation. We unveil how to
obtain reliable pseudo labels by in-painting ToM objects in images and
processing them with a monocular depth estimation model. These labels can be
used to fine-tune existing monocular or stereo networks, to let them learn how
to deal with ToM surfaces. Experimental results on the Booster dataset show the
dramatic improvements enabled by our remarkably simple proposal.
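The pipeline described in the abstract reduces to three steps: in-paint the ToM regions so the scene looks opaque, run an off-the-shelf monocular depth network on the result, and keep its prediction inside the masked regions as pseudo ground truth. A minimal sketch of that idea, where `inpaint_fn` and `mono_depth_fn` are placeholder hooks for any inpainting and monocular depth models, not the specific ones used in the paper:

```python
import numpy as np

def make_tom_pseudo_labels(image, tom_mask, inpaint_fn, mono_depth_fn):
    """Sketch of pseudo-label generation for ToM surfaces.

    image:         HxWx3 uint8 RGB image
    tom_mask:      HxW bool mask of transparent/mirror (ToM) pixels
    inpaint_fn:    fills masked regions with plausible opaque content
    mono_depth_fn: off-the-shelf monocular depth estimator
    """
    # 1. Remove the ToM object by in-painting it, so the depth network
    #    sees an ordinary opaque surface instead of glass or mirror.
    inpainted = inpaint_fn(image, tom_mask)

    # 2. Run monocular depth estimation on the in-painted image; the
    #    prediction inside the mask now serves as a pseudo label.
    pseudo_depth = mono_depth_fn(inpainted)

    # 3. Keep the depth of the unmodified image outside the mask and
    #    the "virtual" opaque depth inside it.
    original_depth = mono_depth_fn(image)
    return np.where(tom_mask, pseudo_depth, original_depth)
```

Pseudo labels assembled this way can then be used to fine-tune an existing monocular or stereo network on ToM pixels without any ground-truth annotation.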
Related papers
- Transparent Object Depth Completion [11.825680661429825]
The perception of transparent objects for grasping and manipulation remains a major challenge.
Existing robotic grasp methods which heavily rely on depth maps are not suitable for transparent objects due to their unique visual properties.
We propose an end-to-end network for transparent object depth completion that combines the strengths of single-view RGB-D based depth completion and multi-view depth estimation.
arXiv Detail & Related papers (2024-05-24T07:38:06Z)
- RFTrans: Leveraging Refractive Flow of Transparent Objects for Surface Normal Estimation and Manipulation [50.10282876199739]
This paper introduces RFTrans, an RGB-D-based method for surface normal estimation and manipulation of transparent objects.
It integrates the RFNet, which predicts refractive flow, object mask, and boundaries, followed by the F2Net, which estimates surface normal from the refractive flow.
A real-world robot grasping task achieves an 83% success rate, showing that refractive flow can enable direct sim-to-real transfer.
arXiv Detail & Related papers (2023-11-21T07:19:47Z)
- Multi-View Stereo Representation Revisit: Region-Aware MVSNet [8.264851594332677]
Deep learning-based multi-view stereo has emerged as a powerful paradigm for reconstructing complete, geometrically detailed objects from multiple views.
We propose RA-MVSNet to take advantage of point-to-surface distance so that the model is able to perceive a wider range of surfaces.
Our proposed RA-MVSNet is patch-aware: its perception range is enhanced by associating hypothetical planes with a patch of surface.
arXiv Detail & Related papers (2023-04-26T15:17:51Z)
- TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering [54.35405028643051]
We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone.
Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps.
We adopt the neural implicit surface reconstruction method, which allows for high-quality mesh.
arXiv Detail & Related papers (2023-03-27T10:07:52Z)
- MonoGraspNet: 6-DoF Grasping with a Single RGB Image [73.96707595661867]
6-DoF robotic grasping is a long-standing yet unsolved problem.
Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors.
We propose the first RGB-only 6-DoF grasping pipeline called MonoGraspNet.
arXiv Detail & Related papers (2022-09-26T21:29:50Z)
- nLMVS-Net: Deep Non-Lambertian Multi-View Stereo [24.707415091168556]
We introduce a novel multi-view stereo (MVS) method that simultaneously recovers per-pixel depth and surface normals.
Our key idea is to formulate MVS as an end-to-end learnable network, which seamlessly integrates radiometric cues to leverage surface normals as view-independent surface features.
arXiv Detail & Related papers (2022-07-25T02:20:21Z)
- Monocular Depth Estimation for Semi-Transparent Volume Renderings [10.496309857650306]
Monocular depth estimation networks are increasingly reliable in real-world scenes.
We show that adaptions of existing approaches to monocular depth estimation perform well on semi-transparent volume renderings.
arXiv Detail & Related papers (2022-06-27T13:18:02Z)
- Layered Depth Refinement with Mask Guidance [61.10654666344419]
We formulate a novel problem of mask-guided depth refinement that utilizes a generic mask to refine the depth prediction of SIDE models.
Our framework performs layered refinement and inpainting/outpainting, decomposing the depth map into two separate layers signified by the mask and the inverse mask.
We empirically show that our method is robust to different types of masks and initial depth predictions, accurately refining depth values in inner and outer mask boundary regions.
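The layered decomposition just described can be sketched in a few lines: split the depth map by the mask and its inverse, complete each layer separately, then recompose. Here `inpaint_layer` is a placeholder for whatever depth in/out-painting routine is used per layer; this is an illustrative simplification, not the authors' network:

```python
import numpy as np

def layered_refine(depth, mask, inpaint_layer):
    """Mask-guided layered depth refinement (simplified sketch).

    depth:         HxW float depth prediction from a SIDE model
    mask:          HxW bool generic mask (e.g. an object segment)
    inpaint_layer: completes NaN holes in a single depth layer
    """
    fg = np.where(mask, depth, np.nan)    # layer selected by the mask
    bg = np.where(~mask, depth, np.nan)   # layer selected by the inverse mask
    fg_full = inpaint_layer(fg)           # outpaint foreground beyond the mask
    bg_full = inpaint_layer(bg)           # inpaint background behind the mask
    # Recompose: trust each completed layer on its own side of the boundary,
    # which sharpens depth values along the mask edges.
    return np.where(mask, fg_full, bg_full)
```

Completing each layer in isolation is what lets the recomposed map stay accurate in the inner and outer mask boundary regions.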
arXiv Detail & Related papers (2022-06-07T06:42:44Z)
- Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
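The confidence-based filtering idea can be illustrated with a simple thresholded loss: only pixels whose pseudo ground truth is trusted contribute to training. The fixed `threshold` below is a hypothetical stand-in for the paper's adaptive thresholding:

```python
import numpy as np

def confidence_weighted_loss(pred, pseudo_depth, confidence, threshold=0.5):
    """L1 loss against pseudo ground-truth depth, restricted to pixels
    whose confidence exceeds a threshold, so inaccurate pseudo depths
    do not degrade training. All inputs are same-shape float arrays
    except `confidence`, which lies in [0, 1]."""
    valid = confidence > threshold
    if not valid.any():
        # No trusted pixels: contribute nothing to the training signal.
        return 0.0
    # Average absolute error over trusted pixels only.
    return float(np.mean(np.abs(pred[valid] - pseudo_depth[valid])))
```

Masking out low-confidence pixels is what mitigates the performance degeneration caused by inaccurate pseudo depth maps.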
arXiv Detail & Related papers (2020-09-27T13:26:16Z)
- Deep Depth Estimation from Visual-Inertial SLAM [11.814395824799988]
We study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system.
The resulting point cloud is sparse, noisy, and non-uniformly distributed in space.
We use the available gravity estimate from the VI-SLAM to warp the input image to the orientation prevailing in the training dataset.
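The gravity-based alignment amounts to estimating an in-plane (roll) rotation from the measured gravity direction and warping the image by it before inference. A toy sketch of the roll-angle computation, under the simplifying assumption of a camera frame with the y-axis pointing down; the paper's actual warp is more involved:

```python
import numpy as np

def gravity_align_roll(gravity_xyz):
    """Return the in-plane rotation angle (radians) that would make the
    gravity vector, as measured by the VI-SLAM system in the camera
    frame, point straight 'down' in image coordinates."""
    gx, gy, _ = gravity_xyz
    # Angle between the projected gravity direction and the image's
    # down axis; rotating the image by -roll aligns the two.
    return float(np.arctan2(gx, gy))
```

The image would then be rotated by the negative of this angle, so its orientation matches the gravity-aligned images prevailing in the training dataset.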
arXiv Detail & Related papers (2020-07-31T21:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.