Learning Depth Estimation for Transparent and Mirror Surfaces
- URL: http://arxiv.org/abs/2307.15052v1
- Date: Thu, 27 Jul 2023 17:57:06 GMT
- Title: Learning Depth Estimation for Transparent and Mirror Surfaces
- Authors: Alex Costanzino, Pierluigi Zama Ramirez, Matteo Poggi, Fabio Tosi,
Stefano Mattoccia, Luigi Di Stefano
- Abstract summary: Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for sensors, algorithms, and deep networks alike.
We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks.
- Score: 46.07527228487614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inferring the depth of transparent or mirror (ToM) surfaces represents a hard
challenge for sensors, algorithms, and deep networks alike. We propose a simple
pipeline for learning to estimate depth properly for such surfaces with neural
networks, without requiring any ground-truth annotation. We unveil how to
obtain reliable pseudo labels by in-painting ToM objects in images and
processing them with a monocular depth estimation model. These labels can be
used to fine-tune existing monocular or stereo networks, to let them learn how
to deal with ToM surfaces. Experimental results on the Booster dataset show the
dramatic improvements enabled by our remarkably simple proposal.
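The pipeline described in the abstract reduces to three steps: in-paint the ToM regions so the scene looks opaque, run an off-the-shelf monocular depth network on the result, and keep its prediction inside the masked regions as pseudo ground truth. A minimal sketch of that idea, where `inpaint_fn` and `mono_depth_fn` are placeholder hooks for any inpainting and monocular depth models, not the specific ones used in the paper:

```python
import numpy as np

def make_tom_pseudo_labels(image, tom_mask, inpaint_fn, mono_depth_fn):
    """Sketch of pseudo-label generation for ToM surfaces.

    image:         HxWx3 uint8 RGB image
    tom_mask:      HxW bool mask of transparent/mirror (ToM) pixels
    inpaint_fn:    fills masked regions with plausible opaque content
    mono_depth_fn: off-the-shelf monocular depth estimator
    """
    # 1. Remove the ToM object by in-painting it, so the depth network
    #    sees an ordinary opaque surface instead of glass or mirror.
    inpainted = inpaint_fn(image, tom_mask)

    # 2. Run monocular depth estimation on the in-painted image; the
    #    prediction inside the mask now serves as a pseudo label.
    pseudo_depth = mono_depth_fn(inpainted)

    # 3. Keep the depth of the unmodified image outside the mask and
    #    the "virtual" opaque depth inside it.
    original_depth = mono_depth_fn(image)
    return np.where(tom_mask, pseudo_depth, original_depth)
```

Pseudo labels assembled this way can then be used to fine-tune an existing monocular or stereo network on ToM pixels without any ground-truth annotation.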
Related papers
- Transparent Object Depth Completion [11.825680661429825]
The perception of transparent objects for grasping and manipulation remains a major challenge.
Existing robotic grasp methods which heavily rely on depth maps are not suitable for transparent objects due to their unique visual properties.
We propose an end-to-end network for transparent object depth completion that combines the strengths of single-view RGB-D based depth completion and multi-view depth estimation.
arXiv Detail & Related papers (2024-05-24T07:38:06Z)
- RFTrans: Leveraging Refractive Flow of Transparent Objects for Surface Normal Estimation and Manipulation [50.10282876199739]
This paper introduces RFTrans, an RGB-D-based method for surface normal estimation and manipulation of transparent objects.
It integrates the RFNet, which predicts refractive flow, object mask, and boundaries, followed by the F2Net, which estimates surface normal from the refractive flow.
A real-world robot grasping task achieves an 83% success rate, showing that refractive flow can enable direct sim-to-real transfer.
arXiv Detail & Related papers (2023-11-21T07:19:47Z)
- Multi-View Stereo Representation Revisit: Region-Aware MVSNet [8.264851594332677]
Deep learning-based multi-view stereo has emerged as a powerful paradigm for reconstructing complete, geometrically detailed objects from multiple views.
We propose RA-MVSNet to take advantage of point-to-surface distance so that the model is able to perceive a wider range of surfaces.
Our proposed RA-MVSNet is patch-aware: its perception range is enhanced by associating hypothetical planes with a patch of surface.
arXiv Detail & Related papers (2023-04-26T15:17:51Z)
- TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering [54.35405028643051]
We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone.
Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps.
We adopt the neural implicit surface reconstruction method, which allows for high-quality mesh.
arXiv Detail & Related papers (2023-03-27T10:07:52Z)
- MonoGraspNet: 6-DoF Grasping with a Single RGB Image [73.96707595661867]
6-DoF robotic grasping is a long-standing yet unsolved problem.
Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors.
We propose the first RGB-only 6-DoF grasping pipeline called MonoGraspNet.
arXiv Detail & Related papers (2022-09-26T21:29:50Z)
- nLMVS-Net: Deep Non-Lambertian Multi-View Stereo [24.707415091168556]
We introduce a novel multi-view stereo (MVS) method that simultaneously recovers per-pixel depth and surface normals.
Our key idea is to formulate MVS as an end-to-end learnable network, which seamlessly integrates radiometric cues to leverage surface normals as view-independent surface features.
arXiv Detail & Related papers (2022-07-25T02:20:21Z)
- Monocular Depth Estimation for Semi-Transparent Volume Renderings [10.496309857650306]
Monocular depth estimation networks are increasingly reliable in real-world scenes.
We show that adaptions of existing approaches to monocular depth estimation perform well on semi-transparent volume renderings.
arXiv Detail & Related papers (2022-06-27T13:18:02Z)
- Layered Depth Refinement with Mask Guidance [61.10654666344419]
We formulate a novel problem of mask-guided depth refinement that utilizes a generic mask to refine the depth prediction of SIDE models.
Our framework performs layered refinement and inpainting/outpainting, decomposing the depth map into two separate layers signified by the mask and the inverse mask.
We empirically show that our method is robust to different types of masks and initial depth predictions, accurately refining depth values in inner and outer mask boundary regions.
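The layered decomposition just described can be sketched in a few lines: split the depth map by the mask and its inverse, complete each layer separately, then recompose. Here `inpaint_layer` is a placeholder for whatever depth in/out-painting routine is used per layer; this is an illustrative simplification, not the authors' network:

```python
import numpy as np

def layered_refine(depth, mask, inpaint_layer):
    """Mask-guided layered depth refinement (simplified sketch).

    depth:         HxW float depth prediction from a SIDE model
    mask:          HxW bool generic mask (e.g. an object segment)
    inpaint_layer: completes NaN holes in a single depth layer
    """
    fg = np.where(mask, depth, np.nan)    # layer selected by the mask
    bg = np.where(~mask, depth, np.nan)   # layer selected by the inverse mask
    fg_full = inpaint_layer(fg)           # outpaint foreground beyond the mask
    bg_full = inpaint_layer(bg)           # inpaint background behind the mask
    # Recompose: trust each completed layer on its own side of the boundary,
    # which sharpens depth values along the mask edges.
    return np.where(mask, fg_full, bg_full)
```

Completing each layer in isolation is what lets the recomposed map stay accurate in the inner and outer mask boundary regions.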
arXiv Detail & Related papers (2022-06-07T06:42:44Z)
- Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
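The confidence-based filtering idea can be illustrated with a simple thresholded loss: only pixels whose pseudo ground truth is trusted contribute to training. The fixed `threshold` below is a hypothetical stand-in for the paper's adaptive thresholding:

```python
import numpy as np

def confidence_weighted_loss(pred, pseudo_depth, confidence, threshold=0.5):
    """L1 loss against pseudo ground-truth depth, restricted to pixels
    whose confidence exceeds a threshold, so inaccurate pseudo depths
    do not degrade training. All inputs are same-shape float arrays
    except `confidence`, which lies in [0, 1]."""
    valid = confidence > threshold
    if not valid.any():
        # No trusted pixels: contribute nothing to the training signal.
        return 0.0
    # Average absolute error over trusted pixels only.
    return float(np.mean(np.abs(pred[valid] - pseudo_depth[valid])))
```

Masking out low-confidence pixels is what mitigates the performance degeneration caused by inaccurate pseudo depth maps.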
arXiv Detail & Related papers (2020-09-27T13:26:16Z)
- Deep Depth Estimation from Visual-Inertial SLAM [11.814395824799988]
We study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system.
The resulting point cloud is sparse, noisy, and non-uniformly distributed in space.
We use the available gravity estimate from the VI-SLAM to warp the input image to the orientation prevailing in the training dataset.
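The gravity-based alignment amounts to estimating an in-plane (roll) rotation from the measured gravity direction and warping the image by it before inference. A toy sketch of the roll-angle computation, under the simplifying assumption of a camera frame with the y-axis pointing down; the paper's actual warp is more involved:

```python
import numpy as np

def gravity_align_roll(gravity_xyz):
    """Return the in-plane rotation angle (radians) that would make the
    gravity vector, as measured by the VI-SLAM system in the camera
    frame, point straight 'down' in image coordinates."""
    gx, gy, _ = gravity_xyz
    # Angle between the projected gravity direction and the image's
    # down axis; rotating the image by -roll aligns the two.
    return float(np.arctan2(gx, gy))
```

The image would then be rotated by the negative of this angle, so its orientation matches the gravity-aligned images prevailing in the training dataset.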
arXiv Detail & Related papers (2020-07-31T21:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.