Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical
Flow with Monocular Depth Completion Prior
- URL: http://arxiv.org/abs/2310.09956v1
- Date: Sun, 15 Oct 2023 21:30:06 GMT
- Title: Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical
Flow with Monocular Depth Completion Prior
- Authors: Xiaotong Chen, Zheming Zhou, Zhuo Deng, Omid Ghasemalizadeh, Min Sun,
Cheng-Hao Kuo, Arnie Sen
- Abstract summary: We introduce a two-stage pipeline for reconstructing transparent objects tailored for mobile platforms.
We propose Epipolar-guided Optical Flow (EOF) to fuse several single-view shape priors into a cross-view consistent 3D reconstruction.
Our pipeline significantly outperforms baseline methods in 3D reconstruction quality.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing transparent objects using affordable RGB-D cameras is a
persistent challenge in robotic perception due to inconsistent appearances
across views in the RGB domain and inaccurate depth readings in each
single-view. We introduce a two-stage pipeline for reconstructing transparent
objects tailored for mobile platforms. In the first stage, off-the-shelf
monocular object segmentation and depth completion networks are leveraged to
predict the depth of transparent objects, furnishing a single-view shape prior.
Subsequently, we propose Epipolar-guided Optical Flow (EOF) to fuse the
single-view shape priors from the first stage into a cross-view consistent 3D
reconstruction, given camera poses estimated from the opaque parts of the
scene. Our key innovation lies in EOF, which incorporates boundary-sensitive
sampling and epipolar-line constraints into optical flow to accurately
establish 2D correspondences across multiple views of transparent objects. Quantitative
evaluations demonstrate that our pipeline significantly outperforms baseline
methods in 3D reconstruction quality, paving the way for more adept robotic
perception and interaction with transparent objects.
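The core geometric idea behind EOF, constraining 2D correspondences to lie on epipolar lines, can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes known camera intrinsics `K` and a relative pose `(R, t)` between two views, and shows how a flow-predicted match can be snapped onto the epipolar line of its source pixel:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K, R, t):
    """Fundamental matrix from intrinsics K and the relative pose (R, t)
    of view 2 with respect to view 1."""
    E = skew(t) @ R                    # essential matrix
    K_inv = np.linalg.inv(K)
    return K_inv.T @ E @ K_inv

def snap_to_epipolar_line(F, p1, p2):
    """Project a flow-predicted match p2 (in view 2) onto the epipolar
    line induced by pixel p1 (in view 1), enforcing the constraint
    x2^T F x1 = 0."""
    a, b, c = F @ np.array([p1[0], p1[1], 1.0])  # line ax + by + c = 0
    d = (a * p2[0] + b * p2[1] + c) / (a * a + b * b)
    return np.array([p2[0] - a * d, p2[1] - b * d])
```

For a pure horizontal translation between views, the epipolar line of the image center is horizontal, so any vertical drift in the predicted flow is removed by the projection; this is the sense in which the epipolar constraint regularizes correspondences on texture-poor transparent surfaces.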
Related papers
- Diffusion-Based Depth Inpainting for Transparent and Reflective Objects [6.571006663689738]
We propose a diffusion-based Depth Inpainting framework specifically designed for Transparent and Reflective objects.
DITR is highly effective in depth inpainting tasks of transparent and reflective objects with robust adaptability.
arXiv Detail & Related papers (2024-10-11T06:45:15Z) - ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation [18.140839442955485]
We develop a vision transformer-based algorithm for stereo depth recovery of transparent objects.
Our method incorporates a parameter-aligned, domain-adaptive, and physically realistic Sim2Real simulation for efficient data generation.
Our experimental results demonstrate the model's exceptional Sim2Real generalizability in real-world scenarios.
arXiv Detail & Related papers (2024-09-13T15:44:38Z) - Transparent Object Depth Completion [11.825680661429825]
The perception of transparent objects for grasp and manipulation remains a major challenge.
Existing robotic grasp methods which heavily rely on depth maps are not suitable for transparent objects due to their unique visual properties.
We propose an end-to-end network for transparent object depth completion that combines the strengths of single-view RGB-D based depth completion and multi-view depth estimation.
arXiv Detail & Related papers (2024-05-24T07:38:06Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest required image resolution and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z) - StereoPose: Category-Level 6D Transparent Object Pose Estimation from
Stereo Images via Back-View NOCS [106.62225866064313]
We present StereoPose, a novel stereo image framework for category-level object pose estimation.
For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement.
To address the issue of image content aliasing, we define a back-view NOCS map for the transparent object.
The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation.
arXiv Detail & Related papers (2022-11-03T08:36:09Z) - OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object
Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network.
It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z) - Seeing Glass: Joint Point Cloud and Depth Completion for Transparent
Objects [16.714074893209713]
TranspareNet is a joint point cloud and depth completion method.
It can complete the depth of transparent objects in cluttered and complex scenes.
TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets.
arXiv Detail & Related papers (2021-09-30T21:09:09Z) - Polka Lines: Learning Structured Illumination and Reconstruction for
Active Stereo [52.68109922159688]
We introduce a novel differentiable image formation model for active stereo, relying on both wave and geometric optics, and a novel trinocular reconstruction network.
The jointly optimized pattern, which we dub "Polka Lines," together with the reconstruction network, achieve state-of-the-art active-stereo depth estimates across imaging conditions.
arXiv Detail & Related papers (2020-11-26T04:02:43Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.