Diffusion-Based Depth Inpainting for Transparent and Reflective Objects
- URL: http://arxiv.org/abs/2410.08567v1
- Date: Fri, 11 Oct 2024 06:45:15 GMT
- Title: Diffusion-Based Depth Inpainting for Transparent and Reflective Objects
- Authors: Tianyu Sun, Dingchang Hu, Yixiang Dai, Guijin Wang
- Abstract summary: We propose DITR, a diffusion-based Depth Inpainting framework designed specifically for Transparent and Reflective objects.
DITR is highly effective at depth inpainting for transparent and reflective objects, with robust adaptability.
- Score: 6.571006663689738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transparent and reflective objects, which are common in everyday life, pose a significant challenge to 3D imaging techniques because of their unique visual and optical properties. When imaging such objects, RGB-D cameras fail to capture true depth values and accurate spatial information. To address this issue, we propose DITR, a diffusion-based Depth Inpainting framework designed specifically for Transparent and Reflective objects. The network consists of two stages: a Region Proposal stage and a Depth Inpainting stage. DITR dynamically analyzes optical and geometric depth loss and inpaints the affected regions automatically. Furthermore, comprehensive experimental results demonstrate that DITR is highly effective at depth inpainting for transparent and reflective objects, with robust adaptability.
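The two-stage design described in the abstract (first propose the unreliable depth regions, then inpaint them) can be illustrated with a minimal sketch. The paper's actual diffusion network is not specified here, so stage 2 below is a hypothetical stand-in that fills masked pixels by iterative averaging of valid neighbors; all function names are illustrative, not the authors' API.

```python
# Hypothetical two-stage depth-inpainting sketch (not the DITR implementation).
# Stage 1 flags invalid depth pixels (0.0, as RGB-D cameras often return for
# transparent or reflective surfaces); stage 2 stands in for the diffusion
# model with simple iterative 4-neighbor averaging.

def propose_regions(depth):
    """Stage 1: mask of pixels whose depth reading is missing (0.0)."""
    return [[d == 0.0 for d in row] for row in depth]

def inpaint(depth, mask, iterations=50):
    """Stage 2 stand-in: fill masked pixels from the mean of valid neighbors."""
    h, w = len(depth), len(depth[0])
    depth = [row[:] for row in depth]  # work on a copy, keep the input intact
    for _ in range(iterations):
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                vals = [depth[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and depth[ny][nx] > 0.0]
                if vals:
                    depth[y][x] = sum(vals) / len(vals)
    return depth

# A 3x3 depth map with one missing reading in the center.
raw = [[1.0, 1.0, 1.0],
       [1.0, 0.0, 1.0],
       [1.0, 1.0, 1.0]]
filled = inpaint(raw, propose_regions(raw))
print(filled[1][1])  # → 1.0 (hole filled from surrounding valid depth)
```

This only mimics the mask-then-fill structure; a diffusion model would condition the fill on RGB context rather than averaging neighboring depth.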
Related papers
- ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation [18.140839442955485]
We develop a vision transformer-based algorithm for stereo depth recovery of transparent objects.
Our method incorporates a parameter-aligned, domain-adaptive, and physically realistic Sim2Real simulation for efficient data generation.
Our experimental results demonstrate the model's exceptional Sim2Real generalizability in real-world scenarios.
arXiv Detail & Related papers (2024-09-13T15:44:38Z)
- Transparent Object Depth Completion
The perception of transparent objects for grasp and manipulation remains a major challenge.
Existing robotic grasp methods which heavily rely on depth maps are not suitable for transparent objects due to their unique visual properties.
We propose an end-to-end network for transparent object depth completion that combines the strengths of single-view RGB-D based depth completion and multi-view depth estimation.
arXiv Detail & Related papers (2024-05-24T07:38:06Z)
- UniSDF: Unifying Neural Representations for High-Fidelity 3D Reconstruction of Complex Scenes with Reflections [92.38975002642455]
We propose UniSDF, a general purpose 3D reconstruction method that can reconstruct large complex scenes with reflections.
Our method is able to robustly reconstruct complex large-scale scenes with fine details and reflective surfaces.
arXiv Detail & Related papers (2023-12-20T18:59:42Z) - Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical
Flow with Monocular Depth Completion Prior [14.049778178534588]
We introduce a two-stage pipeline for reconstructing transparent objects tailored for mobile platforms.
It uses Epipolar-guided Optical Flow (EOF) to fuse several single-view shape priors into a cross-view consistent 3D reconstruction.
Our pipeline significantly outperforms baseline methods in 3D reconstruction quality.
arXiv Detail & Related papers (2023-10-15T21:30:06Z) - Neural Fields meet Explicit Geometric Representation for Inverse
Rendering of Urban Scenes [62.769186261245416]
We present a novel inverse rendering framework for large urban scenes capable of jointly reconstructing the scene geometry, spatially-varying materials, and HDR lighting from a set of posed RGB images with optional depth.
Specifically, we use a neural field to account for the primary rays, and use an explicit mesh (reconstructed from the underlying neural field) for modeling secondary rays that produce higher-order lighting effects such as cast shadows.
arXiv Detail & Related papers (2023-04-06T17:51:54Z) - MonoGraspNet: 6-DoF Grasping with a Single RGB Image [73.96707595661867]
6-DoF robotic grasping is a long-standing, unsolved problem.
Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors.
We propose the first RGB-only 6-DoF grasping pipeline called MonoGraspNet.
arXiv Detail & Related papers (2022-09-26T21:29:50Z) - Joint Learning of Salient Object Detection, Depth Estimation and Contour
Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z) - ClearPose: Large-scale Transparent Object Dataset and Benchmark [7.342978076186365]
We contribute a large-scale real-world RGB-Depth transparent object dataset named ClearPose to serve as a benchmark dataset for segmentation, scene-level depth completion and object-centric pose estimation tasks.
The ClearPose dataset contains over 350K labeled real-world RGB-Depth frames and 4M instance annotations covering 63 household objects.
arXiv Detail & Related papers (2022-03-08T07:29:31Z) - Through the Looking Glass: Neural 3D Reconstruction of Transparent
Shapes [75.63464905190061]
Complex light paths induced by refraction and reflection have prevented both traditional and deep multi-view stereo from reconstructing transparent shapes.
We propose a physically-based network to recover 3D shape of transparent objects using a few images acquired with a mobile phone camera.
Our experiments show successful recovery of high-quality 3D geometry for complex transparent shapes using as few as 5-12 natural images.
arXiv Detail & Related papers (2020-04-22T23:51:30Z) - Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.