RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth
Completion
- URL: http://arxiv.org/abs/2309.00655v4
- Date: Wed, 28 Feb 2024 06:58:46 GMT
- Title: RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth
Completion
- Authors: Zhiqiang Yan and Xiang Li and Le Hui and Zhenyu Zhang and Jun Li and
Jian Yang
- Abstract summary: We explore a repetitive design in our image guided network to gradually and sufficiently recover depth values.
In the image guidance branch, we design a dense repetitive hourglass network (DRHN) to extract discriminative image features of complex environments.
In the depth generation branch, we present a repetitive guidance (RG) module based on dynamic convolution, in which an efficient convolution factorization is proposed to reduce the complexity.
In addition, we propose a region-aware spatial propagation network (RASPN) for further depth refinement based on the semantic prior constraint.
- Score: 31.70022495622075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth completion aims to recover dense depth maps from sparse ones, where
color images are often used to facilitate this task. Recent depth methods
primarily focus on image guided learning frameworks. However, blurry guidance
in the image and unclear structure in the depth still impede their performance.
To tackle these challenges, we explore a repetitive design in our image guided
network to gradually and sufficiently recover depth values. Specifically, the
repetition is embodied in both the image guidance branch and depth generation
branch. In the former branch, we design a dense repetitive hourglass network
(DRHN) to extract discriminative image features of complex environments, which
can provide powerful contextual instruction for depth prediction. In the latter
branch, we present a repetitive guidance (RG) module based on dynamic
convolution, in which an efficient convolution factorization is proposed to
reduce the complexity while modeling high-frequency structures progressively.
Furthermore, in the semantic guidance branch, we utilize the well-known large
vision model, i.e., segment anything (SAM), to supply RG with semantic prior.
In addition, we propose a region-aware spatial propagation network (RASPN) for
further depth refinement based on the semantic prior constraint. Finally, we
collect a new dataset termed TOFDC for the depth completion task, which is
acquired by the time-of-flight (TOF) sensor and the color camera on
smartphones. Extensive experiments demonstrate that our method achieves
state-of-the-art performance on KITTI, NYUv2, Matterport3D, 3D60, VKITTI, and
our TOFDC.
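As a rough illustration of the refinement idea mentioned in the abstract, the sketch below runs a generic spatial propagation step on a tiny depth grid. It is not the authors' RASPN: the function names (`propagate`, `same_region_affinity`), the 4-neighbour stencil, the fixed self-weight of 1, and the binary "same segment" affinity are all assumptions made for illustration. It only shows the general pattern of re-injecting sparse measurements and averaging each remaining pixel with neighbours that share its semantic region.

```python
def same_region_affinity(labels):
    """Hypothetical affinity: 1.0 if two pixels share a segment label, else 0.0."""
    def affinity(y, x, ny, nx):
        return 1.0 if labels[y][x] == labels[ny][nx] else 0.0
    return affinity


def propagate(depth, sparse, valid, affinity, steps=3):
    """Generic spatial propagation over a 2D grid stored as lists of lists.

    depth    : initial dense depth estimate
    sparse   : sparse depth measurements (only read where valid is True)
    valid    : booleans marking pixels that carry a measurement
    affinity : function (y, x, ny, nx) -> non-negative neighbour weight
    """
    h, w = len(depth), len(depth[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-neighbourhood (assumption)

    # Start from the dense estimate, with measured pixels overwritten.
    cur = [[sparse[y][x] if valid[y][x] else depth[y][x] for x in range(w)]
           for y in range(h)]

    for _ in range(steps):
        nxt = [row[:] for row in cur]
        for y in range(h):
            for x in range(w):
                if valid[y][x]:
                    continue  # measured pixels stay fixed
                total, weight_sum = cur[y][x], 1.0  # self-weight of 1 (assumption)
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        wgt = affinity(y, x, ny, nx)
                        total += wgt * cur[ny][nx]
                        weight_sum += wgt
                nxt[y][x] = total / weight_sum
        cur = nxt
    return cur


# Usage: a 1x4 row with two semantic regions and one measurement in each.
labels = [[0, 0, 1, 1]]
init = [[0.0, 0.0, 0.0, 0.0]]
sparse = [[2.0, 0.0, 0.0, 10.0]]
valid = [[True, False, False, True]]
refined = propagate(init, sparse, valid, same_region_affinity(labels), steps=50)
```

Because the affinity is zero across the region boundary, each unmeasured pixel in this toy row converges towards the measurement inside its own region rather than blending the two, which is the kind of structure-preserving behaviour a semantic prior is meant to encourage.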
Related papers
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that effectively tackles the outlined challenge.
Our method extracts low-level features from edges and textures to create a texture image.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
arXiv Detail & Related papers (2024-08-17T04:55:03Z)
- A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion [10.519644854849098]
We propose a two-step Transformer-based network for indoor depth completion.
Our proposed network achieves state-of-the-art performance on the Matterport3D dataset.
In addition, to validate the importance of the depth completion task, we apply our methods to indoor 3D reconstruction.
arXiv Detail & Related papers (2024-06-14T07:42:27Z)
- RigNet: Repetitive Image Guided Network for Depth Completion [20.66405067066299]
Recent approaches mainly focus on image guided learning to predict dense results.
However, blurry image guidance and unclear object structures in depth still impede the performance of image guided frameworks.
We explore a repetitive design in our image guided network to sufficiently and gradually recover depth values.
Our method achieves state-of-the-art results on the NYUv2 dataset and ranks 1st on the KITTI benchmark at the time of submission.
arXiv Detail & Related papers (2021-07-29T08:00:33Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural
Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that the spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet can be well generalized to unseen real-world data directly even though it is only trained on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and
Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and depth completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for
Monocular Depth Completion [15.01291779855834]
Recent approaches mainly formulate the depth completion as a one-stage end-to-end learning task.
We propose a novel end-to-end residual learning framework, which formulates the depth completion as a two-stage learning task.
arXiv Detail & Related papers (2020-12-15T13:09:56Z)
- Depth Edge Guided CNNs for Sparse Depth Upsampling [18.659087667114274]
Guided sparse depth upsampling aims to upsample an irregularly sampled sparse depth map when an aligned high-resolution color image is given as guidance.
We propose a guided convolutional layer to recover dense depth from a sparse and irregular depth image with a depth edge image as guidance.
We conduct comprehensive experiments to verify our method on real-world indoor and synthetic outdoor datasets.
arXiv Detail & Related papers (2020-03-23T08:56:32Z)
- Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy, incomplete target depth maps, we reconstruct a restored depth map by using the CNN network structure as a prior.
arXiv Detail & Related papers (2020-01-21T21:56:01Z)