RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth
Completion
- URL: http://arxiv.org/abs/2309.00655v4
- Date: Wed, 28 Feb 2024 06:58:46 GMT
- Title: RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth
Completion
- Authors: Zhiqiang Yan and Xiang Li and Le Hui and Zhenyu Zhang and Jun Li and
Jian Yang
- Abstract summary: We explore a repetitive design in our image guided network to gradually and sufficiently recover depth values.
In the former branch, we design a dense repetitive hourglass network (DRHN) to extract discriminative image features of complex environments.
In the latter branch, we present a repetitive guidance (RG) module based on dynamic convolution, in which an efficient convolution factorization is proposed to reduce the complexity.
In addition, we propose a region-aware spatial propagation network (RASPN) for further depth refinement based on the semantic prior constraint.
- Score: 31.70022495622075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth completion aims to recover dense depth maps from sparse ones, where
color images are often used to facilitate this task. Recent depth methods
primarily focus on image guided learning frameworks. However, blurry guidance
in the image and unclear structure in the depth still impede their performance.
To tackle these challenges, we explore a repetitive design in our image guided
network to gradually and sufficiently recover depth values. Specifically, the
repetition is embodied in both the image guidance branch and depth generation
branch. In the former branch, we design a dense repetitive hourglass network
(DRHN) to extract discriminative image features of complex environments, which
can provide powerful contextual instruction for depth prediction. In the latter
branch, we present a repetitive guidance (RG) module based on dynamic
convolution, in which an efficient convolution factorization is proposed to
reduce the complexity while modeling high-frequency structures progressively.
Furthermore, in the semantic guidance branch, we utilize the well-known large
vision model, i.e., segment anything (SAM), to supply RG with semantic prior.
In addition, we propose a region-aware spatial propagation network (RASPN) for
further depth refinement based on the semantic prior constraint. Finally, we
collect a new dataset termed TOFDC for the depth completion task, which is
acquired by the time-of-flight (TOF) sensor and the color camera on
smartphones. Extensive experiments demonstrate that our method achieves
state-of-the-art performance on KITTI, NYUv2, Matterport3D, 3D60, VKITTI, and
our TOFDC.
Related papers
- A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion [10.519644854849098]
We propose a two-step Transformer-based network for indoor depth completion.
Our proposed network achieves the state-of-the-art performance on the Matterport3D dataset.
In addition, to validate the importance of the depth completion task, we apply our methods to indoor 3D reconstruction.
arXiv Detail & Related papers (2024-06-14T07:42:27Z) - Learning Pixel-wise Continuous Depth Representation via Clustering for
Depth Completion [0.0]
We propose a novel clustering-based framework called CluDe to learn the pixel-wise and continuous depth representation.
CluDe successfully reduces depth smearing around object boundaries by utilizing pixel-wise and continuous depth representation.
CluDe achieves state-of-the-art performance on the VOID datasets and outperforms classification-based methods on the KITTI dataset.
arXiv Detail & Related papers (2024-02-21T07:18:23Z) - GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolution neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves the state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
arXiv Detail & Related papers (2022-10-19T17:56:03Z) - A Real-Time Online Learning Framework for Joint 3D Reconstruction and
Semantic Segmentation of Indoor Scenes [87.74952229507096]
This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic label.
Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed neural network learns to fuse the depth over frames with suitable semantic labels in the scene space.
arXiv Detail & Related papers (2021-08-11T14:29:01Z) - RigNet: Repetitive Image Guided Network for Depth Completion [20.66405067066299]
Recent approaches mainly focus on image guided learning to predict dense results.
blurry image guidance and object structures in depth still impede the performance of image guided frameworks.
We explore a repetitive design in our image guided network to sufficiently and gradually recover depth values.
Our method achieves state-of-the-art result on the NYUv2 dataset and ranks 1st on the KITTI benchmark at the time of submission.
arXiv Detail & Related papers (2021-07-29T08:00:33Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - S2R-DepthNet: Learning a Generalizable Depth-specific Structural
Representation [63.58891781246175]
Human can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that the spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet can be well generalized to unseen real-world data directly even though it is only trained on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z) - Sparse Auxiliary Networks for Unified Monocular Depth Prediction and
Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z) - FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for
Monocular Depth Completion [15.01291779855834]
Recent approaches mainly formulate the depth completion as a one-stage end-to-end learning task.
We propose a novel end-to-end residual learning framework, which formulates the depth completion as a two-stage learning task.
arXiv Detail & Related papers (2020-12-15T13:09:56Z) - Depth Edge Guided CNNs for Sparse Depth Upsampling [18.659087667114274]
Guided sparse depth upsampling aims to upsample an irregularly sampled sparse depth map when an aligned high-resolution color image is given as guidance.
We propose a guided convolutional layer to recover dense depth from sparse and irregular depth image with an depth edge image as guidance.
We conduct comprehensive experiments to verify our method on real-world indoor and synthetic outdoor datasets.
arXiv Detail & Related papers (2020-03-23T08:56:32Z) - Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy and incomplete target depth maps, we reconstruct a depth map restored by virtue of using the CNN network structure as a prior.
arXiv Detail & Related papers (2020-01-21T21:56:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.