DevNet: Self-supervised Monocular Depth Learning via Density Volume
Construction
- URL: http://arxiv.org/abs/2209.06351v2
- Date: Thu, 15 Sep 2022 10:07:28 GMT
- Title: DevNet: Self-supervised Monocular Depth Learning via Density Volume
Construction
- Authors: Kaichen Zhou, Lanqing Hong, Changhao Chen, Hang Xu, Chaoqiang Ye,
Qingyong Hu, and Zhenguo Li
- Abstract summary: Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
- Score: 51.96971077984869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised depth learning from monocular images normally relies on the
2D pixel-wise photometric relation between temporally adjacent image frames.
However, these methods neither fully exploit the 3D point-wise geometric
correspondences nor effectively tackle the ambiguities in photometric
warping caused by occlusions or illumination inconsistency. To address these
problems, this work proposes the Density Volume Construction Network (DevNet), a
novel self-supervised monocular depth learning framework that considers 3D
spatial information and exploits stronger geometric constraints among adjacent
camera frustums. Instead of directly regressing a per-pixel depth value from a single
image, our DevNet divides the camera frustum into multiple parallel planes and
predicts the point-wise occlusion probability density on each plane. The final
depth map is generated by integrating the density along corresponding rays.
During the training process, novel regularization strategies and loss functions
are introduced to mitigate photometric ambiguities and overfitting. Without
noticeably increasing the model size or running time, DevNet outperforms
several representative baselines on both the KITTI-2015 outdoor and NYU-V2
indoor datasets. In particular, the root-mean-square deviation is reduced
by around 4% with DevNet on both KITTI-2015 and NYU-V2 in the task of depth
estimation. Code is available at https://github.com/gitkaichenzhou/DevNet.
Related papers
- SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net [18.342569823885864]
SLCF-Net is a novel approach for the Semantic Scene Completion task that sequentially fuses LiDAR and camera data.
It jointly estimates missing geometry and semantics in a scene from sequences of RGB images and sparse LiDAR measurements.
It excels in all SSC metrics and shows great temporal consistency.
arXiv Detail & Related papers (2024-03-13T18:12:53Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using Single-View Depth and Gradient Predictions [22.243043857097582]
DeepFusion is capable of producing real-time dense reconstructions on a GPU.
It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion.
Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion is capable of performing at least as well as other comparable systems.
arXiv Detail & Related papers (2022-07-25T14:55:26Z)
- MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection [10.377424252002792]
Monocular 3D object detection lacks accurate depth recovery ability.
Deep neural networks (DNNs) enable monocular depth sensing from high-level learned features.
We propose a joint semantic and geometric cost volume to model the depth error.
arXiv Detail & Related papers (2022-03-16T11:54:10Z)
- GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network [11.332580333969302]
This work brings a new solution with a set of improvements, which increase the quantitative and qualitative understanding of depth maps.
A graph convolutional network (GCN) can handle the convolution on non-Euclidean data and it can be applied to irregular image regions within a topological structure.
Our method provides comparable and promising results, with a high prediction accuracy of 89% on the public KITTI and Make3D datasets.
arXiv Detail & Related papers (2021-12-13T16:46:25Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method by 2.80% on the moderate test setting, without extra data.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
- Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud [50.56461318879761]
We propose the Geometry-Disentangled Attention Network (GDANet) for 3D point cloud processing.
GDANet disentangles point clouds into the contour and flat parts of 3D objects, denoted by sharp and gentle variation components, respectively.
Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves state-of-the-art performance with fewer parameters.
arXiv Detail & Related papers (2020-12-20T13:35:00Z)