Self-supervised Depth Estimation Leveraging Global Perception and
Geometric Smoothness Using On-board Videos
- URL: http://arxiv.org/abs/2106.03505v1
- Date: Mon, 7 Jun 2021 10:53:27 GMT
- Authors: Shaocheng Jia, Xin Pei, Wei Yao and S.C. Wong
- Abstract summary: We present DLNet for pixel-wise depth estimation, which simultaneously extracts global and local features.
A three-dimensional geometry smoothness loss is proposed to predict a geometrically natural depth map.
In experiments on the KITTI and Make3D benchmarks, the proposed DLNet achieves performance competitive with that of state-of-the-art methods.
- Score: 0.5276232626689566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised depth estimation has drawn much attention in recent years because
it requires only image sequences rather than labeled data. Moreover, it can be
conveniently used in various applications, such as autonomous driving,
robotics, realistic navigation, and smart cities. However, extracting global
contextual information from images and predicting a geometrically natural depth
map remain challenging. In this paper, we present DLNet for pixel-wise depth
estimation, which simultaneously extracts global and local features with the
aid of our depth Linformer block. This block consists of the Linformer and
innovative soft split multi-layer perceptron blocks. Moreover, a
three-dimensional geometry smoothness loss is proposed to predict a
geometrically natural depth map by imposing the second-order smoothness
constraint on the predicted three-dimensional point clouds, thereby realizing
improved performance as a byproduct. Finally, we explore the multi-scale
prediction strategy and propose the maximum margin dual-scale prediction
strategy for further performance improvement. In experiments on the KITTI and
Make3D benchmarks, the proposed DLNet achieves performance competitive with that
of state-of-the-art methods while reducing time and space complexity by more
than 62% and 56%, respectively. Extensive testing on various real-world
situations further demonstrates the strong practicality and generalization
capability of the proposed model.
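The abstract does not spell out the three-dimensional geometry smoothness loss, so the following is only a minimal sketch of the general idea it describes: lift the predicted depth map to a point cloud and penalize second-order differences between neighbouring 3D points. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the L1 penalty, and all function names are assumptions made for illustration and are not taken from the paper.

```python
def back_project(depth, fx, fy, cx, cy):
    """Lift an H x W depth map to an H x W grid of 3D points (assumed pinhole model)."""
    return [
        [((u - cx) * d / fx, (v - cy) * d / fy, d) for u, d in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def second_difference(a, b, c):
    """L1 norm of the discrete second derivative a - 2b + c of three 3D points."""
    return sum(abs(ai - 2.0 * bi + ci) for ai, bi, ci in zip(a, b, c))


def geometry_smoothness_loss(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
    """Mean second-order difference of the back-projected point cloud.

    Hypothetical formulation: horizontal and vertical second differences
    are averaged over every pixel that has both neighbours available.
    """
    points = back_project(depth, fx, fy, cx, cy)
    h, w = len(points), len(points[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            if 0 < u < w - 1:  # horizontal neighbours exist
                total += second_difference(
                    points[v][u - 1], points[v][u], points[v][u + 1]
                )
                count += 1
            if 0 < v < h - 1:  # vertical neighbours exist
                total += second_difference(
                    points[v - 1][u], points[v][u], points[v + 1][u]
                )
                count += 1
    return total / count if count else 0.0


# A constant-depth (fronto-parallel) surface versus the same surface with one outlier.
flat = [[2.0] * 4 for _ in range(4)]
bumpy = [row[:] for row in flat]
bumpy[1][2] = 3.0

print(geometry_smoothness_loss(flat))
print(geometry_smoothness_loss(bumpy))
```

In this sketch the constant-depth surface incurs no penalty, while the single outlier raises the loss, which is the behaviour a second-order smoothness term is meant to encourage. A training implementation would operate on tensors in an autodiff framework and would typically weight the term against the photometric loss.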
Related papers
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Self-Supervised Depth Completion Guided by 3D Perception and Geometry Consistency [17.68427514090938]
This paper explores the utilization of 3D perceptual features and multi-view geometry consistency to devise a high-precision self-supervised depth completion method.
Experiments on benchmark datasets of NYU-Depthv2 and VOID demonstrate that the proposed model achieves the state-of-the-art depth completion performance.
arXiv Detail & Related papers (2023-12-23T14:19:56Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells [23.345139129458122]
We show that different depth geometries have significant performance gaps, even using the same depth prediction error.
We introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface.
Our method also points to a new research direction for considering depth geometry in MVS.
arXiv Detail & Related papers (2023-07-18T11:37:53Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolution neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves the state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.