Toward Hierarchical Self-Supervised Monocular Absolute Depth Estimation
for Autonomous Driving Applications
- URL: http://arxiv.org/abs/2004.05560v2
- Date: Wed, 9 Sep 2020 10:48:55 GMT
- Title: Toward Hierarchical Self-Supervised Monocular Absolute Depth Estimation
for Autonomous Driving Applications
- Authors: Feng Xue, Guirong Zhuo, Ziyuan Huang, Wufei Fu, Zhuoyue Wu, Marcelo H.
Ang Jr
- Abstract summary: Current methods still suffer from imprecise object-level depth inference and an uncertain scale factor.
We propose to address these two problems together by introducing DNet.
Our contributions are twofold: a) a novel dense connected prediction (DCP) layer is proposed to provide better object-level depth estimation, and b) specifically for autonomous driving scenarios, a dense geometrical constraints (DGC) module is introduced so that a precise scale factor can be recovered without additional cost for autonomous vehicles.
- Score: 12.931635568843381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, self-supervised methods for monocular depth estimation
have rapidly become a significant branch of the depth estimation task, especially
for autonomous driving applications. Despite the high overall precision achieved,
current methods still suffer from a) imprecise object-level depth inference and
b) an uncertain scale factor. The former problem causes texture copy or
inaccurate object boundaries, and the latter requires current methods to rely on
an additional sensor such as LiDAR to provide depth ground truth, or a stereo
camera as additional training input, which makes them difficult to implement. In
this work, we propose to address these two problems together by introducing
DNet. Our contributions are twofold: a) a novel dense connected prediction (DCP)
layer is proposed to provide better object-level depth estimation, and b)
specifically for autonomous driving scenarios, a dense geometrical constraints
(DGC) module is introduced so that a precise scale factor can be recovered
without additional cost for autonomous vehicles. Extensive experiments show that
the DCP layer and the DGC module each effectively solve their respective
problems. Thanks to the DCP layer, object boundaries can now be better
distinguished in the depth map and the depth is more continuous at the object
level. It is also demonstrated that scale recovery using DGC is comparable to
scale recovery using ground-truth information, when the camera height is given
and ground points take up more than 1.03% of the pixels. Code is available at
https://github.com/TJ-IPLab/DNet.
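The core idea behind ground-geometry scale recovery can be sketched as follows. This is a simplified illustration, not the paper's actual DGC implementation: given an up-to-scale depth map, a camera intrinsics matrix, a known camera mounting height, and a mask of ground pixels (all names and the back-projection scheme here are assumptions for the sketch), the ground pixels are back-projected into 3D, the camera's up-to-scale height above them is measured, and the ratio to the real height gives the scale factor.

```python
import numpy as np

def recover_scale(depth, K, camera_height, ground_mask):
    """Recover an absolute scale factor from an up-to-scale depth map.

    depth         : (H, W) up-to-scale depth map from the network
    K             : (3, 3) pinhole camera intrinsics
    camera_height : known mounting height of the camera in metres
    ground_mask   : (H, W) boolean mask of pixels labelled as ground
    """
    v, u = np.nonzero(ground_mask)
    z = depth[v, u]
    # Back-project ground pixels into the camera frame.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    y = (v - cy) * z / fy  # y axis points downward in the camera frame
    # For a roughly level camera, y approximates the up-to-scale height
    # of the camera above each ground point; the median is robust to
    # mislabelled ground pixels.
    est_height = np.median(y)
    return camera_height / est_height
```

Multiplying the predicted depth map by the returned factor yields metric depth, with no extra sensor required at inference time, which is the practical appeal noted in the abstract.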
Related papers
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713]
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
arXiv Detail & Related papers (2024-07-15T14:29:15Z)
- GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints [12.426365333096264]
We propose GAM-Depth, developed upon two novel components: gradient-aware mask and semantic constraints.
The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions.
The incorporation of semantic constraints for indoor self-supervised depth estimation improves depth discrepancies at object boundaries.
arXiv Detail & Related papers (2024-02-22T07:53:34Z)
- Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration [20.82054596017465]
Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces.
This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization.
arXiv Detail & Related papers (2024-02-07T14:21:26Z)
- SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network [29.798579906253696]
It is challenging to acquire dense ground truth depth labels for supervised training, and the unsupervised depth estimation using monocular sequences emerges as a promising alternative.
In this paper, we employ a convolution-free Swin Transformer as an image feature extractor so that the network can capture both local geometric features and global semantic features for depth estimation.
Also, we propose a Densely Cascaded Multi-scale Network (DCMNet) that connects every feature map directly with another from different scales via a top-down cascade pathway.
arXiv Detail & Related papers (2023-01-17T06:01:46Z)
- RealNet: Combining Optimized Object Detection with Information Fusion Depth Estimation Co-Design Method on IoT [2.9275056713717285]
We propose a co-design method that combines a streamlined recognition model, a depth estimation algorithm, and information fusion.
The proposed method is suitable for mobile platforms with strict real-time requirements.
arXiv Detail & Related papers (2022-04-24T08:35:55Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta_1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Balanced Depth Completion between Dense Depth Inference and Sparse Range Measurements via KISS-GP [14.158132769768578]
Estimating a dense and accurate depth map is the key requirement for autonomous driving and robotics.
Recent advances in deep learning have allowed depth estimation in full resolution from a single image.
Despite this impressive result, many deep-learning-based monocular depth estimation algorithms have failed to maintain accuracy, yielding meter-level estimation errors.
arXiv Detail & Related papers (2020-08-12T08:07:55Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.