Toward Hierarchical Self-Supervised Monocular Absolute Depth Estimation
for Autonomous Driving Applications
- URL: http://arxiv.org/abs/2004.05560v2
- Date: Wed, 9 Sep 2020 10:48:55 GMT
- Title: Toward Hierarchical Self-Supervised Monocular Absolute Depth Estimation
for Autonomous Driving Applications
- Authors: Feng Xue, Guirong Zhuo, Ziyuan Huang, Wufei Fu, Zhuoyue Wu, Marcelo H.
Ang Jr
- Abstract summary: Current methods still suffer from imprecise object-level depth inference and an uncertain scale factor.
We propose to address these two problems together by introducing DNet.
Our contributions are twofold: a) a novel dense connected prediction (DCP) layer is proposed to provide better object-level depth estimation, and b) specifically for autonomous driving scenarios, a dense geometrical constraints (DGC) module is introduced so that a precise scale factor can be recovered without additional cost for autonomous vehicles.
- Score: 12.931635568843381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, self-supervised methods for monocular depth estimation
have rapidly become a significant branch of the depth estimation task, especially
for autonomous driving applications. Despite the high overall precision achieved,
current methods still suffer from a) imprecise object-level depth inference and
b) an uncertain scale factor. The former problem causes texture copy or
inaccurate object boundaries, and the latter requires current methods to rely on
an additional sensor such as LiDAR to provide depth ground truth, or a stereo
camera as additional training input, which makes them difficult to implement. In
this work, we propose to address these two problems together by introducing
DNet. Our contributions are twofold: a) a novel dense connected prediction (DCP)
layer is proposed to provide better object-level depth estimation, and b)
specifically for autonomous driving scenarios, a dense geometrical constraints
(DGC) module is introduced so that a precise scale factor can be recovered
without additional cost for autonomous vehicles. Extensive experiments show that
the DCP layer and the DGC module each effectively solve their respective
problems. Thanks to the DCP layer, object boundaries can now be better
distinguished in the depth map and the depth is more continuous at the object
level. It is also demonstrated that scale recovery using DGC is comparable to
scale recovery using ground-truth information, when the camera height is given
and ground points take up more than 1.03% of the pixels. Code is available at
https://github.com/TJ-IPLab/DNet.
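The core idea behind ground-geometry scale recovery can be sketched as follows. This is a simplified illustration, not the paper's actual DGC implementation: given an up-to-scale depth map, a camera intrinsics matrix, a known camera mounting height, and a mask of ground pixels (all names and the back-projection scheme here are assumptions for the sketch), the ground pixels are back-projected into 3D, the camera's up-to-scale height above them is measured, and the ratio to the real height gives the scale factor.

```python
import numpy as np

def recover_scale(depth, K, camera_height, ground_mask):
    """Recover an absolute scale factor from an up-to-scale depth map.

    depth         : (H, W) up-to-scale depth map from the network
    K             : (3, 3) pinhole camera intrinsics
    camera_height : known mounting height of the camera in metres
    ground_mask   : (H, W) boolean mask of pixels labelled as ground
    """
    v, u = np.nonzero(ground_mask)
    z = depth[v, u]
    # Back-project ground pixels into the camera frame.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    y = (v - cy) * z / fy  # y axis points downward in the camera frame
    # For a roughly level camera, y approximates the up-to-scale height
    # of the camera above each ground point; the median is robust to
    # mislabelled ground pixels.
    est_height = np.median(y)
    return camera_height / est_height
```

Multiplying the predicted depth map by the returned factor yields metric depth, with no extra sensor required at inference time, which is the practical appeal noted in the abstract.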
Related papers
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection [102.0744303467713]
We propose a new multi-view 3D object detector named OPEN.
Our main idea is to effectively inject object-wise depth information into the network through our proposed object-wise position embedding.
OPEN achieves a new state-of-the-art performance with 64.4% NDS and 56.7% mAP on the nuScenes test benchmark.
arXiv Detail & Related papers (2024-07-15T14:29:15Z)
- GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints [12.426365333096264]
We propose GAM-Depth, developed upon two novel components: gradient-aware mask and semantic constraints.
The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions.
The incorporation of semantic constraints for indoor self-supervised depth estimation improves depth discrepancies at object boundaries.
arXiv Detail & Related papers (2024-02-22T07:53:34Z)
- Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration [20.82054596017465]
Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces.
This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization.
arXiv Detail & Related papers (2024-02-07T14:21:26Z)
- SwinDepth: Unsupervised Depth Estimation using Monocular Sequences via Swin Transformer and Densely Cascaded Network [29.798579906253696]
It is challenging to acquire dense ground truth depth labels for supervised training, and the unsupervised depth estimation using monocular sequences emerges as a promising alternative.
In this paper, we employ a convolution-free Swin Transformer as an image feature extractor so that the network can capture both local geometric features and global semantic features for depth estimation.
Also, we propose a Densely Cascaded Multi-scale Network (DCMNet) that connects every feature map directly with another from different scales via a top-down cascade pathway.
arXiv Detail & Related papers (2023-01-17T06:01:46Z)
- RealNet: Combining Optimized Object Detection with Information Fusion Depth Estimation Co-Design Method on IoT [2.9275056713717285]
We propose a co-design method that combines a streamlined recognition model, a depth estimation algorithm, and information fusion.
The proposed method is suitable for mobile platforms with strict real-time requirements.
arXiv Detail & Related papers (2022-04-24T08:35:55Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta_1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Balanced Depth Completion between Dense Depth Inference and Sparse Range Measurements via KISS-GP [14.158132769768578]
Estimating a dense and accurate depth map is the key requirement for autonomous driving and robotics.
Recent advances in deep learning have allowed depth estimation in full resolution from a single image.
Despite this impressive result, many deep-learning-based monocular depth estimation algorithms have failed to maintain accuracy, yielding meter-level estimation errors.
arXiv Detail & Related papers (2020-08-12T08:07:55Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.