HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation
- URL: http://arxiv.org/abs/2012.07356v1
- Date: Mon, 14 Dec 2020 09:15:15 GMT
- Title: HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation
- Authors: Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong Liu,
Xinxin Chen, Yi Yuan
- Abstract summary: We present an improved DepthNet, HR-Depth, with two effective strategies.
Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolution.
- Score: 14.81943833870932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning shows great potential in monocular depth estimation,
using image sequences as the only source of supervision. Although high-resolution images
have been tried for depth estimation, the prediction accuracy has not improved significantly.
In this work, we find that the core reason is inaccurate depth estimation in large-gradient
regions, which makes the bilinear interpolation error gradually disappear as the resolution
increases. To obtain more accurate depth estimates in large-gradient regions, it is
necessary to obtain high-resolution features with both spatial and semantic information.
We therefore present an improved DepthNet, HR-Depth, with two effective strategies:
(1) redesigning the skip connections in DepthNet to obtain better high-resolution features,
and (2) proposing a feature-fusion Squeeze-and-Excitation (fSE) module to fuse features more
efficiently. Using ResNet-18 as the encoder, HR-Depth surpasses all previous
state-of-the-art (SoTA) methods with the fewest parameters at both high and low
resolution. Moreover, previous SoTA methods are based on fairly complex and deep
networks with a mass of parameters, which limits their real-world applications. Thus we
also construct a lightweight network that uses MobileNetV3 as the encoder. Experiments
show that this lightweight network performs on par with many large models, such as
Monodepth2, at high resolution with only 20% of the parameters. All code and models will
be available at https://github.com/shawLyu/HR-Depth.
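Two points in the abstract lend themselves to short illustrations. First, the core observation about large-gradient regions can be reproduced with a toy example: bilinear (here, 1-D linear) interpolation across a depth discontinuity produces intermediate values that lie on neither surface, so the interpolation error concentrates exactly at depth edges. This is a minimal NumPy sketch of that observation, not code from the paper:

```python
import numpy as np

# A 1-D "depth row" with a sharp discontinuity (a large-gradient region).
low_res = np.array([1.0, 1.0, 10.0, 10.0])

# Linear upsampling by 2x inserts midpoints between samples.
x_hi = np.linspace(0, len(low_res) - 1, 2 * len(low_res) - 1)
high_res = np.interp(x_hi, np.arange(len(low_res)), low_res)

print(high_res)  # [ 1.   1.   1.   5.5 10.  10.  10. ]
# The interpolated 5.5 lies on neither surface: the interpolation
# error concentrates exactly at the depth edge.
```

Second, the abstract describes an fSE module that fuses features with Squeeze-and-Excitation-style channel attention. The abstract does not spell out the layer layout, so the following PyTorch sketch only shows one plausible shape of such a block; the class name FuseSE, the reduction ratio, and the conv/activation choices are illustrative assumptions, not the authors' exact implementation (see the linked repository for that):

```python
import torch
import torch.nn as nn

class FuseSE(nn.Module):
    """Sketch of a feature-fusion Squeeze-and-Excitation block.

    Concatenates upsampled decoder features with encoder skip features,
    reweights channels with an SE-style gate, then projects to the output
    width. Hypothetical layout, not the paper's exact fSE module.
    """

    def __init__(self, dec_ch, skip_ch, out_ch, reduction=16):
        super().__init__()
        in_ch = dec_ch + skip_ch
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze: global context
        self.gate = nn.Sequential(                # excitation: channel weights
            nn.Linear(in_ch, in_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_ch // reduction, in_ch),
            nn.Sigmoid(),
        )
        self.proj = nn.Sequential(                # fuse and reduce channels
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ELU(inplace=True),
        )

    def forward(self, dec_feat, skip_feat):
        # Upsample decoder features to the skip resolution, then concatenate.
        dec_feat = nn.functional.interpolate(
            dec_feat, size=skip_feat.shape[-2:], mode="nearest")
        x = torch.cat([dec_feat, skip_feat], dim=1)
        b, c, _, _ = x.shape
        w = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return self.proj(x * w)                   # channel-reweighted fusion

# Usage example with illustrative channel widths:
fuse = FuseSE(dec_ch=64, skip_ch=64, out_ch=64)
out = fuse(torch.randn(1, 64, 12, 40), torch.randn(1, 64, 24, 80))
print(out.shape)  # torch.Size([1, 64, 24, 80])
```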
Related papers
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
In this paper, we present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- Depth Estimation From Monocular Images With Enhanced Encoder-Decoder Architecture [0.0]
This paper introduces a novel deep learning-based approach using an encoder-decoder architecture.
The Inception-ResNet-v2 model is utilized as the encoder.
Experimental results on the NYU Depth V2 dataset show that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-10-15T13:46:19Z)
- NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z)
- Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation [1.6775954077761863]
We present a fully convolutional depth estimation network using contextual feature fusion.
Compared to UNet++ and HRNet, we use high-resolution and low-resolution features to preserve information on small targets and fast-moving objects.
Our method reduces the parameters without sacrificing accuracy.
arXiv Detail & Related papers (2023-09-17T13:40:15Z)
- RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation [27.679479140943503]
We propose a resolution adaptive self-supervised monocular depth estimation method (RA-Depth) by learning the scale invariance of the scene depth.
RA-Depth achieves state-of-the-art performance and also exhibits good resolution-adaptation ability.
arXiv Detail & Related papers (2022-07-25T08:49:59Z)
- Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth [24.897377434844266]
We propose a novel structure and training strategy for monocular depth estimation.
We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder.
Our network achieves state-of-the-art performance on the challenging NYU Depth V2 dataset.
arXiv Detail & Related papers (2022-01-19T06:37:21Z)
- Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss [43.950140695759764]
We propose an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection.
We introduce a normalized Hessian loss term invariant to scaling and shear along the depth direction, which is shown to substantially improve the accuracy.
The proposed method achieves state-of-the-art results on NYU-Depth-v2 and KITTI while using a model 3.1 to 38.4 times smaller in parameter count than baseline approaches.
arXiv Detail & Related papers (2021-08-25T07:51:09Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Interpretable Detail-Fidelity Attention Network for Single Image Super-Resolution [89.1947690981471]
We propose a purposeful and interpretable detail-fidelity attention network to progressively process smooth regions and details in a divide-and-conquer manner.
In particular, we propose a Hessian filtering for interpretable feature representation that is well suited to detail inference.
Experiments demonstrate that the proposed methods achieve superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2020-09-28T08:31:23Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model contextual information adaptively.
Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 \times 384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.