HQDec: Self-Supervised Monocular Depth Estimation Based on a
High-Quality Decoder
- URL: http://arxiv.org/abs/2305.18706v1
- Date: Tue, 30 May 2023 03:03:11 GMT
- Title: HQDec: Self-Supervised Monocular Depth Estimation Based on a
High-Quality Decoder
- Authors: Fei Wang, Jun Cheng
- Abstract summary: We propose a high-quality decoder (HQDec) to recover scene depths.
The code and models will be publicly available at https://github.com/fwucas/HQDec.
- Score: 14.67433946077953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoders play a significant role in recovering scene depths. However, the
decoders used in previous works ignore the propagation of multilevel lossless
fine-grained information, cannot adaptively capture local and global
information in parallel, and cannot perform sufficient global statistical
analyses on the final output disparities. In addition, the process of mapping
from a low-resolution feature space to a high-resolution feature space is a
one-to-many problem that may have multiple solutions. Therefore, the quality of
the recovered depth map is low. To this end, we propose a high-quality decoder
(HQDec), with which multilevel near-lossless fine-grained information, obtained
by the proposed adaptive axial-normalized position-embedded channel attention
sampling module (AdaAxialNPCAS), can be adaptively incorporated into a
low-resolution feature map with high-level semantics utilizing the proposed
adaptive information exchange scheme. In the HQDec, we leverage the proposed
adaptive refinement module (AdaRM) to model the local and global dependencies
between pixels in parallel and utilize the proposed disparity attention module
to model the distribution characteristics of disparity values from a global
perspective. To recover fine-grained high-resolution features with maximal
accuracy, we adaptively fuse the high-frequency information obtained by
constraining the upsampled solution space utilizing the local and global
dependencies between pixels into the high-resolution feature map generated from
the nonlearning method. Extensive experiments demonstrate that each proposed
component improves the quality of the depth estimation results over the
baseline results, and the developed approach achieves state-of-the-art results
on the KITTI and DDAD datasets. The code and models will be publicly available
at https://github.com/fwucas/HQDec.
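The disparity attention module described above models the distribution of disparity values from a global perspective. As a minimal sketch of that idea, the following applies standard scaled dot-product attention over a flattened disparity feature map, so each pixel's output depends on statistics gathered from every other pixel. The feature shape and projection matrices here are illustrative assumptions, not the paper's actual layer configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def disparity_attention(feat, wq, wk, wv):
    """Global attention over a flattened H*W disparity feature map.

    feat: (H*W, C) features; wq/wk/wv: (C, C) projection matrices.
    Every spatial position attends to every other position, so the
    output at each pixel reflects global disparity statistics.
    """
    q, k, v = feat @ wq, feat @ wk, feat @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (H*W, H*W) affinities
    return softmax(scores, axis=-1) @ v       # (H*W, C)

# Toy usage: a 4x4 map with 8 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8))
w = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
out = disparity_attention(feat, *w)
print(out.shape)  # (16, 8)
```

The quadratic cost in the number of pixels is why such global analyses are typically applied at the disparity-output stage rather than on full-resolution feature maps.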
Related papers
- Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation [31.970739018426645]
In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images.
This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model.
arXiv Detail & Related papers (2024-05-19T04:57:17Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged as a task of high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Pyramid Grafting Network for One-Stage High Resolution Saliency
Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
arXiv Detail & Related papers (2022-04-11T12:22:21Z) - Single Image Internal Distribution Measurement Using Non-Local
Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely the non-local variational autoencoder (NLVAE).
NLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood.
Experimental results on seven benchmark datasets demonstrate the effectiveness of the NLVAE model.
arXiv Detail & Related papers (2022-04-02T18:43:55Z) - DepthFormer: Exploiting Long-Range Correlation and Local Information for
Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - High Dimensional Level Set Estimation with Bayesian Neural Network [58.684954492439424]
This paper proposes novel methods to solve the high dimensional Level Set Estimation problems using Bayesian Neural Networks.
For each problem, we derive the corresponding theoretic information based acquisition function to sample the data points.
Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-17T23:21:53Z) - AdaBins: Depth Estimation using Adaptive Bins [43.07310038858445]
We propose a transformer-based architecture block that divides the depth range into bins whose center value is estimated adaptively per image.
Our results show a decisive improvement over the state-of-the-art on several popular depth datasets.
arXiv Detail & Related papers (2020-11-28T14:40:45Z)
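The AdaBins entry above describes dividing the depth range into bins whose centers are estimated adaptively per image, with each pixel's depth computed as a probability-weighted combination of those centers. A minimal sketch of that final step, assuming illustrative depth bounds and random inputs in place of the transformer block that actually predicts the bin logits and per-pixel probabilities:

```python
import numpy as np

def adaptive_bin_depth(bin_logits, pixel_probs, d_min=0.1, d_max=10.0):
    """AdaBins-style depth from adaptively predicted bins.

    bin_logits: (N,) per-image logits converted into normalized bin widths.
    pixel_probs: (H, W, N) per-pixel probabilities over the N bins.
    Returns an (H, W) depth map: the probability-weighted sum of bin centers.
    """
    widths = np.exp(bin_logits) / np.exp(bin_logits).sum()  # widths sum to 1
    edges = d_min + (d_max - d_min) * np.cumsum(np.concatenate([[0.0], widths]))
    centers = 0.5 * (edges[:-1] + edges[1:])                # (N,) bin centers
    return pixel_probs @ centers                            # (H, W)

# Toy usage: 16 bins over a 4x4 image.
rng = np.random.default_rng(0)
logits = rng.standard_normal(16)
probs = rng.random((4, 4, 16))
probs /= probs.sum(axis=-1, keepdims=True)  # normalize to valid distributions
depth = adaptive_bin_depth(logits, probs)
print(depth.shape)  # (4, 4)
```

Because the bin widths adapt per image, scenes dominated by near depths can allocate finer bins close to the camera, which is the key difference from a fixed uniform discretization.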
This list is automatically generated from the titles and abstracts of the papers on this site.