HQDec: Self-Supervised Monocular Depth Estimation Based on a
High-Quality Decoder
- URL: http://arxiv.org/abs/2305.18706v1
- Date: Tue, 30 May 2023 03:03:11 GMT
- Title: HQDec: Self-Supervised Monocular Depth Estimation Based on a
High-Quality Decoder
- Authors: Fei Wang, Jun Cheng
- Abstract summary: We propose a high-quality decoder (HQDec) to recover scene depths.
The code and models will be publicly available at https://github.com/fwucas/HQDec.
- Score: 14.67433946077953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoders play significant roles in recovering scene depths. However, the
decoders used in previous works ignore the propagation of multilevel lossless
fine-grained information, cannot adaptively capture local and global
information in parallel, and cannot perform sufficient global statistical
analyses on the final output disparities. In addition, the process of mapping
from a low-resolution feature space to a high-resolution feature space is a
one-to-many problem that may have multiple solutions. Therefore, the quality of
the recovered depth map is low. To this end, we propose a high-quality decoder
(HQDec), with which multilevel near-lossless fine-grained information, obtained
by the proposed adaptive axial-normalized position-embedded channel attention
sampling module (AdaAxialNPCAS), can be adaptively incorporated into a
low-resolution feature map with high-level semantics utilizing the proposed
adaptive information exchange scheme. In the HQDec, we leverage the proposed
adaptive refinement module (AdaRM) to model the local and global dependencies
between pixels in parallel and utilize the proposed disparity attention module
to model the distribution characteristics of disparity values from a global
perspective. To recover fine-grained high-resolution features with maximal
accuracy, we adaptively fuse high-frequency information, obtained by
constraining the upsampled solution space via the local and global
dependencies between pixels, into the high-resolution feature map generated by
a nonlearned method. Extensive experiments demonstrate that each proposed
component improves the quality of the depth estimation results over the
baseline results, and the developed approach achieves state-of-the-art results
on the KITTI and DDAD datasets. The code and models will be publicly available
at https://github.com/fwucas/HQDec.
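The abstract's final idea, recovering high-resolution features by combining a nonlearned upsampling with adaptively fused high-frequency information, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the nearest-neighbour upsampling, the residual input, and the scalar fusion weight `alpha` are all simplifying stand-ins for HQDec's learned, per-pixel components.

```python
import numpy as np

def nonlearned_upsample(x, scale=2):
    # Nearest-neighbour upsampling: a deterministic, parameter-free mapping
    # from low to high resolution (stand-in for the paper's nonlearned branch).
    return x.repeat(scale, axis=-2).repeat(scale, axis=-1)

def fuse_high_frequency(x_low, residual_high, alpha=0.5):
    # Blend a high-frequency residual (stand-in for the learned branch that
    # constrains the upsampled solution space) into the nonlearned upsampled
    # map. `alpha` is a placeholder for learned, per-pixel fusion weights.
    base = nonlearned_upsample(x_low)
    return base + alpha * residual_high
```

In the paper, the fusion weights and the high-frequency residual are produced by modules that model local and global pixel dependencies; here they are fixed purely for illustration.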
Related papers
- Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution [55.9977636042469]
Bit-depth compression produces a uniform depth representation in regions with subtle variations, hindering the recovery of detailed information, while densely distributed random noise reduces the accuracy of estimating the global geometric structure of the scene.
We propose a novel framework, termed geometry-decoupled network (GDNet), for compressed depth map super-resolution.
arXiv Detail & Related papers (2024-11-05T16:37:30Z)
- PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives.
To compensate for the lack of HRSOD dataset, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD.
All images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z)
- Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation [31.970739018426645]
In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images.
This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model.
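Distilling knowledge from a high-resolution teacher into a low-resolution student is commonly done with a temperature-softened KL divergence between the two models' predictions. The sketch below shows that generic loss only; it is an assumption for illustration, not the cross-domain scheme this particular paper proposes.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax with max-subtraction for numerical stability.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(
        p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1
    ).mean()
```

When student and teacher agree, the loss is zero; any divergence between the softened distributions is penalized, which is what transfers the teacher's "dark knowledge" to the low-resolution student.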
arXiv Detail & Related papers (2024-05-19T04:57:17Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged for high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
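The core LRSA idea, computing self-attention in a fixed low-resolution space regardless of the input resolution, can be sketched as follows. This is a bare single-head toy on an (H, W, C) array, not the LRFormer module: the pooling, the missing Q/K/V projections, and the nearest-neighbour upsampling back to full resolution are simplifications.

```python
import numpy as np

def avg_pool_to(x, size):
    # Average-pool an (H, W, C) feature map down to a fixed (size, size) grid.
    H, W, C = x.shape
    return x.reshape(size, H // size, size, W // size, C).mean(axis=(1, 3))

def low_res_self_attention(x, size=4):
    # Plain self-attention over the fixed low-resolution tokens, so the cost
    # depends on `size`, not on the input resolution.
    H, W, C = x.shape
    low = avg_pool_to(x, size).reshape(size * size, C)      # (tokens, C)
    scores = low @ low.T / np.sqrt(C)
    scores -= scores.max(axis=-1, keepdims=True)            # stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = (attn @ low).reshape(size, size, C)
    # Nearest-neighbour upsample back to the input resolution.
    return out.repeat(H // size, axis=0).repeat(W // size, axis=1)
```

Because the attention matrix is only `size**2 x size**2`, the quadratic cost of self-attention is decoupled from the image resolution, which is the mechanism the abstract describes.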
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Pyramid Grafting Network for One-Stage High Resolution Saliency Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
arXiv Detail & Related papers (2022-04-11T12:22:21Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- AdaBins: Depth Estimation using Adaptive Bins [43.07310038858445]
We propose a transformer-based architecture block that divides the depth range into bins whose center value is estimated adaptively per image.
Our results show a decisive improvement over the state-of-the-art on several popular depth datasets.
arXiv Detail & Related papers (2020-11-28T14:40:45Z)
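The adaptive-binning idea in the AdaBins entry above, predicting per-image bin widths and taking each pixel's depth as a softmax-weighted average of the resulting bin centers, can be sketched in a few lines. This is a hedged illustration of the general scheme, not the published architecture; the depth range and the way widths are normalized here are assumptions.

```python
import numpy as np

def bins_to_centers(bin_widths, d_min=0.1, d_max=10.0):
    # Normalize the predicted widths, scale them to the depth range, and take
    # the midpoints of the cumulative edges as adaptive bin centers.
    w = bin_widths / bin_widths.sum()
    edges = d_min + (d_max - d_min) * np.concatenate([[0.0], np.cumsum(w)])
    return 0.5 * (edges[:-1] + edges[1:])

def depth_from_bins(pixel_logits, centers):
    # Per-pixel depth is the softmax-weighted average of the bin centers,
    # so depth prediction becomes a soft classification over adaptive bins.
    z = pixel_logits - pixel_logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ centers
```

Because the bin widths are predicted per image, the discretization adapts to each scene's depth distribution instead of using a fixed quantization.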
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.