HQDec: Self-Supervised Monocular Depth Estimation Based on a
High-Quality Decoder
- URL: http://arxiv.org/abs/2305.18706v1
- Date: Tue, 30 May 2023 03:03:11 GMT
- Title: HQDec: Self-Supervised Monocular Depth Estimation Based on a
High-Quality Decoder
- Authors: Fei Wang, Jun Cheng
- Abstract summary: We propose a high-quality decoder (HQDec) to recover scene depths.
The code and models will be publicly available at https://github.com/fwucas/HQDec.
- Score: 14.67433946077953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoders play a significant role in recovering scene depths. However, the
decoders used in previous works ignore the propagation of multilevel lossless
fine-grained information, cannot adaptively capture local and global
information in parallel, and cannot perform sufficient global statistical
analyses on the final output disparities. In addition, the process of mapping
from a low-resolution feature space to a high-resolution feature space is a
one-to-many problem that may have multiple solutions. Therefore, the quality of
the recovered depth map is low. To this end, we propose a high-quality decoder
(HQDec), with which multilevel near-lossless fine-grained information, obtained
by the proposed adaptive axial-normalized position-embedded channel attention
sampling module (AdaAxialNPCAS), can be adaptively incorporated into a
low-resolution feature map with high-level semantics utilizing the proposed
adaptive information exchange scheme. In the HQDec, we leverage the proposed
adaptive refinement module (AdaRM) to model the local and global dependencies
between pixels in parallel and utilize the proposed disparity attention module
to model the distribution characteristics of disparity values from a global
perspective. To recover fine-grained high-resolution features with maximal
accuracy, we adaptively fuse the high-frequency information obtained by
constraining the upsampled solution space utilizing the local and global
dependencies between pixels into the high-resolution feature map generated from
the nonlearning method. Extensive experiments demonstrate that each proposed
component improves the quality of the depth estimation results over the
baseline results, and the developed approach achieves state-of-the-art results
on the KITTI and DDAD datasets. The code and models will be publicly available
at https://github.com/fwucas/HQDec.
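The disparity attention module described above models the distribution of disparity values from a global perspective. As a minimal sketch of that idea, the following applies standard scaled dot-product attention over a flattened disparity feature map, so each pixel's output depends on statistics gathered from every other pixel. The feature shape and projection matrices here are illustrative assumptions, not the paper's actual layer configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def disparity_attention(feat, wq, wk, wv):
    """Global attention over a flattened H*W disparity feature map.

    feat: (H*W, C) features; wq/wk/wv: (C, C) projection matrices.
    Every spatial position attends to every other position, so the
    output at each pixel reflects global disparity statistics.
    """
    q, k, v = feat @ wq, feat @ wk, feat @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (H*W, H*W) affinities
    return softmax(scores, axis=-1) @ v       # (H*W, C)

# Toy usage: a 4x4 map with 8 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8))
w = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
out = disparity_attention(feat, *w)
print(out.shape)  # (16, 8)
```

The quadratic cost in the number of pixels is why such global analyses are typically applied at the disparity-output stage rather than on full-resolution feature maps.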
Related papers
- Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation [31.970739018426645]
In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images.
This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model.
arXiv Detail & Related papers (2024-05-19T04:57:17Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged as a task of high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Pyramid Grafting Network for One-Stage High Resolution Saliency
Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
arXiv Detail & Related papers (2022-04-11T12:22:21Z) - Single Image Internal Distribution Measurement Using Non-Local
Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely the non-local variational autoencoder (NLVAE).
NLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood.
Experimental results on seven benchmark datasets demonstrate the effectiveness of the NLVAE model.
arXiv Detail & Related papers (2022-04-02T18:43:55Z) - DepthFormer: Exploiting Long-Range Correlation and Local Information for
Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - High Dimensional Level Set Estimation with Bayesian Neural Network [58.684954492439424]
This paper proposes novel methods to solve the high dimensional Level Set Estimation problems using Bayesian Neural Networks.
For each problem, we derive the corresponding theoretic information based acquisition function to sample the data points.
Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-17T23:21:53Z) - AdaBins: Depth Estimation using Adaptive Bins [43.07310038858445]
We propose a transformer-based architecture block that divides the depth range into bins whose center value is estimated adaptively per image.
Our results show a decisive improvement over the state-of-the-art on several popular depth datasets.
arXiv Detail & Related papers (2020-11-28T14:40:45Z)
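The AdaBins entry above describes dividing the depth range into bins whose centers are estimated adaptively per image, with each pixel's depth computed as a probability-weighted combination of those centers. A minimal sketch of that final step, assuming illustrative depth bounds and random inputs in place of the transformer block that actually predicts the bin logits and per-pixel probabilities:

```python
import numpy as np

def adaptive_bin_depth(bin_logits, pixel_probs, d_min=0.1, d_max=10.0):
    """AdaBins-style depth from adaptively predicted bins.

    bin_logits: (N,) per-image logits converted into normalized bin widths.
    pixel_probs: (H, W, N) per-pixel probabilities over the N bins.
    Returns an (H, W) depth map: the probability-weighted sum of bin centers.
    """
    widths = np.exp(bin_logits) / np.exp(bin_logits).sum()  # widths sum to 1
    edges = d_min + (d_max - d_min) * np.cumsum(np.concatenate([[0.0], widths]))
    centers = 0.5 * (edges[:-1] + edges[1:])                # (N,) bin centers
    return pixel_probs @ centers                            # (H, W)

# Toy usage: 16 bins over a 4x4 image.
rng = np.random.default_rng(0)
logits = rng.standard_normal(16)
probs = rng.random((4, 4, 16))
probs /= probs.sum(axis=-1, keepdims=True)  # normalize to valid distributions
depth = adaptive_bin_depth(logits, probs)
print(depth.shape)  # (4, 4)
```

Because the bin widths adapt per image, scenes dominated by near depths can allocate finer bins close to the camera, which is the key difference from a fixed uniform discretization.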
This list is automatically generated from the titles and abstracts of the papers on this site.