Pyramid Frequency Network with Spatial Attention Residual Refinement
Module for Monocular Depth Estimation
- URL: http://arxiv.org/abs/2204.02386v1
- Date: Tue, 5 Apr 2022 17:48:26 GMT
- Title: Pyramid Frequency Network with Spatial Attention Residual Refinement
Module for Monocular Depth Estimation
- Authors: Zhengyang Lu and Ying Chen
- Abstract summary: Deep-learning approaches to depth estimation are rapidly advancing, offering superior performance over existing methods.
In this work, a Pyramid Frequency Network with Spatial Attention Residual Refinement Module is proposed to deal with the weak robustness of existing deep-learning methods.
PFN achieves better visual accuracy than state-of-the-art methods in both indoor and outdoor scenes on Make3D, KITTI depth, and NYUv2 datasets.
- Score: 4.397981844057195
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep-learning-based approaches to depth estimation are rapidly advancing,
offering superior performance over existing methods. To estimate depth in
real-world scenarios, depth estimation models must be robust to a variety of
noisy environments. In this work, a Pyramid Frequency Network (PFN) with a
Spatial Attention Residual Refinement Module (SARRM) is proposed to address the
weak robustness of existing deep-learning methods. To reconstruct depth maps
with accurate details, the SARRM applies a residual fusion method with an
attention mechanism to refine the blurred depth. A frequency-division strategy
is designed, and a frequency pyramid network is developed to extract features
from multiple frequency bands. With this frequency strategy, PFN achieves better
visual accuracy than state-of-the-art methods in both indoor and outdoor scenes
on the Make3D, KITTI depth, and NYUv2 datasets. Additional experiments on the noisy
NYUv2 dataset demonstrate that PFN is more reliable than existing deep-learning
methods in high-noise scenes.
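The abstract describes a frequency-division strategy that splits features into multiple frequency bands for the pyramid network. The paper does not specify the exact band design, but the general idea can be sketched with concentric FFT masks; the band boundaries and count below are hypothetical choices for illustration only.

```python
import numpy as np

def frequency_bands(image, num_bands=3):
    """Split a 2-D array into frequency bands using concentric FFT masks.

    A minimal sketch of a frequency-division strategy; the band layout
    (equal-width radial rings) is an assumption, not the paper's design.
    Summing the returned bands reconstructs the input.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Radial distance of each frequency coefficient from the DC component.
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    max_radius = radius.max()
    bands = []
    for i in range(num_bands):
        lo = max_radius * i / num_bands
        hi = max_radius * (i + 1) / num_bands
        # The outermost ring is half-open at the top so all radii are covered.
        if i < num_bands - 1:
            mask = (radius >= lo) & (radius < hi)
        else:
            mask = radius >= lo
        band = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
        bands.append(band)
    return bands
```

Because the ring masks are disjoint and jointly cover the whole spectrum, the bands sum back to the original image, so each band isolates structure at a different spatial-frequency scale.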
Related papers
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging
Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z) - Struct-MDC: Mesh-Refined Unsupervised Depth Completion Leveraging
Structural Regularities from Visual SLAM [1.8899300124593648]
Feature-based visual simultaneous localization and mapping (SLAM) methods only estimate the depth of extracted features.
Depth-completion tasks, which estimate a dense depth map from sparse depth, have gained significant importance in robotic applications such as exploration.
We propose a mesh depth refinement (MDR) module to address this problem.
The Struct-MDC outperforms other state-of-the-art algorithms on public and our custom datasets.
arXiv Detail & Related papers (2022-04-29T04:29:17Z) - Joint Learning of Salient Object Detection, Depth Estimation and Contour
Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD)
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z) - Depth-Cooperated Trimodal Network for Video Salient Object Detection [13.727763221832532]
We propose a depth-cooperated trimodal network, called DCTNet, for video salient object detection (VSOD).
To this end, we first generate depth from RGB frames, and then propose an approach to treat the three modalities unequally.
We also introduce a refinement fusion module (RFM) to suppress noises in each modality and select useful information dynamically for further feature refinement.
arXiv Detail & Related papers (2022-02-12T13:04:16Z) - Non-local Recurrent Regularization Networks for Multi-view Stereo [108.17325696835542]
In deep multi-view stereo networks, cost regularization is crucial to achieve accurate depth estimation.
We propose a novel non-local recurrent regularization network for multi-view stereo, named NR2-Net.
Our method achieves state-of-the-art reconstruction results on both DTU and Tanks and Temples datasets.
arXiv Detail & Related papers (2021-10-13T01:43:54Z) - VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the local computation of depth maps with a deep MVS technique, and 2) the fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Channel Attention based Iterative Residual Learning for Depth Map
Super-Resolution [58.626803922196146]
We argue that DSR models trained on synthetic datasets are restrictive and not effective in dealing with real-world DSR tasks.
We make two contributions in tackling real-world degradation of different depth sensors.
We propose a new framework for real-world DSR, which consists of four modules.
arXiv Detail & Related papers (2020-06-02T09:12:23Z) - Guiding Monocular Depth Estimation Using Depth-Attention Volume [38.92495189498365]
We propose guiding depth estimation to favor planar structures that are ubiquitous especially in indoor environments.
Experiments on two popular indoor datasets, NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results.
arXiv Detail & Related papers (2020-04-06T15:45:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.