Pyramid Feature Attention Network for Monocular Depth Prediction
- URL: http://arxiv.org/abs/2403.01440v1
- Date: Sun, 3 Mar 2024 08:33:23 GMT
- Title: Pyramid Feature Attention Network for Monocular Depth Prediction
- Authors: Yifang Xu, Chenglei Peng, Ming Li, Yang Li, and Sidan Du
- Abstract summary: We propose a Pyramid Feature Attention Network (PFANet) to improve the high-level context features and low-level spatial features.
Our method outperforms state-of-the-art methods on the KITTI dataset.
- Score: 8.615717738037823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep convolutional neural networks (DCNNs) have achieved great success in
monocular depth estimation (MDE). However, few existing works account for the
contributions that feature maps at different levels make to MDE, leading to
inaccurate spatial layout, ambiguous boundaries, and discontinuous object
surfaces in the prediction. To better tackle these problems, we propose a
Pyramid Feature Attention Network (PFANet) to improve the high-level context
features and low-level spatial features. In the proposed PFANet, we design a
Dual-scale Channel Attention Module (DCAM) that applies channel attention at
different scales, aggregating global context and local information from the
high-level feature maps. To exploit the spatial relationship of visual
features, we design a Spatial Pyramid Attention Module (SPAM) which can guide
the network attention to multi-scale detailed information in the low-level
feature maps. Finally, we introduce a scale-invariant gradient loss to increase
the penalty on errors in depth-wise discontinuous regions. Experimental results
show that our method outperforms state-of-the-art methods on the KITTI dataset.
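The abstract describes DCAM only at a high level. As a hedged illustration of what channel attention applied at two scales can look like, here is a minimal NumPy sketch; the SE-style bottleneck MLP (`w1`, `w2`), the non-overlapping `patch` pooling, and the averaging fusion are all assumptions for illustration, not PFANet's actual DCAM design:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_weights(feat, w1, w2):
    # Squeeze: global average pool over spatial dims -> (C,)
    z = feat.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (FC -> ReLU -> FC) followed by sigmoid
    return _sigmoid(w2 @ np.maximum(w1 @ z, 0.0))

def dual_scale_channel_attention(feat, w1, w2, patch=2):
    """Channel attention on a (C, H, W) map at two scales: one branch pools
    globally, the other pools over local patches; the per-channel weights
    from both branches are fused by averaging before rescaling."""
    c, h, w = feat.shape
    # Global branch: one attention weight per channel from the whole map.
    s_global = channel_weights(feat, w1, w2)
    # Local branch: average the weights computed on non-overlapping patches.
    patch_weights = [
        channel_weights(feat[:, i:i + patch, j:j + patch], w1, w2)
        for i in range(0, h, patch) for j in range(0, w, patch)
    ]
    s_local = np.mean(patch_weights, axis=0)
    s = 0.5 * (s_global + s_local)          # fuse the two scales
    return feat * s[:, None, None]          # rescale channels
```

Because each fused weight lies in (0, 1), the module only rescales channels; the spatial layout of the feature map is left untouched.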
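The scale-invariant gradient loss is not spelled out in the abstract. The sketch below follows the common formulation for this loss (comparing magnitude-normalized depth gradients at several pixel spacings, which concentrates the penalty near depth discontinuities); the specific spacings, the L1 penalty, and the stabilizing epsilon are assumptions, not necessarily the paper's exact choices:

```python
import numpy as np

def _norm_grad(d, h, axis):
    # Discrete gradient with pixel spacing h, normalized by local magnitude:
    # g = (d[i+h] - d[i]) / (|d[i+h]| + |d[i]| + eps)
    n = d.shape[axis]
    a = np.take(d, np.arange(h, n), axis=axis)
    b = np.take(d, np.arange(0, n - h), axis=axis)
    return (a - b) / (np.abs(a) + np.abs(b) + 1e-6)

def scale_invariant_gradient_loss(pred, gt, scales=(1, 2, 4)):
    """Penalize differences between normalized depth gradients of the
    prediction and the ground truth at several spacings, emphasizing
    errors in depth-wise discontinuous regions."""
    loss = 0.0
    for h in scales:
        if h >= min(pred.shape):
            continue  # spacing larger than the map: no valid gradient pairs
        for axis in (0, 1):  # vertical and horizontal gradients
            loss += np.abs(_norm_grad(pred, h, axis)
                           - _norm_grad(gt, h, axis)).mean()
    return loss
```

The normalization makes the loss nearly invariant to a global rescaling of the depth map: multiplying `pred` by a constant leaves each normalized gradient essentially unchanged.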
Related papers
- Multi-Scale Direction-Aware Network for Infrared Small Target Detection [2.661766509317245]
Infrared small target detection faces the difficulty of effectively separating targets from the background.
We propose a multi-scale direction-aware network (MSDA-Net) to integrate the high-frequency directional features of infrared small targets.
MSDA-Net achieves state-of-the-art (SOTA) results on the public NUDT-SIRST, SIRST and IRSTD-1k datasets.
arXiv Detail & Related papers (2024-06-04T07:23:09Z) - Mesh Denoising Transformer [104.5404564075393]
Mesh denoising aims to remove noise from input meshes while preserving their feature structures.
SurfaceFormer is a pioneering Transformer-based mesh denoising framework.
A new representation, the Local Surface Descriptor, captures local geometric intricacies.
A Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation.
arXiv Detail & Related papers (2024-05-10T15:27:43Z) - DeepPointMap: Advancing LiDAR SLAM with Unified Neural Descriptors [17.664439455504592]
We propose a unified architecture, DeepPointMap, achieving excellent performance on both aspects.
We utilize a neural network to extract highly representative and sparse neural descriptors from point clouds.
We showcase the versatility of our framework by extending it to more challenging multi-agent collaborative SLAM.
arXiv Detail & Related papers (2023-12-05T11:40:41Z) - Centralized Feature Pyramid for Object Detection [53.501796194901964]
Visual feature pyramid has shown its superiority in both effectiveness and efficiency in a wide range of applications.
In this paper, we propose a Centralized Feature Pyramid for object detection, which is based on a globally explicit centralized feature regulation.
arXiv Detail & Related papers (2022-10-05T08:32:54Z) - Struct-MDC: Mesh-Refined Unsupervised Depth Completion Leveraging Structural Regularities from Visual SLAM [1.8899300124593648]
Feature-based visual simultaneous localization and mapping (SLAM) methods only estimate the depth of extracted features.
Depth completion tasks, which estimate a dense depth map from sparse depth measurements, have gained significant importance in robotic applications such as exploration.
We propose a mesh depth refinement (MDR) module to address this problem.
The Struct-MDC outperforms other state-of-the-art algorithms on public and our custom datasets.
arXiv Detail & Related papers (2022-04-29T04:29:17Z) - TC-Net: Triple Context Network for Automated Stroke Lesion Segmentation [0.5482532589225552]
We propose a new network, Triple Context Network (TC-Net), with the capture of spatial contextual information as the core.
Our network is evaluated on the open dataset ATLAS, achieving the highest score of 0.594, a Hausdorff distance of 27.005 mm, and an average symmetric surface distance of 7.137 mm.
arXiv Detail & Related papers (2022-02-28T11:12:16Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z) - Cross-layer Feature Pyramid Network for Salient Object Detection [102.20031050972429]
We propose a novel Cross-layer Feature Pyramid Network to improve the progressive fusion in salient object detection.
The features distributed to each layer contain both semantics and salient details from all other layers simultaneously, reducing the loss of important information.
arXiv Detail & Related papers (2020-02-25T14:06:27Z) - Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification [71.96618723152487]
We introduce the Attention Pyramid Convolutional Neural Network (AP-CNN).
AP-CNN learns both high-level semantic and low-level detailed feature representation.
It can be trained end-to-end, without the need of additional bounding box/part annotations.
arXiv Detail & Related papers (2020-02-09T12:33:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.