Feature Pyramid Network with Multi-Head Attention for Semantic
Segmentation of Fine-Resolution Remotely Sensed Images
- URL: http://arxiv.org/abs/2102.07997v1
- Date: Tue, 16 Feb 2021 07:54:19 GMT
- Title: Feature Pyramid Network with Multi-Head Attention for Semantic
Segmentation of Fine-Resolution Remotely Sensed Images
- Authors: Rui Li, Shunyi Zheng, Chenxi Duan
- Abstract summary: We introduce the Feature Pyramid Network (FPN) to bridge the gap between the low-level and high-level features.
We propose the Feature Pyramid Network with Multi-Head Attention (FPN-MHA) for semantic segmentation of fine-resolution remotely sensed images.
- Score: 4.869987958751064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation of fine-resolution remotely sensed images is an
urgent issue in satellite imagery processing. Due to the complicated
environment, automatic categorization and segmentation are challenging,
especially for images with a fine resolution. Solving this problem can help
surmount a wide range of obstacles in urban planning, environmental
protection, and natural landscape monitoring, paving the way for complete
scene understanding. However, the frequently used encoder-decoder structure
cannot effectively combine the extracted spatial and contextual features.
Therefore, in this paper, we introduce the Feature Pyramid Network (FPN) to
bridge the gap between the low-level and high-level features. Moreover, we
enhance the contextual information with an elaborate Multi-Head Attention
module and propose the Feature Pyramid Network with Multi-Head Attention
(FPN-MHA) for semantic segmentation of fine-resolution remotely sensed images.
Extensive experiments conducted on the ISPRS Potsdam and Vaihingen datasets
demonstrate the effectiveness of our FPN-MHA. Code is available at
https://github.com/lironui/FPN-MHA.
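The Multi-Head Attention module described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the random projection weights and the toy feature shapes are assumptions for illustration (in FPN-MHA these projections would be learned and applied to pyramid feature maps):

```python
import numpy as np

def multi_head_attention(x, num_heads, rng):
    """Toy multi-head self-attention over a flattened feature map.

    x: (N, C) array of N spatial positions with C channels.
    The projection weights below are random purely for illustration;
    in a real network such as FPN-MHA they would be learned parameters.
    """
    n, c = x.shape
    assert c % num_heads == 0, "channels must divide evenly across heads"
    d = c // num_heads
    # Hypothetical query/key/value/output projections (learned in practice).
    wq = rng.standard_normal((c, c))
    wk = rng.standard_normal((c, c))
    wv = rng.standard_normal((c, c))
    wo = rng.standard_normal((c, c))
    q, k, v = x @ wq, x @ wk, x @ wv
    # Split channels into heads: (num_heads, N, d)
    split = lambda t: t.reshape(n, num_heads, d).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # Scaled dot-product attention per head: (num_heads, N, N)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    # Attend, merge heads back to (N, C), and project out.
    out = (weights @ v).transpose(1, 0, 2).reshape(n, c)
    return out @ wo

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8))  # 16 spatial positions, 8 channels
out = multi_head_attention(feat, num_heads=2, rng=rng)
print(out.shape)  # (16, 8)
```

Each of the two heads attends over all 16 positions independently on its 4-channel slice, which is how attention captures long-range context that a plain encoder-decoder convolution stack misses.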
Related papers
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, aiming at high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z) - Hi-ResNet: Edge Detail Enhancement for High-Resolution Remote Sensing Segmentation [10.919956120261539]
High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas.
Objects of the same category within HRS images show significant differences in scale and shape across diverse geographical environments.
We propose a High-resolution remote sensing network (Hi-ResNet) with efficient network structure designs.
arXiv Detail & Related papers (2023-05-22T03:58:25Z) - Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image
Super-Resolution with Subpixel Fusion [67.35540259040806]
We propose a subpixel-level HS super-resolution framework by devising a novel decoupled-and-coupled network, called DC-Net.
As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components.
We append a self-supervised learning module behind the CSU net, guaranteeing material consistency to enhance the detailed appearances of the restored HS product.
arXiv Detail & Related papers (2022-05-07T23:40:36Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in
Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely the dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - Transformer Meets Convolution: A Bilateral Awareness Network for
Semantic Segmentation of Very Fine Resolution Urban Scene Images [6.460167724233707]
We propose a bilateral awareness network (BANet) which contains a dependency path and a texture path.
BANet captures the long-range relationships and fine-grained details in VFR images.
Experiments conducted on three large-scale urban scene image segmentation datasets, i.e., the ISPRS Vaihingen dataset, ISPRS Potsdam dataset, and UAVid dataset, demonstrate the effectiveness of BANet.
arXiv Detail & Related papers (2021-06-23T13:57:36Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - Dense Attention Fluid Network for Salient Object Detection in Optical
Remote Sensing Images [193.77450545067967]
We propose an end-to-end Dense Attention Fluid Network (DAFNet) for salient object detection in optical remote sensing images (RSIs)
A Global Context-aware Attention (GCA) module is proposed to adaptively capture long-range semantic context relationships.
We construct a new and challenging optical RSI dataset for SOD that contains 2,000 images with pixel-wise saliency annotations.
arXiv Detail & Related papers (2020-11-26T06:14:10Z) - Multi-layer Feature Aggregation for Deep Scene Parsing Models [19.198074549944568]
In this paper, we explore the effective use of multi-layer feature outputs of the deep parsing networks for spatial-semantic consistency.
The proposed module can auto-select the intermediate visual features to correlate the spatial and semantic information.
Experiments on four public scene parsing datasets prove that the deep parsing network equipped with the proposed feature aggregation module can achieve very promising results.
arXiv Detail & Related papers (2020-11-04T23:07:07Z) - Spatial-Spectral FFPNet: Attention-Based Pyramid Network for
Segmentation and Classification of Remote Sensing Images [12.320585790097415]
In this study, we develop an attention-based pyramid network for segmentation and classification of remote sensing datasets.
Experiments conducted on ISPRS Vaihingen and ISPRS Potsdam high-resolution datasets demonstrate the competitive segmentation accuracy achieved by the proposed heavy-weight spatial FFPNet.
arXiv Detail & Related papers (2020-08-20T04:55:34Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.