Seeing Beyond the Patch: Scale-Adaptive Semantic Segmentation of
High-resolution Remote Sensing Imagery based on Reinforcement Learning
- URL: http://arxiv.org/abs/2309.15372v1
- Date: Wed, 27 Sep 2023 02:48:04 GMT
- Title: Seeing Beyond the Patch: Scale-Adaptive Semantic Segmentation of
High-resolution Remote Sensing Imagery based on Reinforcement Learning
- Authors: Yinhe Liu, Sunan Shi, Junjue Wang, Yanfei Zhong
- Abstract summary: We propose a dynamic scale perception framework, named GeoAgent, which adaptively captures appropriate scale context information outside the image patch.
A feature indexing module is proposed to enhance the ability of the agent to distinguish the current image patch's location.
The experimental results, using two publicly available datasets and our newly constructed dataset WUSU, demonstrate that GeoAgent outperforms previous segmentation methods.
- Score: 8.124633573706763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In remote sensing imagery analysis, patch-based methods have limitations in
capturing information beyond the sliding window. This shortcoming poses a
significant challenge in processing complex and variable geo-objects, which
results in semantic inconsistency in segmentation results. To address this
challenge, we propose a dynamic scale perception framework, named GeoAgent,
which adaptively captures appropriate scale context information outside the
image patch based on the different geo-objects. In GeoAgent, each image patch's
states are represented by a global thumbnail and a location mask. The global
thumbnail provides context beyond the patch, and the location mask guides the
perceived spatial relationships. The scale-selection actions are performed
through a Scale Control Agent (SCA). A feature indexing module is proposed to
enhance the ability of the agent to distinguish the current image patch's
location. The action switches the patch scale and context branch of a
dual-branch segmentation network that extracts and fuses the features of
multi-scale patches. The GeoAgent adjusts the network parameters to perform the
appropriate scale-selection action based on the reward received for the
selected scale. The experimental results, using two publicly available datasets
and our newly constructed dataset WUSU, demonstrate that GeoAgent outperforms
previous segmentation methods, particularly for large-scale mapping
applications.
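As a rough illustration of the state and action described in the abstract, the sketch below builds a global thumbnail and a location mask for one patch, then applies a generic REINFORCE-style update to a softmax policy over candidate scales. It is not the authors' implementation: the abstract does not specify the learning algorithm, the thumbnail size, or the set of scales, so `build_state`, `reinforce_step`, `thumb_size`, and the three-scale setup are all hypothetical names and values chosen for illustration.

```python
import numpy as np


def build_state(image, top, left, patch_size, thumb_size=64):
    """Return (thumbnail, location_mask) for one patch (hypothetical helper)."""
    h, w = image.shape[:2]
    # Nearest-neighbour downsampling: pick evenly spaced rows and columns.
    rows = np.arange(thumb_size) * h // thumb_size
    cols = np.arange(thumb_size) * w // thumb_size
    thumbnail = image[rows][:, cols]
    # Binary mask marking the patch footprint at thumbnail resolution.
    mask = np.zeros((thumb_size, thumb_size), dtype=np.float32)
    r0 = top * thumb_size // h
    c0 = left * thumb_size // w
    r1 = min(thumb_size, max(r0 + 1, (top + patch_size) * thumb_size // h))
    c1 = min(thumb_size, max(c0 + 1, (left + patch_size) * thumb_size // w))
    mask[r0:r1, c0:c1] = 1.0
    return thumbnail, mask


def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()


def reinforce_step(logits, action, reward, lr=0.1):
    """One policy-gradient step on the scale logits (assumed update rule)."""
    probs = softmax(logits)
    grad = -probs
    grad[action] += 1.0  # gradient of log pi(action) with respect to the logits
    return logits + lr * reward * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((512, 512, 3))
    thumbnail, mask = build_state(image, top=128, left=256, patch_size=128)
    logits = np.zeros(3)  # three candidate scales, an assumption
    action = rng.choice(3, p=softmax(logits))
    logits = reinforce_step(logits, action, reward=1.0)
    print(thumbnail.shape, int(mask.sum()), softmax(logits))
```

In this toy version a positive reward raises the probability of the chosen scale and a negative reward lowers it; in the framework described above, the policy would additionally be conditioned on the thumbnail and location mask rather than on a single shared logit vector.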
Related papers
- AgMTR: Agent Mining Transformer for Few-shot Segmentation in Remote Sensing [12.91626624625134]
Few-shot segmentation (FSS) aims to segment the objects of interest in the query image with just a handful of labeled samples (i.e., support images).
Previous schemes leverage the similarity between support-query pixel pairs to construct pixel-level semantic correlation.
In remote sensing scenarios with extreme intra-class variations and cluttered backgrounds, such pixel-level correlations may produce tremendous mismatches.
We propose a novel Agent Mining Transformer (AgMTR), which adaptively mines a set of local-aware agents to construct agent-level semantic correlation.
arXiv Detail & Related papers (2024-09-26T01:12:01Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- Adaptive Graph Convolution Module for Salient Object Detection [7.278033100480174]
We propose an adaptive graph convolution module (AGCM) to deal with complex scenes.
Prototype features are extracted from the input image using a learnable region generation layer.
The proposed AGCM dramatically improves the SOD performance both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-03-17T07:07:17Z)
- Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference database of geo-tagged aerial images.
Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Learning Hierarchical Graph Representation for Image Manipulation Detection [50.04902159383709]
The objective of image manipulation detection is to identify and locate the manipulated regions in the images.
Recent approaches mostly adopt sophisticated Convolutional Neural Networks (CNNs) to capture the tampering artifacts left in the images.
We propose a hierarchical Graph Convolutional Network (HGCN-Net), which consists of two parallel branches.
arXiv Detail & Related papers (2022-01-15T01:54:25Z)
- Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We pair this learned aggregation layer with a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
arXiv Detail & Related papers (2021-12-27T14:05:41Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely a dense feature pyramid network (DenseFPN), a spatial context pyramid (SCP), and a hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
- Semantic Attention and Scale Complementary Network for Instance Segmentation in Remote Sensing Images [54.08240004593062]
We propose an end-to-end multi-category instance segmentation model, which consists of a Semantic Attention (SEA) module and a Scale Complementary Mask Branch (SCMB).
The SEA module contains a simple fully convolutional semantic segmentation branch with extra supervision to strengthen the activation of instances of interest on the feature map.
The SCMB extends the original single mask branch to trident mask branches and introduces complementary mask supervision at different scales.
arXiv Detail & Related papers (2021-07-25T08:53:59Z)
- Hierarchical Attention Fusion for Geo-Localization [7.544917072241684]
We introduce a hierarchical attention fusion network using multi-scale features for geo-localization.
We extract the hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations.
Our training is self-supervised using adaptive weights to control the attention of feature emphasis from each hierarchical level.
arXiv Detail & Related papers (2021-02-18T07:07:03Z)
- Spatial Attention Pyramid Network for Unsupervised Domain Adaptation [66.75008386980869]
Unsupervised domain adaptation is critical in various computer vision tasks.
We design a new spatial attention pyramid network for unsupervised domain adaptation.
Our method performs favorably against the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-29T09:03:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.