AgMTR: Agent Mining Transformer for Few-shot Segmentation in Remote Sensing
- URL: http://arxiv.org/abs/2409.17453v1
- Date: Thu, 26 Sep 2024 01:12:01 GMT
- Title: AgMTR: Agent Mining Transformer for Few-shot Segmentation in Remote Sensing
- Authors: Hanbo Bi, Yingchao Feng, Yongqiang Mao, Jianning Pei, Wenhui Diao, Hongqi Wang, Xian Sun
- Abstract summary: Few-shot Segmentation (FSS) aims to segment the objects of interest in the query image with just a handful of labeled samples (i.e., support images).
Previous schemes leverage the similarity between support-query pixel pairs to construct pixel-level semantic correlation.
In remote sensing scenarios with extreme intra-class variations and cluttered backgrounds, such pixel-level correlations may produce a large number of mismatches.
We propose a novel Agent Mining Transformer (AgMTR), which adaptively mines a set of local-aware agents to construct agent-level semantic correlation.
- Score: 12.91626624625134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot Segmentation (FSS) aims to segment the objects of interest in the query image with just a handful of labeled samples (i.e., support images). Previous schemes leverage the similarity between support-query pixel pairs to construct pixel-level semantic correlation. However, in remote sensing scenarios with extreme intra-class variations and cluttered backgrounds, such pixel-level correlations may produce a large number of mismatches, resulting in semantic ambiguity between query foreground (FG) and background (BG) pixels. To tackle this problem, we propose a novel Agent Mining Transformer (AgMTR), which adaptively mines a set of local-aware agents to construct agent-level semantic correlation. Compared with pixel-level semantics, the agents are equipped with local-contextual information and possess a broader receptive field. Different query pixels can then selectively aggregate the fine-grained local semantics of different agents, thereby enhancing the semantic clarity between query FG and BG pixels. Concretely, the Agent Learning Encoder (ALE) is first proposed to construct an optimal transport plan that arranges different agents to aggregate support semantics under different local regions. Then, to further optimize the agents, the Agent Aggregation Decoder (AAD) and the Semantic Alignment Decoder (SAD) are constructed to go beyond the limited support set, mining valuable class-specific semantics from unlabeled data sources and from the query image itself, respectively. Extensive experiments on the remote sensing benchmark iSAID indicate that the proposed method achieves state-of-the-art performance. Surprisingly, our method remains quite competitive when extended to more common natural scenarios, i.e., PASCAL-5i and COCO-20i.
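The core mechanism described in the abstract, learning a few local-aware agents from the support foreground via an optimal transport plan and letting query pixels attend to those agents instead of to every support pixel, can be sketched roughly as below. This is a minimal illustration under assumed tensor shapes, agent count, and a plain Sinkhorn iteration, not the authors' implementation of ALE/AAD/SAD.

```python
import torch
import torch.nn.functional as F

def sinkhorn(cost, n_iters=10, eps=0.1):
    """Approximate optimal-transport plan between support pixels and agents."""
    K = torch.exp(-cost / eps)                      # (N_pix, N_agents)
    u = torch.ones(K.size(0), device=K.device)
    v = torch.ones(K.size(1), device=K.device)
    for _ in range(n_iters):
        u = 1.0 / (K @ v + 1e-8)
        v = 1.0 / (K.t() @ u + 1e-8)
    return u.unsqueeze(1) * K * v.unsqueeze(0)      # transport plan

def mine_agents(support_feats, fg_mask, n_agents=5):
    """Aggregate support FG pixels into a few local-aware agents (ALE-style sketch)."""
    pix = support_feats[fg_mask]                    # (N_pix, C) foreground pixels only
    # initialise agents from evenly spaced foreground pixels (an assumption)
    idx = torch.linspace(0, pix.size(0) - 1, n_agents).long()
    agents = pix[idx]                               # (N_agents, C)
    cost = 1.0 - F.normalize(pix, dim=-1) @ F.normalize(agents, dim=-1).t()
    plan = sinkhorn(cost)                           # soft assignment of pixels to agents
    plan = plan / (plan.sum(0, keepdim=True) + 1e-8)
    return plan.t() @ pix                           # (N_agents, C) updated agents

def agent_level_correlation(query_feats, agents):
    """Query pixels attend to a handful of agents instead of to every support pixel."""
    q = F.normalize(query_feats, dim=-1)            # (HW, C)
    k = F.normalize(agents, dim=-1)                 # (N_agents, C)
    attn = F.softmax(q @ k.t() / 0.1, dim=-1)       # agent-level semantic correlation
    return attn @ agents                            # aggregated class-specific semantics

# toy usage with random features
feats_s = torch.randn(64 * 64, 256)
mask_s  = torch.rand(64 * 64) > 0.7
feats_q = torch.randn(64 * 64, 256)
agents  = mine_agents(feats_s, mask_s)
out     = agent_level_correlation(feats_q, agents)  # (HW, 256)
```

The appeal of the agent-level correlation is that the attention matrix is (HW x N_agents) rather than (HW x HW_support), so each query pixel aggregates a few local-context summaries instead of noisy pixel-to-pixel matches.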
Related papers
- Seeing Beyond the Patch: Scale-Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery based on Reinforcement Learning [8.124633573706763]
We propose a dynamic scale perception framework, named GeoAgent, which adaptively captures appropriate scale context information outside the image patch.
A feature indexing module is proposed to enhance the ability of the agent to distinguish the current image patch's location.
The experimental results, using two publicly available datasets and our newly constructed dataset WUSU, demonstrate that GeoAgent outperforms previous segmentation methods.
arXiv Detail & Related papers (2023-09-27T02:48:04Z)
- I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation [55.633859439375044]
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task that frees people from heavy annotation work.
The key idea for tackling this problem is to perform image-level and feature-level adaptation jointly.
This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation.
arXiv Detail & Related papers (2023-01-03T15:19:48Z)
- Unsupervised Domain Adaptation for Semantic Segmentation using One-shot Image-to-Image Translation via Latent Representation Mixing [9.118706387430883]
We propose a new unsupervised domain adaptation method for the semantic segmentation of very high resolution images.
An image-to-image translation paradigm is proposed, based on an encoder-decoder principle where latent content representations are mixed across domains.
Cross-city comparative experiments have shown that the proposed method outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2022-12-07T18:16:17Z)
- Framework-agnostic Semantically-aware Global Reasoning for Segmentation [29.69187816377079]
We propose a component that learns to project image features into latent representations and reason between them.
Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint.
Our latent tokens are semantically interpretable and diverse and provide a rich set of features that can be transferred to downstream tasks.
arXiv Detail & Related papers (2022-12-06T21:42:05Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
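As a rough, hypothetical sketch of such a location-prediction pretext task (toy encoder, assumed patch sizes and masking scheme, not the paper's architecture):

```python
import torch
import torch.nn as nn

class RelativeLocationPretext(nn.Module):
    """Toy pretext head: predict the (dy, dx) offset of a query patch w.r.t. a reference patch."""
    def __init__(self, dim=128, mask_ratio=0.5):
        super().__init__()
        # stand-in backbone: patchify and flatten spatial dims
        self.encoder = nn.Sequential(nn.Conv2d(3, dim, 8, stride=8), nn.Flatten(2))
        self.mask_ratio = mask_ratio
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, reference, query):
        ref = self.encoder(reference).mean(-1)           # (B, dim) pooled reference features
        qry = self.encoder(query).mean(-1)               # (B, dim)
        # hide part of the reference features to control task difficulty
        keep = (torch.rand_like(ref) > self.mask_ratio).float()
        ref = ref * keep
        return self.head(torch.cat([ref, qry], dim=-1))  # predicted normalized (dy, dx)

# toy usage: crops taken from the same image, offsets known from the crop coordinates
model = RelativeLocationPretext()
reference, query = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
target_offset = torch.rand(4, 2) * 2 - 1                 # normalized ground-truth offsets
loss = nn.functional.mse_loss(model(reference, query), target_offset)
loss.backward()
```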
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while being as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
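Domain-adversarial training, one of the two ingredients named here, is commonly implemented with a gradient-reversal layer feeding a domain discriminator; the generic pattern (not AFAN itself) looks roughly like this:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class DomainDiscriminator(nn.Module):
    """Predicts whether a feature comes from the source or the target domain."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, feats, lam=1.0):
        return self.net(GradReverse.apply(feats, lam))

# toy usage: the backbone features are trained to fool the discriminator
disc = DomainDiscriminator()
src_feats = torch.randn(8, 256, requires_grad=True)
tgt_feats = torch.randn(8, 256, requires_grad=True)
logits = torch.cat([disc(src_feats), disc(tgt_feats)])
labels = torch.cat([torch.ones(8, 1), torch.zeros(8, 1)])
adv_loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
adv_loss.backward()   # gradients reaching the features are reversed, aligning the domains
```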
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- Semantic Distribution-aware Contrastive Adaptation for Semantic Segmentation [50.621269117524925]
Domain adaptive semantic segmentation refers to making predictions on a certain target domain with only annotations of a specific source domain.
We present a semantic distribution-aware contrastive adaptation algorithm that enables pixel-wise representation alignment.
We evaluate SDCA on multiple benchmarks, achieving considerable improvements over existing algorithms.
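As a loose illustration of pixel-wise representation alignment, the snippet below computes an InfoNCE-style loss that pulls each labeled pixel embedding toward its class prototype; the class count, temperature, and prototype construction are assumptions rather than SDCA's exact formulation:

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(pixel_feats, labels, num_classes, tau=0.1):
    """InfoNCE over class prototypes: each pixel is attracted to its own class prototype.
    Assumes every class appears at least once in the batch."""
    feats = F.normalize(pixel_feats, dim=-1)                   # (N, C)
    protos = torch.stack([feats[labels == c].mean(0) for c in range(num_classes)])
    protos = F.normalize(protos, dim=-1)                       # (num_classes, C)
    logits = feats @ protos.t() / tau                          # similarity to every prototype
    return F.cross_entropy(logits, labels)

# toy usage on flattened features with ground-truth labels
feats = torch.randn(1024, 256, requires_grad=True)
labels = torch.randint(0, 19, (1024,))
loss = pixel_contrastive_loss(feats, labels, num_classes=19)
loss.backward()
```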
arXiv Detail & Related papers (2021-05-11T13:21:25Z)
- Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation [169.82760468633236]
We propose to build the pixel-level cycle association between source and target pixel pairs.
Our method can be trained end-to-end in one stage and introduces no additional parameters.
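A minimal sketch of the cycle idea, assuming cosine similarity and hard nearest neighbours: walk from each source pixel to its closest target pixel and back, and keep only the cycles that return to a source pixel of the same class (the actual loss weighting in the paper is more involved):

```python
import torch
import torch.nn.functional as F

def cycle_associate(src_feats, tgt_feats, src_labels):
    """Source -> target -> source nearest-neighbour cycle; returns pairs and a consistency mask."""
    s = F.normalize(src_feats, dim=-1)                 # (Ns, C)
    t = F.normalize(tgt_feats, dim=-1)                 # (Nt, C)
    fwd = (s @ t.t()).argmax(dim=1)                    # each source pixel's nearest target pixel
    bwd = (t @ s.t()).argmax(dim=1)                    # each target pixel's nearest source pixel
    cycled = bwd[fwd]                                  # where each source pixel lands after the cycle
    consistent = src_labels[cycled] == src_labels      # cycle ends on a pixel of the same class
    return fwd, consistent

# toy usage: pairs with a consistent cycle can then be pulled together in feature space
src, tgt = torch.randn(512, 128), torch.randn(768, 128)
labels = torch.randint(0, 19, (512,))
pairs, keep = cycle_associate(src, tgt, labels)
print(keep.float().mean())   # fraction of cycle-consistent source pixels
```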
arXiv Detail & Related papers (2020-10-31T00:11:36Z)
- Super-Resolution Domain Adaptation Networks for Semantic Segmentation via Pixel and Output Level Aligning [4.500622871756055]
This paper designs a novel end-to-end semantic segmentation network, namely the Super-Resolution Domain Adaptation Network (SRDA-Net).
SRDA-Net can simultaneously achieve the super-resolution task and the domain adaptation task, thus satisfying the requirement of semantic segmentation for remote sensing images.
Experimental results on two remote sensing datasets with different resolutions demonstrate that SRDA-Net performs favorably against some state-of-the-art methods.
arXiv Detail & Related papers (2020-05-13T15:48:41Z)