Instance-aware Remote Sensing Image Captioning with Cross-hierarchy Attention
- URL: http://arxiv.org/abs/2105.04996v1
- Date: Tue, 11 May 2021 12:59:07 GMT
- Title: Instance-aware Remote Sensing Image Captioning with Cross-hierarchy Attention
- Authors: Chengze Wang, Zhiyu Jiang, Yuan Yuan
- Abstract summary: Spatial attention is a straightforward approach to enhancing the performance of remote sensing image captioning.
We propose a remote sensing image caption generator with instance-awareness and cross-hierarchy attention.
- Score: 11.23821696220285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial attention is a straightforward approach to enhancing the
performance of remote sensing image captioning. However, conventional spatial
attention approaches consider only the attention distribution on one fixed
coarse grid, so the semantics of tiny objects can easily be ignored
or disturbed during visual feature extraction. Worse still, the fixed
semantic level of conventional spatial attention limits image understanding
at different levels and from different perspectives, which is critical for handling the huge
diversity of remote sensing images. To address these issues, we propose a
remote sensing image caption generator with instance awareness and
cross-hierarchy attention. 1) Instance awareness is achieved by
introducing a multi-level feature architecture that contains the visual
information of multi-level instance-possible regions and their surroundings.
2) Building on this multi-level feature extraction, a cross-hierarchy
attention mechanism is proposed to prompt the decoder to dynamically focus on
different semantic hierarchies and instances at each time step. Experimental
results on public datasets demonstrate the superiority of the proposed
approach over existing methods.
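The two components described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The following are all hypothetical stand-ins for the learned layers a real captioning model would use:
- the averaging fusion of a region feature with its surroundings,
- the dot-product scoring,
- the helper names `fuse_with_surroundings` and `cross_hierarchy_attention`.

The sketch applies attention in two stages. It first attends over instance regions within each hierarchy level, then over the resulting per-level contexts. This mirrors the idea of letting the decoder choose both a semantic hierarchy and instances at each time step.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def fuse_with_surroundings(region: Vector, surround: Vector) -> Vector:
    # Hypothetical fusion: average an instance-possible region's feature
    # with the feature of its surrounding context window.
    return [(r + s) / 2.0 for r, s in zip(region, surround)]


def cross_hierarchy_attention(
    hidden: Vector, levels: List[List[Vector]]
) -> Tuple[Vector, List[float]]:
    """Two-stage attention: over instances within each level, then across levels."""
    level_contexts = []
    for regions in levels:
        # Stage 1: the decoder state scores every region in this hierarchy level.
        alpha = softmax([dot(hidden, r) for r in regions])
        level_contexts.append(weighted_sum(alpha, regions))
    # Stage 2: the decoder state scores each level's summarized context.
    beta = softmax([dot(hidden, c) for c in level_contexts])
    return weighted_sum(beta, level_contexts), beta


# Toy example: a fine level with two regions and a coarse level with one.
levels = [
    [fuse_with_surroundings([1.0, 0.0], [1.0, 0.0]),
     fuse_with_surroundings([0.0, 1.0], [0.0, 1.0])],
    [fuse_with_surroundings([0.5, 0.5], [0.5, 0.5])],
]
hidden_state = [1.0, 0.0]  # hypothetical decoder hidden state at one time step
context, level_weights = cross_hierarchy_attention(hidden_state, levels)
print(context, level_weights)
```

In this toy case the decoder state aligns with the first fine-level region, so the fine level receives the larger cross-hierarchy weight. A trained model would replace the dot products with learned scoring layers. The two-stage structure is the part the sketch is meant to show.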
Related papers
- Spatial Structure Constraints for Weakly Supervised Semantic Segmentation [100.0316479167605]
A class activation map (CAM) can only locate the most discriminative part of objects.
We propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation of attention expansion.
Our approach achieves 72.7% and 47.0% mIoU on the PASCAL VOC 2012 and COCO datasets, respectively.
(2024-01-20T05:25:25Z)
- HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping [29.678756772610797]
Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision.
Recent efforts have demonstrated notable potential for identifying salient foreground objects using self-supervised transformer features.
To address these problems, we introduce the Hierarchical mErging framework via contrAstive grouPing (HEAP).
(2023-12-29T06:46:37Z)
- SACANet: scene-aware class attention network for semantic segmentation of remote sensing images [4.124381172041927]
We propose a scene-aware class attention network (SACANet) for semantic segmentation of remote sensing images.
Experimental results on three datasets show that SACANet outperforms other state-of-the-art methods, validating its effectiveness.
(2023-04-22T14:54:31Z)
- Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization [98.46318529630109]
We take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem.
We find that the eigenvectors of the Laplacian of a feature affinity matrix computed from self-supervised networks already decompose an image into meaningful segments and can be readily used to localize objects in a scene.
By clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions.
(2022-05-16T17:47:44Z)
- Guiding Attention using Partial-Order Relationships for Image Captioning [2.620091916172863]
A guided attention network mechanism exploits the relationship between the visual scene and text descriptions.
A pairwise ranking objective is used to train this embedding space, which places similar images, topics and captions close together in the shared semantic space.
Experimental results on the MSCOCO dataset show the competitiveness of the approach.
(2022-04-15T14:22:09Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as the mainstream method.
(2022-02-18T10:14:45Z)
- Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
(2021-08-20T14:02:38Z)
- Spatially Consistent Representation Learning [12.120041613482558]
We propose a spatially consistent representation learning algorithm (SCRL) for multi-object and location-specific tasks.
We devise a novel self-supervised objective that tries to produce coherent spatial representations of a randomly cropped local region.
On various downstream localization tasks with benchmark datasets, the proposed SCRL shows significant performance improvements.
(2021-03-10T15:23:45Z)
- Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter [62.26677215668959]
We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then fuse multiple off-the-shelf deep models on these semantically salient regions as the pixel-wise saliency refinement.
Our method is simple yet effective and is the first attempt to treat salient object detection mainly as an object-level semantic re-ranking problem.
(2020-08-10T07:12:43Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
(2020-07-03T21:53:46Z)
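The Deep Spectral Methods entry above recasts image decomposition as a graph partitioning problem. A minimal illustration of the classical spectral idea it draws on is sketched below. It is not that paper's pipeline.

The sketch bipartitions a graph by the sign of the Fiedler vector, the eigenvector of the unnormalized Laplacian with the second-smallest eigenvalue. Two stand-ins are assumptions:
- A hand-built six-node affinity matrix replaces the self-supervised feature affinities.
- Plain power iteration replaces a proper eigensolver.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def laplacian(w: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix W."""
    n = len(w)
    deg = [sum(row) for row in w]
    return [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def fiedler_vector(w: Matrix, iters: int = 500, seed: int = 0) -> List[float]:
    """Eigenvector of the second-smallest Laplacian eigenvalue via power iteration."""
    n = len(w)
    lap = laplacian(w)
    # Gershgorin bound: every eigenvalue of L is at most twice the maximum degree,
    # so M = c*I - L is positive semidefinite and reverses the eigenvalue order.
    c = 2.0 * max(sum(row) for row in w)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant eigenvector (eigenvalue 0)
        v = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def spectral_bipartition(w: Matrix) -> List[int]:
    """Split nodes into two segments by the sign of the Fiedler vector."""
    return [1 if x > 0 else 0 for x in fiedler_vector(w)]


# Toy affinity: nodes 0-2 and 3-5 are strongly connected within each group
# and only weakly connected across groups (hypothetical stand-ins for patches).
n = 6
affinity = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j:
            affinity[i][j] = 1.0 if (i < 3) == (j < 3) else 0.01

labels = spectral_bipartition(affinity)
print(labels)
```

Which group gets label 1 depends on the arbitrary sign of the eigenvector. The split itself separates the two strongly connected groups. An image with one node per patch would call for a sparse eigensolver rather than this O(n²)-per-iteration loop.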
This list is automatically generated from the titles and abstracts of the papers in this site.