All the attention you need: Global-local, spatial-channel attention for
image retrieval
- URL: http://arxiv.org/abs/2107.08000v1
- Date: Fri, 16 Jul 2021 16:39:13 GMT
- Title: All the attention you need: Global-local, spatial-channel attention for
image retrieval
- Authors: Chull Hwan Song, Hye Joo Han, Yannis Avrithis
- Abstract summary: We address representation learning for large-scale instance-level image retrieval.
We present the global-local attention module (GLAM), which is attached at the end of a backbone network.
We obtain a new feature tensor and, by spatial pooling, we learn a powerful embedding for image retrieval.
- Score: 11.150896867058902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address representation learning for large-scale instance-level image
retrieval. Apart from the backbone, training pipelines, and loss functions, popular
approaches have focused on different spatial pooling and attention mechanisms,
which are at the core of learning a powerful global image representation. There
are different forms of attention according to the interaction of elements of
the feature tensor (local and global) and the dimensions where it is applied
(spatial and channel). Unfortunately, each study addresses only one or two
forms of attention and applies it to different problems like classification,
detection or retrieval.
We present the global-local attention module (GLAM), which is attached at the end
of a backbone network and incorporates all four forms of attention: local and
global, spatial and channel. We obtain a new feature tensor and, by spatial
pooling, we learn a powerful embedding for image retrieval. Focusing on global
descriptors, we provide empirical evidence of the interaction of all forms of
attention and improve the state of the art on standard benchmarks.
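The four forms of attention named in the abstract (local and global, spatial and channel) can be illustrated with a minimal NumPy sketch. This is not the paper's actual GLAM implementation, which uses learned convolutional attention maps and learned pooling; the function name, the simplified gates, and the additive fusion below are all illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def glam_sketch(F):
    """Toy combination of four attention forms on a feature tensor F of
    shape (C, H, W), followed by spatial pooling to a global descriptor."""
    C, H, W = F.shape
    X = F.reshape(C, H * W)                            # (C, HW)

    # Local channel attention: SE-style per-channel gate.
    c_gate = sigmoid(X.mean(axis=1))                   # (C,)
    # Local spatial attention: per-location gate from channel statistics.
    s_gate = sigmoid(F.mean(axis=0))                   # (H, W)

    # Global channel attention: C x C affinity between channel maps.
    A_c = softmax(X @ X.T / np.sqrt(H * W), axis=-1)   # (C, C)
    Xc = A_c @ X                                       # (C, HW)

    # Global spatial attention (non-local): HW x HW affinity between positions.
    A_s = softmax(X.T @ X / np.sqrt(C), axis=-1)       # (HW, HW)
    Xs = X @ A_s.T                                     # (C, HW)

    # Fuse local gates and global interactions (additive fusion is an assumption).
    local = F * c_gate[:, None, None] * s_gate[None, :, :]
    out = local + Xc.reshape(C, H, W) + Xs.reshape(C, H, W)

    # Spatial pooling (plain average here; the paper learns the pooling),
    # then L2-normalize to obtain the retrieval embedding.
    emb = out.mean(axis=(1, 2))
    return emb / (np.linalg.norm(emb) + 1e-12)
```

Because the embeddings are L2-normalized, two images can be compared for retrieval by a simple dot product (cosine similarity), and a gallery can be ranked by sorting those scores.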
Related papers
- Local-Aware Global Attention Network for Person Re-Identification Based on Body and Hand Images [0.0]
We propose a compound approach to end-to-end discriminative deep feature learning for person Re-Id based on both body and hand images.
The proposed method consistently outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2022-09-11T09:43:42Z)
- Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification [19.957957963417414]
We propose a dual cross-attention learning (DCAL) algorithm to coordinate with self-attention learning.
First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions.
Second, we propose pair-wise cross-attention (PWCA) to establish the interactions between image pairs.
arXiv Detail & Related papers (2022-05-04T16:14:26Z)
- L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation [67.26984058377435]
We present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining.
Our framework guides the global network to learn the rich object-detail knowledge captured from a global view.
Experiments show that our method attains 72.1% and 44.2% mIoU on the validation sets of PASCAL VOC 2012 and MS COCO 2014, respectively.
arXiv Detail & Related papers (2022-04-07T04:31:32Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- Implicit and Explicit Attention for Zero-Shot Learning [11.66422653137002]
We propose implicit and explicit attention mechanisms to address the bias problem in Zero-Shot Learning (ZSL) models.
We conduct comprehensive experiments on three popular benchmarks: AWA2, CUB and SUN.
arXiv Detail & Related papers (2021-10-02T18:06:21Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Multi-Branch with Attention Network for Hand-Based Person Recognition [5.162308830328819]
We propose a novel hand-based person recognition method for the purpose of criminal investigations.
Our proposed method, Multi-Branch with Attention Network (MBA-Net), incorporates both channel and spatial attention modules.
Our proposed method achieves state-of-the-art performance, surpassing the existing hand-based identification methods.
arXiv Detail & Related papers (2021-08-04T18:25:08Z)
- GAttANet: Global attention agreement for convolutional neural networks [0.0]
Transformer attention architectures, similar to those developed for natural language processing, have recently proved effective in vision as well.
Here, we report experiments with a simple such attention system that can improve the performance of standard convolutional networks.
We demonstrate the usefulness of this brain-inspired Global Attention Agreement network for various convolutional backbones.
arXiv Detail & Related papers (2021-04-12T15:45:10Z)
- Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing an attention mechanism to characterize the importance of spatially neighboring regions, the most relevant information can be adaptively incorporated into decision making.
Our MGCN-AGL encodes the long-range dependencies among image regions based on the expressive representations produced at the local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.