GAttANet: Global attention agreement for convolutional neural networks
- URL: http://arxiv.org/abs/2104.05575v1
- Date: Mon, 12 Apr 2021 15:45:10 GMT
- Title: GAttANet: Global attention agreement for convolutional neural networks
- Authors: Rufin VanRullen and Andrea Alamia
- Abstract summary: Transformer attention architectures, similar to those developed for natural language processing, have recently proved effective in vision as well.
Here, we report experiments with a simple such attention system that can improve the performance of standard convolutional networks.
We demonstrate the usefulness of this brain-inspired Global Attention Agreement network for various convolutional backbones.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer attention architectures, similar to those developed for natural
language processing, have recently proved effective in vision as well, either in
conjunction with or as a replacement for convolutional layers. Typically,
visual attention is inserted in the network architecture as a (series of)
feedforward self-attention module(s), with mutual key-query agreement as the
main selection and routing operation. However efficient, this strategy is only
loosely compatible with the way attention is implemented in biological
brains: as a separate and unified network of attentional selection regions,
receiving inputs from and exerting modulatory influence on the entire hierarchy
of visual regions. Here, we report experiments with a simple such attention
system that can improve the performance of standard convolutional networks,
with relatively few additional parameters. Each spatial position in each layer
of the network produces a key-query vector pair; all queries are then pooled
into a global attention query. On the next iteration, the match between each
key and the global attention query modulates the network's activations --
emphasizing or silencing the locations that agree or disagree (respectively)
with the global attention system. We demonstrate the usefulness of this
brain-inspired Global Attention Agreement network (GAttANet) for various
convolutional backbones (from a simple 5-layer toy model to a standard ResNet50
architecture) and datasets (CIFAR10, CIFAR100, ImageNet-1k). Each time, our
global attention system improves accuracy over the corresponding baseline.
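The mechanism described above reduces to two operations: pool the per-position queries from every layer into a single global query, then gate each position by how well its key agrees with that global query on the next iteration. Below is a minimal PyTorch sketch of those two steps. The 1x1-convolution key/query projections, the mean pooling, and the sigmoid gate are illustrative assumptions; the abstract states only that each position emits a key-query pair and that agreeing locations are emphasized while disagreeing ones are silenced.

```python
import torch
import torch.nn as nn

class GlobalAttentionAgreement(nn.Module):
    """Sketch of global attention agreement over a list of feature maps,
    one per backbone layer (projection/gating details are assumptions)."""

    def __init__(self, channel_dims, attn_dim=64):
        super().__init__()
        # One 1x1 projection per backbone layer for keys and for queries.
        self.key_proj = nn.ModuleList(
            [nn.Conv2d(c, attn_dim, kernel_size=1) for c in channel_dims])
        self.query_proj = nn.ModuleList(
            [nn.Conv2d(c, attn_dim, kernel_size=1) for c in channel_dims])

    def pool_global_query(self, feature_maps):
        # Every spatial position in every layer contributes a query;
        # all queries are pooled (here: averaged) into one global query.
        queries = [proj(f).flatten(2)                  # (B, D, H*W)
                   for f, proj in zip(feature_maps, self.query_proj)]
        return torch.cat(queries, dim=2).mean(dim=2)   # (B, D)

    def modulate(self, feature_maps, global_query):
        # Gate each position by the match between its key and the global
        # query: agreeing locations are emphasized, disagreeing ones
        # silenced (sigmoid gating is an assumed choice).
        out = []
        for f, proj in zip(feature_maps, self.key_proj):
            k = proj(f)                                       # (B, D, H, W)
            match = torch.einsum('bdhw,bd->bhw', k, global_query)
            out.append(f * torch.sigmoid(match).unsqueeze(1))
        return out

# Toy usage with two backbone layers: the global query pooled on one
# iteration modulates the activations computed on the next.
gatta = GlobalAttentionAgreement(channel_dims=[64, 128])
f1, f2 = torch.randn(2, 64, 32, 32), torch.randn(2, 128, 16, 16)
g = gatta.pool_global_query([f1, f2])
f1_mod, f2_mod = gatta.modulate([f1, f2], g)
```

Because the agreement network adds only one pair of 1x1 projections per layer, the parameter overhead stays small relative to the backbone, consistent with the abstract's "relatively few additional parameters" claim.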
Related papers
- Graph Triple Attention Network: A Decoupled Perspective [8.958483386270638]
Graph Transformers face two primary challenges: multi-view chaos and local-global chaos.
We propose a high-level decoupled perspective of GTs, breaking them down into three components and two interaction levels.
We design a decoupled graph triple attention network named DeGTA, which separately computes multi-view attentions and adaptively integrates multi-view local and global information.
arXiv Detail & Related papers (2024-08-14T16:29:07Z)
- Image Super-resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features [6.274236785783168]
This paper proposes enhanced Swin Transformer modules via alternating aggregation of local-global features.
The experimental results show that the proposed network outperforms the other state-of-the-art super-resolution networks.
arXiv Detail & Related papers (2023-12-30T14:11:08Z)
- TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment [53.72721476803585]
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks.
We propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions.
A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower-level features.
arXiv Detail & Related papers (2023-08-06T09:08:37Z)
- Convolution-enhanced Evolving Attention Networks [41.684265133316096]
The Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer significantly outperforms state-of-the-art models.
This is the first work that explicitly models the layer-wise evolution of attention maps.
arXiv Detail & Related papers (2022-12-16T08:14:04Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We attach this learned aggregation layer to a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
arXiv Detail & Related papers (2021-12-27T14:05:41Z)
- All the attention you need: Global-local, spatial-channel attention for image retrieval [11.150896867058902]
We address representation learning for large-scale instance-level image retrieval.
We present global-local attention module (GLAM), which is attached at the end of a backbone network.
We obtain a new feature tensor and, by spatial pooling, we learn a powerful embedding for image retrieval.
arXiv Detail & Related papers (2021-07-16T16:39:13Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, at comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
- Visual Concept Reasoning Networks [93.99840807973546]
A split-transform-merge strategy has been broadly used as an architectural constraint in convolutional neural networks for visual recognition tasks.
We propose to exploit this strategy and combine it with our Visual Concept Reasoning Networks (VCRNet) to enable reasoning between high-level visual concepts.
Our proposed model, VCRNet, consistently improves the performance by increasing the number of parameters by less than 1%.
arXiv Detail & Related papers (2020-08-26T20:02:40Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.