Self-Supervised Implicit Attention: Guided Attention by The Model Itself
- URL: http://arxiv.org/abs/2206.07434v1
- Date: Wed, 15 Jun 2022 10:13:34 GMT
- Title: Self-Supervised Implicit Attention: Guided Attention by The Model Itself
- Authors: Jinyi Wu, Xun Gong, Zhemin Zhang
- Abstract summary: We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to learn attention by exploiting the properties of the models themselves.
SSIA is a novel attention mechanism that does not require any extra parameters, computation, or memory access costs during inference.
Our implementation will be available on GitHub.
- Score: 1.3406858660972554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Self-Supervised Implicit Attention (SSIA), a new approach that
adaptively guides deep neural network models to learn attention by exploiting
the properties of the models themselves. SSIA is a novel attention mechanism
that does not require any extra parameters, computation, or memory access costs
during inference, in contrast to existing attention mechanisms. In
short, treating attention weights as higher-level semantic information, we
reconsider the implementation of existing attention mechanisms and propose
generating supervisory signals from higher network layers to guide the
parameter updates of lower network layers. We achieve this by building a
self-supervised learning task on the hierarchical features of the network
itself, one that operates only during the training stage. To verify the
effectiveness of SSIA, we implemented a particular instantiation (called an
SSIA block) in convolutional neural network models and validated it on several
image classification datasets. The experimental results show that an SSIA
block can significantly improve model performance, even outperforming many
popular attention methods that require additional parameters and computation costs,
such as Squeeze-and-Excitation and Convolutional Block Attention Module. Our
implementation will be available on GitHub.
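The abstract does not spell out the block's internals, but the core idea (derive an attention target from a higher, more semantic layer and use it to supervise the attention implicit in a lower layer, via a loss that exists only at training time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation; the module name SSIABlock and the design choices (channel-pooled saliency, KL loss) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSIABlock(nn.Module):
    """Training-only sketch of a self-supervised implicit-attention loss.

    A spatial attention target is derived from a higher (more semantic)
    feature map and used to supervise the attention implicit in a lower
    feature map. At inference the block passes features through unchanged,
    so it adds no parameters, computation, or memory-access cost.
    """

    def forward(self, low_feat, high_feat):
        # Inference: no-op (zero cost).
        if not self.training:
            return low_feat, low_feat.new_zeros(())

        # Implicit attention of the lower layer: channel-pooled spatial
        # saliency, normalized over spatial positions.
        low_att = F.softmax(low_feat.abs().mean(dim=1, keepdim=True)
                            .flatten(2), dim=-1)              # (B, 1, H*W)

        # Supervisory signal from the higher layer, resized to the lower
        # layer's resolution and detached so gradients flow only into the
        # lower layers being guided.
        target = high_feat.abs().mean(dim=1, keepdim=True)
        target = F.interpolate(target, size=low_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        target = F.softmax(target.flatten(2), dim=-1).detach()

        # Auxiliary self-supervised loss (KL between the two maps).
        aux_loss = F.kl_div(low_att.log(), target, reduction="batchmean")
        return low_feat, aux_loss
```

During training, aux_loss would be added to the task loss with some weighting; at inference the block returns the features untouched, which is consistent with the paper's claim of zero extra parameters, computation, and memory access.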
Related papers
- CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability [11.076431337488973]
This study proposes a novel hybrid self-supervised depth estimation network, CCDepth, comprising convolutional neural networks (CNNs) and the white-box CRATE network.
This novel network uses CNNs and the CRATE modules to extract local and global information in images, respectively, thereby boosting learning efficiency and reducing model size.
arXiv Detail & Related papers (2024-09-30T04:19:40Z)
- A Primal-Dual Framework for Transformers and Neural Networks [52.814467832108875]
Self-attention is key to the remarkable success of transformers in sequence modeling tasks.
We show that self-attention corresponds to the support vector expansion derived from a support vector regression problem.
We propose two new attentions: Batch Normalized Attention (Attention-BN) and Attention with Scaled Head (Attention-SH).
arXiv Detail & Related papers (2024-06-19T19:11:22Z)
- Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning [38.09011520275557]
Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones.
We propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL.
arXiv Detail & Related papers (2024-06-04T15:47:03Z)
- Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
The self-attention mechanism (SAM) is widely used in various fields of artificial intelligence.
We show that the intrinsic stiffness phenomenon (SP) found in high-precision solutions of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NNs).
We show that SAM is also a stiffness-aware step-size adaptor that measures intrinsic SP and thereby enhances the model's representational ability.
arXiv Detail & Related papers (2023-08-19T08:17:41Z)
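One way to make the ODE reading above concrete: a residual block computes an explicit Euler step y_{t+1} = y_t + h * f(y_t), and an attention weight can be viewed as an input-dependent (stiffness-aware) step size h. The toy block below illustrates that reading only; it is not the paper's construction, and all names are hypothetical.

```python
import torch
import torch.nn as nn

class AttentiveEulerBlock(nn.Module):
    """Toy residual block read as an explicit Euler step of an ODE.

    y_{t+1} = y_t + h(y_t) * f(y_t), where the channel-attention weights
    h(y_t) in (0, 1) play the role of an input-dependent (stiffness-aware)
    step size. Illustrative only.
    """

    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(                      # vector field f(y)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.step = nn.Sequential(                   # adaptive step size h(y)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, y):
        return y + self.step(y) * self.f(y)          # one Euler step
```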
- Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z)
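A common way to realize the differentiable sparse activation described in the SMA entry above is straight-through Gumbel-softmax gating; the sketch below uses that, though the paper's actual mechanism may differ, and all names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseModuleGate(nn.Module):
    """Sketch of sparsely activating a sub-module per sequence element.

    Each token gets a differentiable on/off gate (straight-through
    Gumbel-softmax), so an expensive module such as self-attention can be
    skipped for most tokens. Generic illustration, not the paper's code.
    """

    def __init__(self, dim, tau=1.0):
        super().__init__()
        self.logits = nn.Linear(dim, 2)   # per-token scores for [skip, use]
        self.tau = tau

    def forward(self, x, module):
        # gate: (B, T, 2), one-hot in the forward pass, soft in backward.
        gate = F.gumbel_softmax(self.logits(x), tau=self.tau, hard=True)
        use = gate[..., 1:]               # (B, T, 1): 1.0 where active
        return x + use * module(x)        # residual update for gated tokens
```

Note that this sketch still runs the module on every token and merely masks the result; a real implementation would gather only the active tokens, which is where the efficiency gains come from.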
- Dense Network Expansion for Class Incremental Learning [61.00081795200547]
State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task.
A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity.
It outperforms the previous SOTA methods by a margin of 4% in terms of accuracy, with similar or even smaller model scale.
arXiv Detail & Related papers (2023-03-22T16:42:26Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
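The DIA entry above combines two ideas: a single attention module whose parameters are shared by every layer, and a recurrent (LSTM) unit that carries attention information across layers. A generic sketch under those assumptions, with all names hypothetical:

```python
import torch
import torch.nn as nn

class SharedLayerAttention(nn.Module):
    """Sketch of a dense-and-implicit style shared attention module.

    One channel-attention module is reused at every layer, and an LSTM
    cell aggregates the pooled features it has seen across layers,
    coupling the layers' attention maps. Illustration only.
    """

    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.cell = nn.LSTMCell(channels, channels)   # shared across layers
        self.state = None                             # (h, c), reset per pass

    def reset(self):
        self.state = None

    def forward(self, feat):
        b, c, _, _ = feat.shape
        pooled = self.pool(feat).flatten(1)           # (B, C)
        if self.state is None or self.state[0].shape[0] != b:
            self.state = (pooled.new_zeros(b, c), pooled.new_zeros(b, c))
        h, cell_state = self.cell(pooled, self.state)
        self.state = (h, cell_state)
        scale = torch.sigmoid(h).view(b, c, 1, 1)     # attention weights
        return feat * scale
```

The same SharedLayerAttention instance would be invoked inside every residual block, with reset() called at the start of each forward pass.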
- Switchable Self-attention Module [3.8992324495848356]
We propose SEM, a switchable self-attention module.
Based on the input information and a set of alternative attention operators, SEM can automatically select and integrate attention operators to compute attention maps.
The effectiveness of SEM is demonstrated by extensive experiments on widely used benchmark datasets and popular self-attention networks.
arXiv Detail & Related papers (2022-09-13T01:19:38Z)
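The SEM entry above describes choosing among candidate attention operators based on the input. A minimal version computes mixing weights from globally pooled features and blends the operators' attention maps; this is an illustrative reduction, not the paper's design, and all names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableAttention(nn.Module):
    """Sketch of selecting/integrating attention operators per input.

    A tiny controller looks at globally pooled features and produces
    mixing weights over candidate attention operators; their attention
    maps are combined accordingly. Illustration only.
    """

    def __init__(self, channels, operators):
        super().__init__()
        # Each operator maps (B, C, H, W) -> (B, C, 1, 1) channel logits,
        # e.g. average-pool or max-pool based channel attention.
        self.operators = nn.ModuleList(operators)
        self.controller = nn.Linear(channels, len(operators))

    def forward(self, x):
        pooled = x.mean(dim=(2, 3))                            # (B, C)
        w = F.softmax(self.controller(pooled), dim=-1)         # (B, K)
        maps = torch.stack([op(x) for op in self.operators],
                           dim=1)                              # (B, K, C, 1, 1)
        att = (w.view(*w.shape, 1, 1, 1) * maps).sum(dim=1)    # (B, C, 1, 1)
        return x * torch.sigmoid(att)
```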
- Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks [33.07113523598028]
We propose Attention Pruning (AP), a framework that observes attention patterns in a fixed dataset and generates a global sparseness mask.
AP saves 90% of attention computation for language modeling and about 50% for machine translation and GLUE tasks, maintaining result quality.
arXiv Detail & Related papers (2020-11-20T13:58:21Z)
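The AP recipe above is concrete enough to sketch: average the attention maps observed on a fixed dataset, keep only the strongest positions as one global mask, and apply that mask thereafter. The particular statistic and threshold rule below are assumptions.

```python
import torch

def build_global_mask(attn_maps, keep_ratio=0.1):
    """Average observed (T, T) attention maps over a dataset and keep only
    the strongest positions, yielding one fixed global sparseness mask."""
    mean_attn = torch.stack(attn_maps).mean(dim=0)        # (T, T)
    k = max(1, int(keep_ratio * mean_attn.numel()))
    thresh = mean_attn.flatten().topk(k).values.min()
    return mean_attn >= thresh                            # bool mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with pruned positions set to -inf; a
    sparse kernel could skip those positions' computation entirely."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

The keep_ratio must be large enough that every query row retains at least one unmasked key (otherwise the softmax produces NaNs); skipping the masked positions in a sparse kernel is where the reported compute savings would come from.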
- Attention-Guided Network for Iris Presentation Attack Detection [13.875545441867137]
We propose attention-guided iris presentation attack detection (AG-PAD) to augment CNNs with attention mechanisms.
Experiments involving both a JHU-APL proprietary dataset and the benchmark LivDet-Iris-2017 dataset suggest that the proposed method achieves promising results.
arXiv Detail & Related papers (2020-10-23T19:23:51Z)
- Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural network.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)
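The last entry pairs a main network with a meta critic that scores the quality of intermediate attention maps. A rough sketch of such a critic is below; the paper itself trains it with reinforcement learning, and all names and layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class AttentionCritic(nn.Module):
    """Sketch of a meta critic that scores intermediate attention maps.

    The critic receives an attention map from the main network and outputs
    a scalar quality estimate, which can serve as a reward signal when
    updating the attention modules. Illustration only.
    """

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, hidden, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(hidden, 1),
        )

    def forward(self, attn_map):                 # attn_map: (B, 1, H, W)
        return self.net(attn_map).squeeze(-1)    # (B,) quality scores
```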
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.