Related papers: Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers

Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers

URL: http://arxiv.org/abs/2509.16058v1
Date: Fri, 19 Sep 2025 15:08:30 GMT
Title: Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
Authors: Krati Saxena, Federico Jurado Ruiz, Guido Manzi, Dianbo Liu, Alex Lamb,
Abstract summary: We introduce ASAC (Attention-based Attention Control), which integrates the attention schema concept into artificial neural networks.<n>By explicitly modeling attention allocation, our approach aims to enhance system efficiency.<n>We demonstrate ASAC's effectiveness in both the vision and NLP domains, highlighting its ability to improve classification accuracy and expedite the learning process.
Score: 6.853513140582486
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Attention mechanisms have become integral in AI, significantly enhancing model performance and scalability by drawing inspiration from human cognition. Concurrently, the Attention Schema Theory (AST) in cognitive science posits that individuals manage their attention by creating a model of the attention itself, effectively allocating cognitive resources. Inspired by AST, we introduce ASAC (Attention Schema-based Attention Control), which integrates the attention schema concept into artificial neural networks. Our initial experiments focused on embedding the ASAC module within transformer architectures. This module employs a Vector-Quantized Variational AutoEncoder (VQVAE) as both an attention abstractor and controller, facilitating precise attention management. By explicitly modeling attention allocation, our approach aims to enhance system efficiency. We demonstrate ASAC's effectiveness in both the vision and NLP domains, highlighting its ability to improve classification accuracy and expedite the learning process. Our experiments with vision transformers across various datasets illustrate that the attention controller not only boosts classification accuracy but also accelerates learning. Furthermore, we have demonstrated the model's robustness and generalization capabilities across noisy and out-of-distribution datasets. In addition, we have showcased improved performance in multi-task settings. Quick experiments reveal that the attention schema-based module enhances resilience to adversarial attacks, optimizes attention to improve learning efficiency, and facilitates effective transfer learning and learning from fewer examples. These promising results establish a connection between cognitive science and machine learning, shedding light on the efficient utilization of attention mechanisms in AI systems.

Related papers

Detection Transformers Under the Knife: A Neuroscience-Inspired Approach to Ablations [5.5967570276373655]
We systematically analyze the impact of ablating key components in three state-of-the-art detection transformer models.<n>We evaluate the effects of these ablations on the performance metrics gIoU and F1-score.<n>This study advances XAI for DETRs by clarifying the contributions of internal components to model performance.
arXiv Detail & Related papers (2025-07-29T12:00:08Z)
Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach [50.36650300087987]
This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism. We have found that integrating the forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge.
arXiv Detail & Related papers (2024-03-27T05:10:38Z)
Switchable Self-attention Module [3.8992324495848356]
We propose a self-attention module SEM. Based on the input information of the attention module and alternative attention operators, SEM can automatically decide to select and integrate attention operators to compute attention maps. The effectiveness of SEM is demonstrated by extensive experiments on widely used benchmark datasets and popular self-attention networks.
arXiv Detail & Related papers (2022-09-13T01:19:38Z)
Self-Supervised Implicit Attention: Guided Attention by The Model Itself [1.3406858660972554]
We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to gain attention by exploiting the properties of the models themselves. SSIAA is a novel attention mechanism that does not require any extra parameters, computation, or memory access costs during inference. Our implementation will be available on GitHub.
arXiv Detail & Related papers (2022-06-15T10:13:34Z)
TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs [18.24779045808196]
We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs. Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
arXiv Detail & Related papers (2021-11-26T12:35:17Z)
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference. Specifically, we analyze the effect of the learned visual attention on network prediction. We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to its powerful capacity. Attention map visualization of a pre-trained model is one direct method for understanding self-attention mechanism. We propose a Differentiable Attention Mask (DAM) algorithm, which can be also applied in guidance of SparseBERT design.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks. We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)
Cost-effective Interactive Attention Learning with Neural Attention Processes [79.8115563067513]
We propose a novel interactive learning framework which we refer to as Interactive Attention Learning (IAL) IAL is prone to overfitting due to scarcity of human annotations, and requires costly retraining. We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features.
arXiv Detail & Related papers (2020-06-09T17:36:41Z)
Guided Variational Autoencoder for Disentanglement Learning [79.02010588207416]
We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning. We design an unsupervised strategy and a supervised strategy in Guided-VAE and observe enhanced modeling and controlling capability over the vanilla VAE.
arXiv Detail & Related papers (2020-04-02T20:49:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.