Agent Attention: On the Integration of Softmax and Linear Attention
- URL: http://arxiv.org/abs/2312.08874v3
- Date: Mon, 15 Jul 2024 09:42:48 GMT
- Title: Agent Attention: On the Integration of Softmax and Linear Attention
- Authors: Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Siyuan Pan, Pengfei Wan, Shiji Song, Gao Huang
- Abstract summary: We propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power.
We show that the proposed agent attention is equivalent to a generalized form of linear attention.
Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature.
- Score: 70.06472039237354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module. The agent tokens first act as agents for the query tokens $Q$ to aggregate information from $K$ and $V$, and then broadcast the information back to $Q$. Given that the number of agent tokens can be designed to be much smaller than the number of query tokens, agent attention is significantly more efficient than the widely adopted Softmax attention, while preserving global context modelling capability. Interestingly, we show that the proposed agent attention is equivalent to a generalized form of linear attention. Therefore, agent attention seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. Extensive experiments demonstrate the effectiveness of agent attention with various vision Transformers and across diverse vision tasks, including image classification, object detection, semantic segmentation and image generation. Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature. For instance, when applied to Stable Diffusion, our agent attention accelerates generation and substantially enhances image generation quality without any additional training. Code is available at https://github.com/LeapLabTHU/Agent-Attention.
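The two-stage computation described above is compact enough to sketch directly. Below is a minimal single-head PyTorch illustration; the tensor shapes, the omission of linear projections, and the pooling used to obtain the agent tokens are expository assumptions rather than the official implementation (see the linked repository for that):

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, a):
    """Minimal single-head agent attention sketch (shapes are assumptions).

    q, k, v: (B, n, d) query/key/value tokens
    a:       (B, m, d) agent tokens, with m << n
    Cost is O(n*m*d) instead of the O(n^2*d) of Softmax attention.
    """
    scale = q.shape[-1] ** -0.5
    # Stage 1 (agent aggregation): agents act as queries over K and V.
    agent_attn = F.softmax((a @ k.transpose(-2, -1)) * scale, dim=-1)  # (B, m, n)
    agent_v = agent_attn @ v                                           # (B, m, d)
    # Stage 2 (agent broadcast): the original queries attend to the agents.
    query_attn = F.softmax((q @ a.transpose(-2, -1)) * scale, dim=-1)  # (B, n, m)
    return query_attn @ agent_v                                        # (B, n, d)

# One plausible way to obtain agent tokens (an assumption here): pool the queries.
B, n, m, d = 2, 4096, 49, 64
q, k, v = (torch.randn(B, n, d) for _ in range(3))
a = F.adaptive_avg_pool1d(q.transpose(1, 2), m).transpose(1, 2)  # (B, m, d)
assert agent_attention(q, k, v, a).shape == (B, n, d)
```

Composing the two stages makes the claimed equivalence visible: the output is $\mathrm{Softmax}(QA^\top/\sqrt{d})\,\mathrm{Softmax}(AK^\top/\sqrt{d})\,V$, a rank-$m$ factorization of the full $n \times n$ attention map, which is why the cost grows linearly in $n$ for fixed $m$.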
Related papers
- Core Context Aware Attention for Long Context Language Modeling [50.774702091154204]
We propose a plug-and-play Core Context Aware (CCA) Attention for efficient long-range context modeling.
Our CCA-Attention significantly outperforms state-of-the-art models in terms of computational efficiency and long-context modeling ability.
arXiv Detail & Related papers (2024-12-17T01:54:08Z)
- A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs [65.00970402080351]
A promising approach to accelerating large vision-language models (VLMs) is using partial information, such as attention maps from specific layers, to assess token importance and prune less essential tokens.
Our study reveals three key insights: (i) partial attention information is insufficient for accurately identifying critical visual tokens, resulting in suboptimal performance, especially at low token retention ratios; (ii) global attention information, such as the attention map aggregated across all layers, more effectively preserves essential tokens and maintains comparable performance under aggressive pruning; and (iii) the global attention map aggregated from a small VLM closely resembles that of a large VLM.
arXiv Detail & Related papers (2024-12-04T13:56:44Z)
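Insight (ii) above, scoring visual tokens with an attention map aggregated across all layers and pruning the rest, reduces to a score-and-top-k selection. A hedged sketch follows; the mean aggregation rule, tensor shapes, and retention ratio are illustrative assumptions, not the paper's exact procedure:

```python
import torch

def prune_by_global_attention(tokens, attn_maps, keep_ratio=0.25):
    """Keep the visual tokens that receive the most attention, scored by
    an attention map averaged over layers and heads (a "global" map).

    tokens:    (B, N, C) visual tokens
    attn_maps: list of (B, H, N, N) per-layer attention maps
    """
    # Average over layers and heads, giving one (B, N, N) map.
    global_attn = torch.stack(attn_maps).mean(dim=(0, 2))
    # Score each token by how much attention it receives from all queries.
    scores = global_attn.mean(dim=1)                         # (B, N)
    k = max(1, int(tokens.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices                      # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    return tokens.gather(1, idx)                             # (B, k, C)
```

The entry's point (iii) suggests such a global map could come from a small VLM and still guide pruning for a large one.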
- Elliptical Attention [1.7597562616011944]
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision.
We propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance.
arXiv Detail & Related papers (2024-06-19T18:38:11Z)
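The Elliptical Attention entry above replaces the dot-product score with a (negative squared) distance under a learned Mahalanobis metric, stretching the feature space along directions of high contextual relevance. A minimal sketch, assuming a diagonal metric passed in as a parameter (the paper's estimator for the metric will differ):

```python
import torch
import torch.nn.functional as F

def mahalanobis_attention(q, k, v, metric_diag):
    """Attention weights from scores -||q_i - k_j||_M^2 for a diagonal metric M.

    q, k, v:     (B, n, d)
    metric_diag: (d,) positive weights; large entries stretch relevant directions
    """
    # Pairwise weighted squared distances via broadcasting: (B, n, n, d) -> (B, n, n).
    diff = q.unsqueeze(2) - k.unsqueeze(1)
    dist2 = (diff.pow(2) * metric_diag).sum(dim=-1)
    attn = F.softmax(-dist2 * q.shape[-1] ** -0.5, dim=-1)
    return attn @ v  # (B, n, d)
```

With metric_diag set to all ones this reduces to ordinary isotropic distance-based attention, which is the baseline the learned ellipse deforms.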
- FAST: Factorizable Attention for Speeding up Transformers [1.3637227185793512]
We present a linearly scaled attention mechanism that maintains the full representation of the attention matrix without compromising on sparsification.
Results indicate that our attention mechanism has robust performance and holds significant promise for diverse applications where self-attention is used.
arXiv Detail & Related papers (2024-02-12T18:59:39Z)
- FLatten Transformer: Vision Transformer using Focused Linear Attention [80.61335173752146]
Linear attention offers a much more efficient alternative with its linear complexity.
Current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead.
We propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness.
arXiv Detail & Related papers (2023-08-01T10:37:12Z)
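The linear attention family that FLatten builds on, and that the main paper above shows agent attention generalizes, avoids forming the $n \times n$ score matrix: softmax is replaced by a kernel feature map $\phi$ and the matrix product is reassociated. A generic sketch with the common choice $\phi(x) = \mathrm{elu}(x) + 1$ (FLatten's focused mapping itself is different):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """O(n*d^2) attention: softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V).

    q, k, v: (B, n, d)
    """
    phi_q = F.elu(q) + 1  # positive feature map, a common (assumed) choice
    phi_k = F.elu(k) + 1
    kv = phi_k.transpose(-2, -1) @ v  # (B, d, d): computed once, reused by all queries
    # Per-query normalizer: phi(q_i) . sum_j phi(k_j).
    z = phi_q @ phi_k.sum(dim=1, keepdim=True).transpose(-2, -1) + eps  # (B, n, 1)
    return (phi_q @ kv) / z  # (B, n, d)
```

The entry's caveat, performance degradation versus added overhead, is largely about the choice of $\phi$; agent attention sidesteps it by keeping real softmax in both of its stages.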
- Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph-Attention [12.588866091856309]
This paper considers partially observable multi-agent reinforcement learning (MARL), where each agent can only observe other agents within a fixed range.
We propose a novel multi-agent reinforcement learning algorithm, Partially Observable Mean Field Multi-Agent Reinforcement Learning based on Graph-Attention (GAMFQ).
Experiments show that GAMFQ outperforms baselines including the state-of-the-art partially observable mean-field reinforcement learning algorithms.
arXiv Detail & Related papers (2023-04-25T08:38:32Z)
- Shunted Self-Attention via Multi-Scale Token Aggregation [124.16925784748601]
Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks.
We propose shunted self-attention (SSA), which allows ViTs to model attention at hybrid scales per attention layer.
The SSA-based transformer achieves 84.0% Top-1 accuracy and outperforms the state-of-the-art Focal Transformer on ImageNet.
arXiv Detail & Related papers (2021-11-30T08:08:47Z)
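"Attention at hybrid scales per attention layer" in the Shunted Self-Attention entry means different heads attend over keys and values merged at different spatial rates. A hedged two-head sketch using average pooling as the token-merging operator; the real model learns the merging and adds the usual projections, so treat the details as assumptions:

```python
import torch
import torch.nn.functional as F

def shunted_attention(x, H, W, rates=(1, 2)):
    """Two-head sketch: each head attends over K/V pooled at its own rate.

    x: (B, N, C) tokens on an H x W grid (N = H * W), C split across heads.
    """
    B, N, C = x.shape
    d = C // len(rates)
    outs = []
    for i, r in enumerate(rates):
        q = x[..., i * d:(i + 1) * d]             # (B, N, d); also reused as K/V
        kv = q
        if r > 1:
            # Merge tokens on the spatial grid before attention.
            grid = kv.transpose(1, 2).reshape(B, d, H, W)
            grid = F.avg_pool2d(grid, kernel_size=r, stride=r)
            kv = grid.flatten(2).transpose(1, 2)  # (B, N / r^2, d)
        attn = F.softmax(q @ kv.transpose(-2, -1) * d ** -0.5, dim=-1)
        outs.append(attn @ kv)                    # (B, N, d)
    return torch.cat(outs, dim=-1)                # (B, N, C)
```

Coarse heads capture large objects cheaply while fine heads keep detail, which is the multi-scale trade-off the entry describes.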
- Graph Convolutional Value Decomposition in Multi-Agent Reinforcement Learning [9.774412108791218]
We propose a novel framework for value function factorization in deep reinforcement learning.
In particular, we consider the team of agents as the set of nodes of a complete directed graph.
We introduce a mixing GNN module, which is responsible for i) factorizing the team state-action value function into individual per-agent observation-action value functions, and ii) explicit credit assignment to each agent in terms of fractions of the global team reward.
arXiv Detail & Related papers (2020-10-09T18:01:01Z)
- Learning Hard Retrieval Decoder Attention for Transformers [69.40942736249397]
The Transformer translation model is based on the multi-head attention mechanism, which can be parallelized easily.
We show that our hard retrieval attention mechanism is 1.43 times faster in decoding.
arXiv Detail & Related papers (2020-09-30T13:18:57Z)
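"Hard retrieval" attention replaces the softmax-weighted average over value vectors with fetching the single best-scoring value per query, removing the softmax and the weighted sum from decoding, which is where the reported 1.43x speedup comes from. An inference-time sketch only; argmax is non-differentiable, and the paper's training scheme is not reproduced here:

```python
import torch

def hard_retrieval_attention(q, k, v):
    """Each query retrieves exactly one value vector (inference-time sketch).

    q: (B, nq, d); k, v: (B, nk, d)
    """
    scores = q @ k.transpose(-2, -1)  # (B, nq, nk); no softmax needed for argmax
    idx = scores.argmax(dim=-1)       # (B, nq): index of the winning key per query
    idx = idx.unsqueeze(-1).expand(-1, -1, v.shape[-1])
    return v.gather(1, idx)           # (B, nq, d)
```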
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.