Related papers: Vision Transformers are Circulant Attention Learners

Vision Transformers are Circulant Attention Learners

URL: http://arxiv.org/abs/2512.21542v1
Date: Thu, 25 Dec 2025 07:28:33 GMT
Title: Vision Transformers are Circulant Attention Learners
Authors: Dongchen Han, Tianyu Li, Ziyi Wang, Gao Huang,
Abstract summary: Self-attention mechanism has been a key factor in the advancement of vision Transformers.<n>We present a novel attention paradigm termed textbfCirculant Attention by exploiting the inherent efficient pattern of self-attention.
Score: 30.300457741980846
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The self-attention mechanism has been a key factor in the advancement of vision Transformers. However, its quadratic complexity imposes a heavy computational burden in high-resolution scenarios, restricting the practical application. Previous methods attempt to mitigate this issue by introducing handcrafted patterns such as locality or sparsity, which inevitably compromise model capacity. In this paper, we present a novel attention paradigm termed \textbf{Circulant Attention} by exploiting the inherent efficient pattern of self-attention. Specifically, we first identify that the self-attention matrix in vision Transformers often approximates the Block Circulant matrix with Circulant Blocks (BCCB), a kind of structured matrix whose multiplication with other matrices can be performed in $\mathcal{O}(N\log N)$ time. Leveraging this interesting pattern, we explicitly model the attention map as its nearest BCCB matrix and propose an efficient computation algorithm for fast calculation. The resulting approach closely mirrors vanilla self-attention, differing only in its use of BCCB matrices. Since our design is inspired by the inherent efficient paradigm, it not only delivers $\mathcal{O}(N\log N)$ computation complexity, but also largely maintains the capacity of standard self-attention. Extensive experiments on diverse visual tasks demonstrate the effectiveness of our approach, establishing circulant attention as a promising alternative to self-attention for vision Transformer architectures. Code is available at https://github.com/LeapLabTHU/Circulant-Attention.

Related papers

Scaling Probabilistic Circuits via Monarch Matrices [109.65822339230853]
Probabilistic Circuits (PCs) are tractable representations of probability distributions.<n>We propose a novel sparse and structured parameterization for the sum blocks in PCs.
arXiv Detail & Related papers (2025-06-14T07:39:15Z)
Log-Linear Attention [81.09631871212211]
This paper develops log-linear attention, an attention mechanism that balances linear attention's efficiency and the expressiveness of softmax attention.<n>We show that with a particular growth function, log-linear attention admits a similarly matmul-rich parallel form whose compute cost is log-linear in sequence length.<n>Log-linear attention is a general framework and can be applied on top of existing linear attention variants.
arXiv Detail & Related papers (2025-06-05T08:44:51Z)
Curse of High Dimensionality Issue in Transformer for Long-context Modeling [31.257769500741006]
We propose textitDynamic Group Attention (DGA) to reduce redundancy by aggregating less important tokens during attention computation.<n>Our results show that our DGA significantly reduces computational costs while maintaining competitive performance.
arXiv Detail & Related papers (2025-05-28T08:34:46Z)
A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation [0.0]
We propose a hybrid quantum-classical self-attention mechanism as part of a transformer decoder.<n>We show that the time complexity of the query-key dot product is reduced from $mathcalO(n2 d)$ in a classical model to $mathcalO(n2 d)$ in our quantum model.<n>This work provides a promising avenue for quantum-enhanced natural language processing (NLP)
arXiv Detail & Related papers (2025-02-26T15:15:01Z)
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction [29.12836710966048]
We propose a novel transformer attention operator whose computational complexity scales linearly with the number of tokens.<n>Our results call into question the conventional wisdom that pairwise similarity style attention mechanisms are critical to the success of transformer architectures.
arXiv Detail & Related papers (2024-12-23T18:59:21Z)
FAST: Factorizable Attention for Speeding up Transformers [1.3637227185793512]
We present a linearly scaled attention mechanism that maintains the full representation of the attention matrix without compromising on sparsification. Results indicate that our attention mechanism has a robust performance and holds significant promise for diverse applications where self-attention is used.
arXiv Detail & Related papers (2024-02-12T18:59:39Z)
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation [76.68301884987348]
We propose a simple yet effective approach for self-supervised video object segmentation (VOS) Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to establish robust-temporal segmentation correspondences in videos. Our method demonstrates state-of-the-art performance across multiple unsupervised VOS benchmarks and excels in complex real-world multi-object video segmentation tasks.
arXiv Detail & Related papers (2023-11-29T18:47:17Z)
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels [23.99075223506133]
We show that attention with high degree can effectively replace softmax without sacrificing model quality. We present a block-based algorithm to apply causal masking efficiently. We validate PolySketchFormerAttention empirically by training language models capable of handling long contexts.
arXiv Detail & Related papers (2023-10-02T21:39:04Z)
FLatten Transformer: Vision Transformer using Focused Linear Attention [80.61335173752146]
Linear attention offers a much more efficient alternative with its linear complexity. Current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead. We propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness.
arXiv Detail & Related papers (2023-08-01T10:37:12Z)
DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation [3.9548535445908928]
We propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights.
arXiv Detail & Related papers (2022-12-27T14:39:39Z)
Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks. Existing methods are either theoretically flawed or empirically ineffective for visual recognition. We propose a family of Softmax-Free Transformers (SOFT)
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
Combiner: Full Attention Transformer with Sparse Computation Cost [142.10203598824964]
We propose Combiner, which provides full attention capability in each attention head while maintaining low computation complexity. We show that most sparse attention patterns used in existing sparse transformers are able to inspire the design of such factorization for full attention. An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach.
arXiv Detail & Related papers (2021-07-12T22:43:11Z)
SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to its powerful capacity. Attention map visualization of a pre-trained model is one direct method for understanding self-attention mechanism. We propose a Differentiable Attention Mask (DAM) algorithm, which can be also applied in guidance of SparseBERT design.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.