On the Universality of Transformer Architectures: How Much Attention Is Enough?
- URL: http://arxiv.org/abs/2512.18445v1
- Date: Sat, 20 Dec 2025 17:31:59 GMT
- Title: On the Universality of Transformer Architectures: How Much Attention Is Enough?
- Authors: Amirreza Abbasi, Mohsen Hooshmand
- Abstract summary: Transformers are crucial across many AI fields, such as large language models, computer vision, and reinforcement learning. This work examines the problem of universality in Transformers, reviews recent progress, and surveys state-of-the-art advances.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers are crucial across many AI fields, such as large language models, computer vision, and reinforcement learning. This prominence stems from the architecture's perceived universality and scalability compared to alternatives. This work examines the problem of universality in Transformers, reviews recent progress, including architectural refinements such as structural minimality and approximation rates, and surveys state-of-the-art advances that inform both theoretical and practical understanding. Our aim is to clarify what is currently known about Transformers' expressiveness, separate robust guarantees from fragile ones, and identify key directions for future theoretical research.
Related papers
- Revisiting Transformers with Insights from Image Filtering [3.042104695845305]
Self-attention is a cornerstone of Transformer-based state-of-the-art deep learning architectures. We develop a unifying image processing framework to explain self-attention and its components. We empirically observe that image processing-inspired modifications can lead to notably improved accuracy and robustness against data contamination and adversaries across language and vision tasks, as well as better long-sequence understanding.
arXiv Detail & Related papers (2025-06-12T05:46:57Z)
- Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention [9.401360346241296]
This paper introduces a novel modular Transformer architecture that explicitly decouples knowledge and reasoning. We provide a rigorous mathematical derivation demonstrating that the Feed-Forward Network (FFN) in a standard Transformer is a specialized case.
arXiv Detail & Related papers (2025-01-01T12:55:57Z)
- Advances in Transformers for Robotic Applications: A Review [0.9208007322096533]
We review recent advances and trends in Transformers in robotics. We examine their integration into robotic perception, planning, and control for autonomous systems. We discuss how different Transformer variants are being adapted in robotics for reliable planning and perception.
arXiv Detail & Related papers (2024-12-13T23:02:15Z)
- What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis [8.008567379796666]
We provide a fundamental understanding of what distinguishes the Transformer from other architectures. Our results suggest that various common architectural and optimization choices in Transformers can be traced back to their highly non-linear dependencies.
arXiv Detail & Related papers (2024-10-14T18:15:02Z)
- Body Transformer: Leveraging Robot Embodiment for Policy Learning [51.531793239586165]
Body Transformer (BoT) is an architecture that leverages the robot embodiment by providing an inductive bias that guides the learning process.
We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information throughout the architecture.
The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, in terms of task completion, scaling properties, and computational efficiency.
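The masked-attention pooling described above can be sketched in a few lines of NumPy. The chain-shaped body graph, the dimensions, and the use of raw node features as queries, keys, and values are illustrative assumptions, not BoT's actual design:

```python
import numpy as np

def masked_attention(X, mask):
    """Scaled dot-product self-attention where mask[i, j] = True allows
    node i to attend to node j (e.g. adjacent body parts in the graph)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)               # (n, n) pairwise scores
    scores = np.where(mask, scores, -np.inf)    # block non-neighbor links
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X                          # pooled node features

# Toy "body graph": 3 sensor/actuator nodes in a chain 0-1-2, self-loops on.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=bool)
X = np.random.default_rng(0).normal(size=(3, 4))
out = masked_attention(X, adj)
```

Because the mask zeroes out non-adjacent pairs before the softmax, information propagates along the body graph rather than all-to-all, which is the inductive bias the abstract refers to.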
arXiv Detail & Related papers (2024-08-12T17:31:28Z)
- Adventures of Trustworthy Vision-Language Models: A Survey [54.76511683427566]
This paper conducts a thorough examination of vision-language transformers, employing three fundamental principles of responsible AI: Bias, Robustness, and Interpretability.
The primary objective of this paper is to delve into the intricacies and complexities associated with the practical use of transformers, with the overarching goal of advancing our comprehension of how to enhance their reliability and accountability.
arXiv Detail & Related papers (2023-12-07T11:31:20Z)
- Introduction to Transformers: an NLP Perspective [59.0241868728732]
We introduce basic concepts of Transformers and present key techniques that form the recent advances of these models.
This includes a description of the standard Transformer architecture, a series of model refinements, and common applications.
arXiv Detail & Related papers (2023-11-29T13:51:04Z)
- Energy Transformer [64.22957136952725]
Our work combines aspects of three promising paradigms in machine learning, namely, attention mechanism, energy-based models, and associative memory.
We propose a novel architecture, called the Energy Transformer (or ET for short), that uses a sequence of attention layers that are purposely designed to minimize a specifically engineered energy function.
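A minimal sketch of the idea, assuming a Hopfield-style energy; the specific energy function, β, and step size below are illustrative choices, not ET's engineered energy. Each update is a softmax "attention" read-out over stored patterns and is exactly a gradient step that lowers the energy:

```python
import numpy as np

def energy(x, M, beta=1.0):
    """Hopfield-style energy: low when x aligns with a pattern stored in M."""
    return -np.log(np.exp(beta * (M @ x)).sum()) / beta + 0.5 * x @ x

def energy_step(x, M, beta=1.0, lr=0.05):
    """One attention-like update that descends the energy above."""
    p = np.exp(beta * (M @ x))
    p /= p.sum()                 # softmax attention over stored patterns
    grad = x - M.T @ p           # exact gradient of energy(x, M)
    return x - lr * grad

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 8))      # 5 stored patterns of dimension 8
x = rng.normal(size=8)
e0 = energy(x, M)
for _ in range(50):              # repeated attention updates
    x = energy_step(x, M)
e1 = energy(x, M)                # strictly lower than e0 for a small step
```

The point the abstract makes is the design direction: instead of stacking generic attention layers and hoping for useful dynamics, the layers are derived so each forward pass provably decreases an energy, as the toy descent above illustrates.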
arXiv Detail & Related papers (2023-02-14T18:51:22Z)
- What Makes for Good Tokenizers in Vision Transformer? [62.44987486771936]
Transformers are capable of extracting pairwise relationships between tokens using self-attention.
What makes for a good tokenizer has not been well understood in computer vision.
Modulation across Tokens (MoTo) incorporates inter-token modeling capability through normalization.
A regularization objective, TokenProp, is embraced in the standard training regime.
arXiv Detail & Related papers (2022-12-21T15:51:43Z)
- Transformers with Competitive Ensembles of Independent Mechanisms [97.93090139318294]
We propose a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention.
We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement and find evidence for semantically meaningful specialization as well as improved performance.
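The mechanism-splitting idea can be sketched as follows; the mechanism count, the per-mechanism tanh transform, and the weight shapes are illustrative assumptions, not the paper's exact layer:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tim_layer(H, Ws):
    """Split the hidden state into mechanisms with separate parameters Ws;
    mechanisms exchange information only through attention over each other."""
    n_mech = len(Ws)
    chunks = np.split(H, n_mech, axis=-1)             # one slice per mechanism
    # Independent per-mechanism transforms (no shared parameters).
    S = np.stack([np.tanh(c @ W) for c, W in zip(chunks, Ws)])  # (k, n, d/k)
    dk = S.shape[-1]
    # Per-token mechanism-to-mechanism attention scores.
    scores = np.einsum('mnd,knd->nmk', S, S) / np.sqrt(dk)
    w = softmax(scores, axis=-1)                      # competition over mechanisms
    mixed = np.einsum('nmk,knd->mnd', w, S)           # attention-weighted exchange
    return np.concatenate(list(mixed), axis=-1)       # (n, d) recombined state

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))                           # 4 tokens, hidden size 8
Ws = [rng.normal(size=(4, 4)) for _ in range(2)]      # two mechanisms, d/k = 4
out = tim_layer(H, Ws)
```

The key structural choice is that the only place the two mechanisms' parameters interact is the attention step; everything else is computed on disjoint slices of the hidden state.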
arXiv Detail & Related papers (2021-02-27T21:48:46Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
- A Survey on Visual Transformer [126.56860258176324]
The Transformer is a type of deep neural network based mainly on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.