Concise One-Layer Transformers Can Do Function Evaluation (Sometimes)
- URL: http://arxiv.org/abs/2503.22076v1
- Date: Fri, 28 Mar 2025 01:40:23 GMT
- Title: Concise One-Layer Transformers Can Do Function Evaluation (Sometimes)
- Authors: Lena Strobl, Dana Angluin, Robert Frank
- Abstract summary: This paper contributes to the study of the expressive capacity of transformers. We focus on their ability to perform the fundamental computational task of evaluating an arbitrary function from $[n]$ to $[n]$ at a given argument.
- Score: 1.157192696857674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While transformers have proven enormously successful in a range of tasks, their fundamental properties as models of computation are not well understood. This paper contributes to the study of the expressive capacity of transformers, focusing on their ability to perform the fundamental computational task of evaluating an arbitrary function from $[n]$ to $[n]$ at a given argument. We prove that concise 1-layer transformers (i.e., with a polylog bound on the product of the number of heads, the embedding dimension, and precision) are capable of doing this task under some representations of the input, but not when the function's inputs and values are only encoded in different input positions. Concise 2-layer transformers can perform the task even with the more difficult input representation. Experimentally, we find a rough alignment between what we have proven can be computed by concise transformers and what can be practically learned.
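To make the task concrete, here is a minimal Python sketch (not taken from the paper; the function names and encodings are illustrative assumptions) of the function-evaluation problem and the two broad styles of input representation the abstract contrasts: one where each position carries an argument together with its value, and one where arguments and values appear only in different positions.

```python
# Illustrative sketch of the function-evaluation task: given f: [n] -> [n]
# and a query x, the model must output f(x). The two encodings below are
# hypothetical examples of "paired" vs. "split" input representations;
# they are not the paper's exact constructions.

import random

def sample_task(n, seed=None):
    """Sample a random function f: [n] -> [n] and a query argument x."""
    rng = random.Random(seed)
    f = [rng.randrange(n) for _ in range(n)]  # f[i] encodes f(i)
    x = rng.randrange(n)
    return f, x, f[x]  # the target output is f(x)

def paired_encoding(f, x):
    """Each position carries the pair (i, f(i)); the query x is appended."""
    return [(i, fi) for i, fi in enumerate(f)] + [("query", x)]

def split_encoding(f, x):
    """Arguments and values occupy separate positions (the harder variant)."""
    return list(range(len(f))) + list(f) + [x]

if __name__ == "__main__":
    f, x, y = sample_task(n=8, seed=0)
    print("paired:", paired_encoding(f, x), "-> target", y)
    print("split: ", split_encoding(f, x), "-> target", y)
```

In this sketch, a model would receive one of the encodings as its input sequence and be judged on whether it outputs the target $f(x)$; the paper's results concern which of these representation styles a concise 1-layer transformer can handle.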
Related papers
- On the Role of Depth and Looping for In-Context Learning with Task Diversity [69.4145579827826]
We study in-context learning for linear regression with diverse tasks.
We show that multilayer Transformers are not robust to even distributional shifts as small as $O(e^{-L})$ in Wasserstein distance.
arXiv Detail & Related papers (2024-10-29T03:27:56Z) - Dissecting Multiplication in Transformers: Insights into LLMs [23.109124772063574]
We focus on a typical arithmetic task, integer multiplication, to explore and explain the imperfection of transformers in this domain.
We provide comprehensive analysis of a vanilla transformer trained to perform n-digit integer multiplication.
We propose improvements to enhance transformers' performance on multiplication tasks.
arXiv Detail & Related papers (2024-07-22T04:07:26Z) - When Can Transformers Count to n? [48.32323039293186]
We show that if the dimension of the transformer state is linear in the context length, this task can be solved.
We provide theoretical arguments for why it is likely impossible for a size-limited transformer to implement this task.
Our results demonstrate the importance of understanding how transformers can solve simple tasks.
arXiv Detail & Related papers (2024-07-21T13:31:02Z) - Transformers are Expressive, But Are They Expressive Enough for Regression? [38.369337945109855]
We show that Transformers struggle to reliably approximate smooth functions, relying on piecewise constant approximations with sizable intervals.
By shedding light on these challenges, we advocate a refined understanding of Transformers' capabilities.
arXiv Detail & Related papers (2024-02-23T18:12:53Z) - AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures [80.28359222380733]
We design a novel transformer framework, dubbed AlgoFormer, to empower transformers with algorithmic capabilities.
In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing.
Some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning.
arXiv Detail & Related papers (2024-02-21T07:07:54Z) - Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
Chain of thought (CoT) is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks.
This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
arXiv Detail & Related papers (2024-02-20T10:11:03Z) - Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input [50.83356836818667]
We study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs.
Our theoretical results support the practical success of Transformers for high dimensional data.
arXiv Detail & Related papers (2023-05-30T02:44:49Z) - On the Power of Saturated Transformers: A View from Circuit Complexity [87.20342701232869]
We show that saturated transformers transcend the limitations of hard-attention transformers.
The jump from hard to saturated attention can be understood as increasing the transformer's effective circuit depth by a factor of $O(\log n)$.
arXiv Detail & Related papers (2021-06-30T17:09:47Z) - On the Computational Power of Transformers and its Implications in Sequence Modeling [10.497742214344855]
In particular, the roles of various components in Transformers such as positional encodings, attention heads, residual connections, and feedforward networks, are not clear.
We provide an alternate and simpler proof to show that vanilla Transformers are Turing-complete.
We further analyze the necessity of each component for the Turing-completeness of the network; interestingly, we find that a particular type of residual connection is necessary.
arXiv Detail & Related papers (2020-06-16T16:27:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.