Transformers are Inherently Succinct
- URL: http://arxiv.org/abs/2510.19315v2
- Date: Thu, 23 Oct 2025 08:09:19 GMT
- Title: Transformers are Inherently Succinct
- Authors: Pascal Bergsträßer, Ryan Cotterell, Anthony W. Lin,
- Abstract summary: We prove that transformers are highly expressive in that they can represent formal languages substantially more succinctly than standard representations of formal languages. As a by-product of this expressivity, we show that verifying properties of transformers is provably intractable.
- Score: 46.836122954309566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose succinctness as a measure of the expressive power of a transformer in describing a concept. To this end, we prove that transformers are highly expressive in that they can represent formal languages substantially more succinctly than standard representations of formal languages like finite automata and Linear Temporal Logic (LTL) formulas. As a by-product of this expressivity, we show that verifying properties of transformers is provably intractable (i.e. EXPSPACE-complete).
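To make the kind of succinctness gap at stake concrete, here is a small illustrative sketch of our own (not a construction from the paper): the textbook family of languages L_k = { w in {0,1}* : the k-th symbol from the end of w is 1 } has a one-line description, yet its minimal DFA needs 2^k states, because the automaton must remember the last k symbols it has read. The function names below are chosen for illustration only.

```python
# Illustrative sketch only (not the paper's construction): the textbook
# succinctness gap between a short description of a language and its
# minimal deterministic finite automaton (DFA).
#
# L_k = { w in {0,1}* : the k-th symbol from the end of w is 1 }
# can be stated in one line, but any DFA for it must remember the last
# k symbols read, so the minimal DFA has 2**k states.

def in_language(w: str, k: int) -> bool:
    """Direct check of the property a DFA for L_k would have to decide."""
    return len(w) >= k and w[-k] == "1"

def minimal_dfa_size(k: int) -> int:
    """States of the minimal DFA for L_k: one per length-k window of symbols."""
    return 2 ** k

if __name__ == "__main__":
    print(in_language("10110", 3))  # True: the 3rd symbol from the end is '1'
    for k in (2, 4, 8, 16):
        print(f"k={k:2d}: one-line description, minimal DFA has {minimal_dfa_size(k)} states")
```

The paper measures transformers against automata and LTL along exactly this axis, showing that transformers can sit on the compact side of such gaps.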
Related papers
- Probability Distributions Computed by Hard-Attention Transformers [53.17368795629463]
We show that making transformer language recognizers autoregressive can sometimes increase their expressivity. Our overall contribution is to tease apart what functions transformers are capable of expressing, in their most common use-case as language models.
arXiv Detail & Related papers (2025-10-31T02:41:05Z) - Characterizing the Expressivity of Transformer Language Models [56.598551673153366]
We provide an exact characterization of fixed-precision transformers with strict future masking and soft attention. We show that these models are precisely as expressive as a specific fragment of linear temporal logic. We further relate this logic to established classes in formal language theory, automata theory, and algebra.
arXiv Detail & Related papers (2025-05-29T16:30:30Z) - The Counting Power of Transformers [45.96383652484399]
We provide a formal framework for investigating the counting power of transformers. Our main result is that transformers can express counting properties that are highly nonlinear. More precisely, we prove that transformers can capture all semialgebraic counting properties.
arXiv Detail & Related papers (2025-05-16T12:56:59Z) - Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages [7.938342455750221]
We study exact characterizations of transformers with hard attention and attention masking.
With strict masking (each position cannot attend to itself) and without position embeddings, these transformers are expressively equivalent to linear temporal logic (a small illustrative evaluator for LTL over finite words is sketched after this list).
arXiv Detail & Related papers (2023-10-21T03:26:39Z) - Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input [50.83356836818667]
We study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs.
Our theoretical results support the practical success of Transformers for high dimensional data.
arXiv Detail & Related papers (2023-05-30T02:44:49Z) - A Logic for Expressing Log-Precision Transformers [33.4994300801846]
We show that any log-precision transformer can be equivalently expressed as a first-order logic sentence. This is the tightest known upper bound and first logical characterization of log-precision transformers.
arXiv Detail & Related papers (2022-10-06T04:18:09Z) - Scalable Transformers for Neural Machine Translation [86.4530299266897]
The Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallelizable training for sequence generation.
We propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales that share parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the Scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z)
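Several of the related papers above characterize transformer variants against linear temporal logic (LTL) over finite words, e.g. the star-free languages entry. As a minimal, purely illustrative sketch (our own code, not from any of the papers), here is an evaluator for such formulas; the tuple encoding and operator names are assumptions made for this example.

```python
# Minimal illustrative sketch (our code, not from any of the papers above):
# evaluating LTL-style formulas over finite words, the yardstick several of
# the expressivity results above measure transformers against.
# Formulas are nested tuples; a single-character string is an atomic
# proposition meaning "the current symbol is that character".

from typing import Tuple, Union

Formula = Union[str, Tuple]  # "a" | "true" | ("not", f) | ("and", f, g) | ("next", f) | ("until", f, g)

def holds(w: str, i: int, f: Formula) -> bool:
    """Does formula f hold of the finite word w at position i?"""
    if f == "true":
        return True
    if isinstance(f, str):                       # atomic proposition
        return w[i] == f
    op = f[0]
    if op == "not":
        return not holds(w, i, f[1])
    if op == "and":
        return holds(w, i, f[1]) and holds(w, i, f[2])
    if op == "next":                             # strict next; false at the last position
        return i + 1 < len(w) and holds(w, i + 1, f[1])
    if op == "until":                            # f[1] holds until f[2] does, within the word
        return any(holds(w, j, f[2]) and all(holds(w, k, f[1]) for k in range(i, j))
                   for j in range(i, len(w)))
    raise ValueError(f"unknown operator {op!r}")

def F(f: Formula) -> Formula: return ("until", "true", f)    # "eventually f"
def G(f: Formula) -> Formula: return ("not", F(("not", f)))  # "always f"

def satisfies(w: str, f: Formula) -> bool:
    """A nonempty word satisfies f if f holds at its first position."""
    return len(w) > 0 and holds(w, 0, f)

if __name__ == "__main__":
    # "Every a is eventually followed by a b", i.e. G(a -> F b),
    # with the implication written using not/and.
    every_a_then_b = G(("not", ("and", "a", ("not", F("b")))))
    print(satisfies("aabab", every_a_then_b))  # True
    print(satisfies("aaba", every_a_then_b))   # False: the last 'a' sees no later 'b'
```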
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.