Constant Bit-size Transformers Are Turing Complete
- URL: http://arxiv.org/abs/2506.12027v1
- Date: Thu, 22 May 2025 02:45:38 GMT
- Title: Constant Bit-size Transformers Are Turing Complete
- Authors: Qian Li, Yuyi Wang
- Abstract summary: We prove that any Turing machine running on inputs of arbitrary length can be simulated by a constant bit-size transformer. Our approach relies on simulating Post machines, a Turing-complete computational model.
- Score: 8.38684825915246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove that any Turing machine running on inputs of arbitrary length can be simulated by a constant bit-size transformer, as long as the context window is sufficiently long. This improves on previous works, which require scaling up either the model's precision or the number of parameters as inputs grow longer. Furthermore, we prove that the complexity class SPACE$[s(n)]$ exactly characterizes the expressive power of a constant bit-size transformer with a context window of length $s(n)$. Our approach relies on simulating Post machines, a Turing-complete computational model. Post machines can be modeled as automata equipped with a queue, exhibiting computational behaviors naturally aligned with those of transformers. The behavioral similarity between transformers and Post machines may offer new insights into the mechanisms underlying the reasoning abilities of transformers.
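To make the Post-machine picture concrete, here is a minimal Python sketch of a queue automaton: a finite control repeatedly dequeues the symbol at the head of a FIFO queue and, depending on its state, enqueues new symbols at the tail and changes state. The transition-table format and the toy language are illustrative assumptions, not the paper's construction.

```python
from collections import deque

# A minimal sketch of a Post machine (queue automaton), assuming a transition table
# of the form (state, dequeued_symbol) -> (new_state, symbols_to_enqueue).
# The machine below is a toy example, not the construction from the paper.

def run_post_machine(transitions, start_state, accept_states, tape, max_steps=10_000):
    state, queue = start_state, deque(tape)
    for _ in range(max_steps):
        if state in accept_states:
            return True
        if not queue:
            return False
        symbol = queue.popleft()                   # read the symbol at the head of the queue
        if (state, symbol) not in transitions:
            return False                           # no applicable rule: reject
        state, emitted = transitions[(state, symbol)]
        queue.extend(emitted)                      # append new symbols at the tail
    raise RuntimeError("step limit exceeded")

# Toy example: accept strings over {a, b} containing no 'b'; '#' marks the end of the input.
T = {
    ("scan", "a"): ("scan", ""),    # consume an 'a'
    ("scan", "b"): ("reject", ""),  # any 'b' leads to rejection
    ("scan", "#"): ("accept", ""),  # reached the end marker without seeing a 'b'
}
print(run_post_machine(T, "scan", {"accept"}, list("aaa") + ["#"]))  # True
print(run_post_machine(T, "scan", {"accept"}, list("aba") + ["#"]))  # False
```

The model's Turing completeness comes from rules that re-enqueue symbols, cycling the queue like a tape; this append-only, left-to-right behavior is the kind of similarity to transformer decoding the abstract alludes to.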
Related papers
- On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning [87.73401758641089]
Chain-of-thought (CoT) reasoning has improved the performance of modern language models (LMs). We show that LMs can represent the same family of distributions over strings as probabilistic Turing machines.
arXiv Detail & Related papers (2024-06-20T10:59:02Z)
- Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers [8.908747084128397]
We introduce the temporal counting logic $\textsf{K}_\textsf{t}[\#]$ alongside the RASP variant $\textsf{C-RASP}$. We show they are equivalent to each other, and that together they provide the best-known lower bound on the formal expressivity of future-masked soft attention transformers with unbounded input size.
arXiv Detail & Related papers (2024-04-05T20:36:30Z)
- Simulating Weighted Automata over Sequences and Trees with Transformers [5.078561931628571]
We show that transformers can simulate weighted finite automata (WFAs), a class of models that subsumes DFAs, as well as weighted tree automata (WTAs).
We prove these claims formally and provide upper bounds on the sizes of the transformer models needed as a function of the number of states of the target automata.
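For context, a WFA with initial vector $\alpha$, per-symbol transition matrices $A_\sigma$, and final vector $\beta$ assigns a string $w_1 \cdots w_n$ the weight $\alpha^\top A_{w_1} \cdots A_{w_n} \beta$. The sketch below evaluates this product directly; the two-state counting WFA is an illustrative example, not one from the paper.

```python
import numpy as np

# A minimal sketch of evaluating a weighted finite automaton (WFA): the weight of a
# string is alpha^T . A_{w1} . ... . A_{wn} . beta. The concrete WFA below (counting
# occurrences of 'a') is an illustrative example, not one from the paper.

def wfa_weight(alpha, transition, beta, word):
    state = alpha
    for symbol in word:
        state = state @ transition[symbol]   # one transition matrix per input symbol
    return float(state @ beta)

alpha = np.array([1.0, 0.0])                 # start in the first state with weight 1
beta = np.array([0.0, 1.0])                  # read the accumulated count from the second state
A = {
    "a": np.array([[1.0, 1.0], [0.0, 1.0]]), # reading 'a' adds 1 to the counter
    "b": np.array([[1.0, 0.0], [0.0, 1.0]]), # reading 'b' leaves it unchanged
}
print(wfa_weight(alpha, A, beta, "abaab"))   # 3.0
```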
arXiv Detail & Related papers (2024-03-12T21:54:34Z)
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
Chain of thought (CoT) is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks.
This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
arXiv Detail & Related papers (2024-02-20T10:11:03Z)
- The Expressive Power of Transformers with Chain of Thought [29.839710738657203]
In practice, transformers can be improved by allowing them to use a "chain of thought" or "scratchpad" of intermediate tokens, which raises the question of whether this extra generation also extends their computational power.
We show that the answer is yes, but the amount of increase depends crucially on the amount of intermediate generation.
Our results also imply that a linear number of intermediate steps keeps transformer decoders within the context-sensitive languages.
arXiv Detail & Related papers (2023-10-11T22:35:18Z)
- How Powerful are Decoder-Only Transformer Neural Models? [0.0]
This is the first work to address the Turing completeness of the underlying technology employed in GPT-x.
We show that the sparsity/compressibility of the word embedding is an important consideration for Turing completeness to hold.
arXiv Detail & Related papers (2023-05-26T15:35:43Z)
- Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$.
We further investigate the brittleness of these solutions and propose potential mitigations.
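The key observation behind such shortcuts is that each input symbol induces a function on automaton states, and function composition is associative, so all prefix states can be obtained in $O(\log T)$ rounds of a parallel prefix scan rather than $T$ sequential steps. Below is a minimal sketch of that prefix-composition idea, using a toy parity automaton as an illustrative assumption.

```python
# A minimal sketch of the "log-depth shortcut" idea: every symbol induces a function
# on states, and function composition is associative, so all prefixes can be composed
# with a Hillis-Steele style parallel prefix scan in O(log T) rounds. The parity
# automaton below is a toy example, not one from the paper.

def compose(f, g):
    """Apply f first, then g; functions are tuples mapping state i -> tuple[i]."""
    return tuple(g[f[q]] for q in range(len(f)))

def prefix_states(word, delta, start):
    # delta[symbol] is the state-transition function induced by that symbol.
    maps = [delta[c] for c in word]
    shift = 1
    while shift < len(maps):                      # O(log T) rounds
        maps = [maps[i] if i < shift else compose(maps[i - shift], maps[i])
                for i in range(len(maps))]
        shift *= 2
    return [m[start] for m in maps]               # automaton state after each prefix

delta = {"a": (0, 1), "b": (1, 0)}                # 'b' flips the parity bit, 'a' keeps it
print(prefix_states("abbab", delta, start=0))     # [0, 1, 0, 0, 1]
```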
arXiv Detail & Related papers (2022-10-19T17:45:48Z)
- The Parallelism Tradeoff: Limitations of Log-Precision Transformers [29.716269397142973]
We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens can be simulated by constant-depth logspace-uniform threshold circuits.
This provides insight on the power of transformers using known results in complexity theory.
arXiv Detail & Related papers (2022-07-02T03:49:34Z)
- Linearizing Transformer with Key-Value Memory Bank [54.83663647680612]
We propose MemSizer, an approach that projects the source sequence into a lower-dimensional representation.
MemSizer not only achieves the same linear time complexity but also enjoys efficient recurrent-style autoregressive generation.
We demonstrate that MemSizer provides an improved tradeoff between efficiency and accuracy over the vanilla transformer.
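The recurrent-style generation comes from keeping a fixed-size key-value memory instead of a growing attention cache, so each decoding step costs the same amount of work. The sketch below shows that generic linearized-attention pattern; MemSizer's actual projections differ, and the positive feature map here is only a placeholder assumption.

```python
import numpy as np

# A generic sketch of linearized attention with a fixed-size key-value memory bank,
# illustrating how recurrent-style autoregressive generation achieves linear time in
# sequence length. This is the shared underlying idea, not MemSizer's exact design.

def feature(x):
    return np.maximum(x, 0.0) + 1e-6            # positive feature map (an assumption)

def linear_attention_stream(queries, keys, values):
    d_k, d_v = keys.shape[1], values.shape[1]
    memory = np.zeros((d_k, d_v))                # constant-size key-value memory bank
    normalizer = np.zeros(d_k)
    outputs = []
    for q, k, v in zip(queries, keys, values):   # one O(d_k * d_v) update per token
        phi_k = feature(k)
        memory += np.outer(phi_k, v)             # fold the new key-value pair into memory
        normalizer += phi_k
        phi_q = feature(q)
        outputs.append(phi_q @ memory / (phi_q @ normalizer))
    return np.stack(outputs)

rng = np.random.default_rng(0)
T, d = 6, 4
out = linear_attention_stream(rng.normal(size=(T, d)),
                              rng.normal(size=(T, d)),
                              rng.normal(size=(T, d)))
print(out.shape)                                  # (6, 4)
```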
arXiv Detail & Related papers (2022-03-23T18:10:18Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck languages.
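As a flavor of the RASP primitives, here is a plain-Python reading of the select / selector-width pattern behind the histogram task (for each position, count how many tokens in the sequence equal the token at that position). It is a sketch of the primitives' semantics, not the authors' RASP code.

```python
# A tiny sketch of RASP-style select and selector_width, applied to the histogram task.
# select builds a boolean attention pattern; selector_width counts, per query position,
# how many key positions were selected.

def select(keys, queries, predicate):
    # Row q marks which key positions the query at position q attends to.
    return [[predicate(k, q) for k in keys] for q in queries]

def selector_width(selector):
    # Number of selected positions per query position.
    return [sum(row) for row in selector]

tokens = list("hello")
same_token = select(tokens, tokens, lambda k, q: k == q)
print(selector_width(same_token))  # [1, 1, 2, 2, 1]
```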
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
- Addressing Some Limitations of Transformers with Feedback Memory [51.94640029417114]
Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks.
We propose the Feedback Transformer architecture that exposes all previous representations to all future representations.
We demonstrate on a variety of benchmarks in language modeling, machine translation, and reinforcement learning that the increased representation capacity can create small, shallow models with much stronger performance than comparable Transformers.
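A minimal sketch of the feedback-memory idea, under toy assumptions for the attention and layer mixing: each step's layer outputs are collapsed into a single memory vector, and every layer at later steps attends to those memories rather than only to same-layer past states.

```python
import numpy as np

# A toy sketch of feedback memory: all layers of step t are merged into one memory
# vector, and every layer at subsequent steps attends to that shared memory bank.
# The attention and layer updates are placeholders, not the paper's architecture.

def attend(query, memories):
    if not memories:
        return np.zeros_like(query)
    M = np.stack(memories)                       # (t, d) memory bank of past steps
    weights = np.exp(M @ query)
    weights /= weights.sum()
    return weights @ M

def feedback_step(x_t, memories, n_layers=3):
    h, layer_outputs = x_t, []
    for _ in range(n_layers):
        h = np.tanh(h + attend(h, memories))     # each layer sees the shared memory
        layer_outputs.append(h)
    memory_t = np.mean(layer_outputs, axis=0)    # collapse all layers into one memory vector
    return h, memory_t

rng = np.random.default_rng(0)
memories, outs = [], []
for x in rng.normal(size=(5, 8)):                # process 5 steps of dimension 8
    y, m = feedback_step(x, memories)
    outs.append(y)
    memories.append(m)
print(np.stack(outs).shape)                      # (5, 8)
```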
arXiv Detail & Related papers (2020-02-21T16:37:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.