Universal computation is intrinsic to language model decoding
- URL: http://arxiv.org/abs/2601.08061v1
- Date: Mon, 12 Jan 2026 23:07:50 GMT
- Title: Universal computation is intrinsic to language model decoding
- Authors: Alex Lewandowski, Marlos C. Machado, Dale Schuurmans,
- Abstract summary: We prove that chaining a language model's autoregressive output is sufficient to perform universal computation.<n>Strikingly, we demonstrate that even randomly language models are capable of universal computation before training.
- Score: 45.48946082504832
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models now provide an interface to express and often solve general problems in natural language, yet their ultimate computational capabilities remain a major topic of scientific debate. Unlike a formal computer, a language model is trained to autoregressively predict successive elements in human-generated text. We prove that chaining a language model's autoregressive output is sufficient to perform universal computation. That is, a language model can simulate the execution of any algorithm on any input. The challenge of eliciting desired computational behaviour can thus be reframed in terms of programmability: the ease of finding a suitable prompt. Strikingly, we demonstrate that even randomly initialized language models are capable of universal computation before training. This implies that training does not give rise to computational expressiveness -- rather, it improves programmability, enabling a natural language interface for accessing these intrinsic capabilities.
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
We train and evaluate neural networks directly as binary classifiers of strings.<n>We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures.<n>Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - Autoregressive Large Language Models are Computationally Universal [59.34397993748194]
We show that autoregressive decoding of a transformer-based language model can realize universal computation.
We first show that a universal Turing machine can be simulated by a Lag system with 2027 production rules.
We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
arXiv Detail & Related papers (2024-10-04T06:05:17Z) - Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback [14.938401898546553]
We propose to use a semi-structured form to represent reasoning steps of large language models.
Specifically, we use relations, which are not only human but also machine-friendly and easier to verify than natural language.
arXiv Detail & Related papers (2024-06-25T18:21:00Z) - Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z) - Arithmetic with Language Models: from Memorization to Computation [3.077668143048211]
This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data.
We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing.
arXiv Detail & Related papers (2023-08-02T13:58:37Z) - Memory Augmented Large Language Models are Computationally Universal [44.64529266193095]
We show that transformer-based large language models are computationally universal when augmented with an external memory.
We establish that an existing large language model, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine.
arXiv Detail & Related papers (2023-01-10T02:37:44Z) - Overcoming Barriers to Skill Injection in Language Modeling: Case Study
in Arithmetic [14.618731441943847]
We develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess.
Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs while injecting non-linguistic skills into language models.
arXiv Detail & Related papers (2022-11-03T18:53:30Z) - Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z) - Language Models are not Models of Language [0.0]
Transfer learning has enabled large deep learning neural networks trained on the language modeling task to vastly improve performance.
We argue that the term language model is misleading because deep learning models are not theoretical models of language.
arXiv Detail & Related papers (2021-12-13T22:39:46Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.