Autoregressive Large Language Models are Computationally Universal
- URL: http://arxiv.org/abs/2410.03170v1
- Date: Fri, 4 Oct 2024 06:05:17 GMT
- Title: Autoregressive Large Language Models are Computationally Universal
- Authors: Dale Schuurmans, Hanjun Dai, Francesco Zanini,
- Abstract summary: We show that autoregressive decoding of a transformer-based language model can realize universal computation.
We first show that a universal Turing machine can be simulated by a Lag system with 2027 production rules.
We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
- Score: 59.34397993748194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that autoregressive decoding of a transformer-based language model can realize universal computation, without external intervention or modification of the model's weights. Establishing this result requires understanding how a language model can process arbitrarily long inputs using a bounded context. For this purpose, we consider a generalization of autoregressive decoding where, given a long input, emitted tokens are appended to the end of the sequence as the context window advances. We first show that the resulting system corresponds to a classical model of computation, a Lag system, that has long been known to be computationally universal. By leveraging a new proof, we show that a universal Turing machine can be simulated by a Lag system with 2027 production rules. We then investigate whether an existing large language model can simulate the behaviour of such a universal Lag system. We give an affirmative answer by showing that a single system-prompt can be developed for gemini-1.5-pro-001 that drives the model, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules. We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - Compositional Program Generation for Few-Shot Systematic Generalization [59.57656559816271]
This study on a neuro-symbolic architecture called the Compositional Program Generator (CPG)
CPG has three key features: textitmodularity, textitcomposition, and textitabstraction, in the form of grammar rules.
It perfect achieves generalization on both the SCAN and COGS benchmarks using just 14 examples for SCAN and 22 examples for COGS.
arXiv Detail & Related papers (2023-09-28T14:33:20Z) - Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z) - Sentence-Incremental Neural Coreference Resolution [32.13574453443377]
We propose a sentence-incremental neural coreference resolution system which incrementally builds clusters after marking mention boundaries in a shift-reduce method.
The system is aimed at bridging two recent approaches at coreference resolution: (1) state-of-the-art non-incremental models that incur quadratic complexity in document length with high computational cost, and (2) memory network-based models which operate incrementally but do not generalize beyond pronouns.
arXiv Detail & Related papers (2023-05-26T14:00:25Z) - Memory Augmented Large Language Models are Computationally Universal [44.64529266193095]
We show that transformer-based large language models are computationally universal when augmented with an external memory.
We establish that an existing large language model, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine.
arXiv Detail & Related papers (2023-01-10T02:37:44Z) - Recursive Decoding: A Situated Cognition Approach to Compositional
Generation in Grounded Language Understanding [0.0]
We present Recursive Decoding, a novel procedure for training and using seq2seq models.
Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time.
RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
arXiv Detail & Related papers (2022-01-27T19:13:42Z) - CGEMs: A Metric Model for Automatic Code Generation using GPT-3 [0.0]
This work aims to validate AI-generated content using theoretical proofs or by using Monte-Carlo simulation methods.
In this case, we use the latter approach to test/validate a statistically significant number of samples.
The various metrics that are garnered in this work to support the evaluation of generated code are as follows: Compilation, NL description to logic conversion, number of edits needed, some of the commonly used static-code metrics and NLP metrics.
arXiv Detail & Related papers (2021-08-23T13:28:57Z) - Imputer: Sequence Modelling via Imputation and Dynamic Programming [101.5705527605346]
Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the number of input or output tokens.
We present a tractable dynamic programming training algorithm, which yields a lower bound on the log marginal likelihood.
arXiv Detail & Related papers (2020-02-20T18:21:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.