Memory Augmented Large Language Models are Computationally Universal
- URL: http://arxiv.org/abs/2301.04589v1
- Date: Tue, 10 Jan 2023 02:37:44 GMT
- Title: Memory Augmented Large Language Models are Computationally Universal
- Authors: Dale Schuurmans
- Abstract summary: We show that transformer-based large language models are computationally universal when augmented with an external memory.
We establish that an existing large language model, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine.
- Score: 44.64529266193095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that transformer-based large language models are computationally
universal when augmented with an external memory. Any deterministic language
model that conditions on strings of bounded length is equivalent to a finite
automaton, hence computationally limited. However, augmenting such models with
a read-write memory creates the possibility of processing arbitrarily large
inputs and, potentially, simulating any algorithm. We establish that an
existing large language model, Flan-U-PaLM 540B, can be combined with an
associative read-write memory to exactly simulate the execution of a universal
Turing machine, $U_{15,2}$. A key aspect of the finding is that it does not
require any modification of the language model weights. Instead, the
construction relies solely on designing a form of stored instruction computer
that can subsequently be programmed with a specific set of prompts.
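For intuition, here is a minimal sketch of the kind of outer loop the abstract describes: a frozen language model is queried as the transition function of a Turing machine, while an external associative read-write memory (here a Python dict) plays the role of the tape. The llm() stub, the prompt format, and the "write,move,state" reply format are illustrative assumptions only; the paper's construction instead programs Flan-U-PaLM 540B with a specific set of prompts to simulate the universal machine $U_{15,2}$.

```python
# Sketch of a memory-augmented LLM acting as a stored instruction computer.
# The llm() stub below stands in for the frozen model: it looks up a toy
# two-state, two-symbol machine, whereas in the paper the prompted
# Flan-U-PaLM 540B plays this role for U_{15,2}.
from collections import defaultdict

def llm(prompt: str) -> str:
    """Stand-in for the frozen language model (illustrative transition table)."""
    table = {
        "state=A symbol=0": "1,R,B", "state=A symbol=1": "1,L,B",
        "state=B symbol=0": "1,L,A", "state=B symbol=1": "1,R,halt",
    }
    return table[prompt]

def run(max_steps: int = 100):
    tape = defaultdict(lambda: "0")                    # associative memory: address -> symbol
    head, state = 0, "A"
    for _ in range(max_steps):
        prompt = f"state={state} symbol={tape[head]}"  # memory read feeds the prompt
        write, move, state = llm(prompt).split(",")    # model output parsed into an action
        tape[head] = write                             # memory write
        head += 1 if move == "R" else -1               # head move = change of address
        if state == "halt":
            break
    return dict(tape)

print(run())   # the toy machine halts leaving four 1s on the tape
```

Note that the language model weights are never touched in this loop; only the prompts and the external memory change, which is the sense in which the construction is a stored instruction computer.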
Related papers
- Autoregressive Large Language Models are Computationally Universal [59.34397993748194]
We show that autoregressive decoding of a transformer-based language model can realize universal computation.
We first show that a universal Turing machine can be simulated by a Lag system with 2027 production rules (a toy Lag-system step is sketched after this list).
We conclude that, by the Church-Turing thesis, prompted gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
arXiv Detail & Related papers (2024-10-04T06:05:17Z)
- A Transformer with Stack Attention [84.18399019794036]
We propose augmenting transformer-based language models with a differentiable, stack-based attention mechanism.
Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model.
We show that the addition of our stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.
arXiv Detail & Related papers (2024-05-07T17:47:57Z)
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models [103.59785165735727]
We introduce RecurrentGemma, a family of open language models using Google's novel Griffin architecture.
Griffin combines linear recurrences with local attention to achieve excellent performance on language tasks.
We provide two model sizes, with 2B and 9B parameters, each available as pre-trained and instruction-tuned variants.
arXiv Detail & Related papers (2024-04-11T15:27:22Z)
- On Languaging a Simulation Engine [6.17566001699186]
Lang2Sim is a language-to-simulation framework that enables interactive navigation of a simulation engine through language.
This work establishes the language model as an intelligent platform to unlock the era of languaging a simulation engine.
arXiv Detail & Related papers (2024-02-26T11:01:54Z)
- Training Language Models with Memory Augmentation [28.4608705738799]
We present a novel approach for training language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
arXiv Detail & Related papers (2022-05-25T11:37:29Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Adaptive Semiparametric Language Models [17.53604394786977]
We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component.
Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method.
arXiv Detail & Related papers (2021-02-04T11:47:03Z)
- Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle [88.65264818967489]
We propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).
The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model.
Experiments show that SOM can achieve strong results in language modeling, incremental parsing and syntactic generalization tests.
arXiv Detail & Related papers (2020-10-21T17:39:15Z)
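As referenced from the first related paper above ("Autoregressive Large Language Models are Computationally Universal"), which reduces universality to a Lag system with 2027 production rules, the sketch below illustrates a single Lag-system step under the common definition in which each production keys on the two leftmost symbols, exactly one symbol is deleted from the front per step, and the production's word is appended on the right. The toy rule set is hypothetical and is not the 2027-rule system from that paper.

```python
# Minimal sketch of Lag-system rewriting (toy rules, for illustration only).
from collections import deque

def lag_step(word, rules):
    """Apply one Lag-system step in place; return None when no rule matches (halt)."""
    if len(word) < 2:
        return None
    key = (word[0], word[1])      # production chosen by the two leftmost symbols
    if key not in rules:
        return None
    word.popleft()                # delete exactly one symbol from the front
    word.extend(rules[key])       # append the production's word on the right
    return word

rules = {("a", "b"): "ba", ("b", "a"): "b", ("b", "b"): ""}
word = deque("ab")
for _ in range(10):
    if lag_step(word, rules) is None:
        break
    print("".join(word))
```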
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.