Looped Transformers as Programmable Computers
- URL: http://arxiv.org/abs/2301.13196v1
- Date: Mon, 30 Jan 2023 18:57:31 GMT
- Title: Looped Transformers as Programmable Computers
- Authors: Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos
- Abstract summary: We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop.
Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes.
We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation.
- Score: 48.00010456819222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a framework for using transformer networks as universal computers
by programming them with specific weights and placing them in a loop. Our input
sequence acts as a punchcard, consisting of instructions and memory for data
read/writes. We demonstrate that a constant number of encoder layers can
emulate basic computing blocks, including embedding edit operations, non-linear
functions, function calls, program counters, and conditional branches. Using
these building blocks, we emulate a small instruction-set computer. This allows
us to map iterative algorithms to programs that can be executed by a looped,
13-layer transformer. We show how this transformer, instructed by its input,
can emulate a basic calculator, a basic linear algebra library, and in-context
learning algorithms that employ backpropagation. Our work highlights the
versatility of the attention mechanism, and demonstrates that even shallow
transformers can execute full-fledged, general-purpose programs.
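To make the framing concrete, the sketch below is a toy, plain-Python stand-in for the loop the paper describes: a fixed step function is applied repeatedly to a single punchcard-like array holding both instructions and data, with a program counter, a conditional branch, and memory read/writes. The SUBLEQ-style one-instruction machine here is purely illustrative; the paper's actual construction realizes such a step with a fixed-weight, 13-layer transformer, which is not reproduced here.

```python
# Toy stand-in (not the paper's transformer construction): a fixed "step" applied
# in a loop to a punchcard-like array holding both instructions and data.
# Instruction format (SUBLEQ-style): three cells a, b, t meaning
#   mem[b] -= mem[a]; if mem[b] <= 0: jump to t, else fall through.

def step(mem, pc):
    """One application of the fixed block: a memory read/write plus a conditional branch."""
    a, b, t = mem[pc], mem[pc + 1], mem[pc + 2]
    mem[b] -= mem[a]
    return mem, (t if mem[b] <= 0 else pc + 3)

def run(mem, pc=0, max_iters=1000):
    """The outer loop: keep applying the same fixed block until the program halts."""
    for _ in range(max_iters):
        if pc < 0:          # halting convention: jump to a negative address
            break
        mem, pc = step(mem, pc)
    return mem

# "Punchcard": cells 0-5 are two instructions, cells 6-7 are data.
punchcard = [7, 7, 3,    # mem[7] -= mem[7]  -> 0, jump to 3 (next instruction)
             6, 6, -1,   # mem[6] -= mem[6]  -> 0, jump to -1 (halt)
             0, 5]       # data: mem[6] = 0, mem[7] = 5
print(run(list(punchcard)))   # mem[7] has been cleared to 0
```

In the paper, the analogue of `step` is a transformer with fixed weights whose layers implement the read/write and branching behavior, and the loop corresponds to feeding the transformer's output sequence back in as its next input.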
Related papers
- Graph Transformers Dream of Electric Flow [72.06286909236827]
We show that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical graph problems such as electric flow.
We present explicit weight configurations for implementing each such graph algorithm, and we bound the errors of the constructed Transformers by the errors of the underlying algorithms.
arXiv Detail & Related papers (2024-10-22T05:11:45Z)
- Transformers are Efficient Compilers, Provably [11.459397066286822]
Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks.
In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective.
We introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages.
arXiv Detail & Related papers (2024-10-07T20:31:13Z)
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
- Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z)
- Planning with Large Language Models for Code Generation [100.07232672883897]
Planning-Guided Transformer Decoding (PG-TD) uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs.
We empirically evaluate our framework with several large language models as backbones on public coding challenge benchmarks.
arXiv Detail & Related papers (2023-03-09T18:59:47Z)
- Tracr: Compiled Transformers as a Laboratory for Interpretability [15.76027393879609]
We show how to "compile" human-readable programs into decoder-only transformer models.
Our compiler, Tracr, generates models with known structure.
We use it to study "superposition" in transformers that execute multi-step algorithms.
arXiv Detail & Related papers (2023-01-12T14:59:19Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language, the Restricted Access Sequence Processing Language (RASP).
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages (a histogram-style sketch in plain Python follows this list).
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
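As a small illustration of the last entry, here is a plain-Python sketch of the RASP-style select/aggregate idea applied to the histogram task. The function names only loosely mirror RASP primitives; this is not the RASP language or a compiled transformer.

```python
# Loose Python sketch of RASP-style primitives (not actual RASP syntax):
# "select" builds a boolean attention pattern, "selector_width" counts matches.

def select(keys, queries, predicate):
    """selector[q][k] is True when predicate(keys[k], queries[q]) holds."""
    return [[predicate(k, q) for k in keys] for q in queries]

def selector_width(selector):
    """For each query position, the number of selected key positions."""
    return [sum(row) for row in selector]

tokens = list("hello")
same_token = select(tokens, tokens, lambda k, q: k == q)   # attend to equal tokens
histogram = selector_width(same_token)                     # occurrence count per position
print(list(zip(tokens, histogram)))  # [('h', 1), ('e', 1), ('l', 2), ('l', 2), ('o', 1)]
```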