Looped Transformers as Programmable Computers
- URL: http://arxiv.org/abs/2301.13196v1
- Date: Mon, 30 Jan 2023 18:57:31 GMT
- Title: Looped Transformers as Programmable Computers
- Authors: Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason
D. Lee, Dimitris Papailiopoulos
- Abstract summary: We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop.
Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes.
We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation.
- Score: 48.00010456819222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a framework for using transformer networks as universal computers
by programming them with specific weights and placing them in a loop. Our input
sequence acts as a punchcard, consisting of instructions and memory for data
read/writes. We demonstrate that a constant number of encoder layers can
emulate basic computing blocks, including embedding edit operations, non-linear
functions, function calls, program counters, and conditional branches. Using
these building blocks, we emulate a small instruction-set computer. This allows
us to map iterative algorithms to programs that can be executed by a looped,
13-layer transformer. We show how this transformer, instructed by its input,
can emulate a basic calculator, a basic linear algebra library, and in-context
learning algorithms that employ backpropagation. Our work highlights the
versatility of the attention mechanism, and demonstrates that even shallow
transformers can execute full-fledged, general-purpose programs.
Related papers
- On the Expressive Power of a Variant of the Looped Transformer [83.30272757948829]
We design a novel transformer block, dubbed AlgoFormer, to empower transformers with algorithmic capabilities.
The proposed AlgoFormer can achieve significantly higher in algorithm representation when using the same number of parameters.
Some theoretical and empirical results are presented to show that the designed transformer has the potential to be smarter than human-designed algorithms.
arXiv Detail & Related papers (2024-02-21T07:07:54Z) - Banach-Tarski Embeddings and Transformers [0.0]
We introduce a new construction of embeddings of arbitrary data structures into high dimensional vectors.
These embeddings provide an interpretable model for the latent state vectors of transformers.
We show that these embeddings can be decoded to the original data structure when the embedding dimension is sufficiently large.
arXiv Detail & Related papers (2023-11-15T21:30:26Z) - Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z) - Planning with Large Language Models for Code Generation [100.07232672883897]
Planning-Guided Transformer Decoding (PG-TD) uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs.
We empirically evaluate our framework with several large language models as backbones on public coding challenge benchmarks.
arXiv Detail & Related papers (2023-03-09T18:59:47Z) - Tracr: Compiled Transformers as a Laboratory for Interpretability [15.76027393879609]
We show how to "compile" human-readable programs into decoder-only transformer models.
Our compiler, Tracr, generates models with known structure.
We use it to study "superposition" in transformers that execute multi-step algorithms.
arXiv Detail & Related papers (2023-01-12T14:59:19Z) - Characterizing Intrinsic Compositionality in Transformers with Tree
Projections [72.45375959893218]
neural models like transformers can route information arbitrarily between different parts of their input.
We show that transformers for three different tasks become more treelike over the course of training.
These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z) - Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.