Tracr: Compiled Transformers as a Laboratory for Interpretability
- URL: http://arxiv.org/abs/2301.05062v5
- Date: Fri, 3 Nov 2023 15:11:02 GMT
- Title: Tracr: Compiled Transformers as a Laboratory for Interpretability
- Authors: David Lindner and János Kramár and Sebastian Farquhar and Matthew Rahtz and Thomas McGrath and Vladimir Mikulik
- Abstract summary: We show how to "compile" human-readable programs into decoder-only transformer models.
Our compiler, Tracr, generates models with known structure.
We use it to study "superposition" in transformers that execute multi-step algorithms.
- Score: 15.76027393879609
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show how to "compile" human-readable programs into standard decoder-only
transformer models. Our compiler, Tracr, generates models with known structure.
This structure can be used to design experiments. For example, we use it to
study "superposition" in transformers that execute multi-step algorithms.
Additionally, the known structure of Tracr-compiled models can serve as
ground-truth for evaluating interpretability methods. Commonly, because the
"programs" learned by transformers are unknown it is unclear whether an
interpretation succeeded. We demonstrate our approach by implementing and
examining programs including computing token frequencies, sorting, and
parenthesis checking. We provide an open-source implementation of Tracr at
https://github.com/google-deepmind/tracr.
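As a concrete illustration of the compilation workflow described above, the sketch below writes the token-frequency program in RASP and compiles it with the open-source Tracr library linked above. The module paths and the `compile_rasp_to_model` call follow the repository's published examples, but the exact signature and output format should be treated as assumptions rather than a definitive recipe.

```python
# Sketch based on the examples in the Tracr repository (API details assumed).
from tracr.rasp import rasp
from tracr.compiler import compiling

# RASP program: at each position, count how many tokens in the sequence
# are equal to the token at that position (i.e. the token's frequency).
same_token = rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.EQ)
frequency = rasp.SelectorWidth(same_token)

# Compile the program into a concrete decoder-only transformer.
model = compiling.compile_rasp_to_model(
    frequency,
    vocab={"a", "b", "c"},   # input token vocabulary
    max_seq_len=6,           # maximum supported sequence length
    compiler_bos="BOS",      # beginning-of-sequence marker
)

# Run the compiled transformer on a token sequence.
out = model.apply(["BOS", "a", "b", "a", "c"])
print(out.decoded)  # expected roughly: ["BOS", 2, 1, 2, 1]
```

Because every weight in the compiled model is derived from the RASP program, each attention head and MLP layer can be matched to a named operation, which is what makes the model usable as ground truth for interpretability experiments.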
Related papers
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
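A minimal sketch of that experimental setup, assuming a PyTorch-style model in which the input embedding is an accessible submodule (the names below are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

def keep_only_embeddings_trainable(model: nn.Module, embedding: nn.Embedding):
    """Leave the transformer blocks at their random initialization and
    optimize only the embedding parameters."""
    for param in model.parameters():
        param.requires_grad = False
    for param in embedding.parameters():
        param.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]

# Usage sketch (model.embed_tokens is a hypothetical embedding submodule):
# trainable = keep_only_embeddings_trainable(model, model.embed_tokens)
# optimizer = torch.optim.Adam(trainable, lr=1e-4)
```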
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
- Can Transformers Learn $n$-gram Language Models? [77.35809823602307]
We study transformers' ability to learn random $n$-gram LMs of two kinds.
We find that classic estimation techniques for $n$-gram LMs, such as add-$\lambda$ smoothing, outperform transformers.
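For reference, add-$\lambda$ smoothing is the classical estimator that adds a pseudo-count $\lambda$ to every $n$-gram count; a self-contained sketch (variable names are illustrative):

```python
from collections import Counter

def add_lambda_prob(ngram_counts, context_counts, context, token,
                    vocab_size, lam=0.1):
    """Add-lambda smoothed n-gram estimate:
    P(token | context) = (count(context, token) + lam) / (count(context) + lam * |V|),
    where `context` is a tuple of the preceding n-1 tokens."""
    return (ngram_counts[context + (token,)] + lam) / (
        context_counts[context] + lam * vocab_size
    )

# Tiny bigram (n = 2) usage sketch.
tokens = ["a", "b", "a", "a", "b"]
ngram_counts = Counter(zip(tokens, tokens[1:]))      # keys like ("a", "b")
context_counts = Counter((t,) for t in tokens[:-1])  # keys like ("a",)
print(add_lambda_prob(ngram_counts, context_counts, ("a",), "b", vocab_size=2))
# (2 + 0.1) / (3 + 0.1 * 2) = 0.65625
```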
arXiv Detail & Related papers (2024-10-03T21:21:02Z)
- Transformer Explainer: Interactive Learning of Text-Generative Models [65.91049787390692]
Transformer Explainer is an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model.
It runs a live GPT-2 instance locally in the user's browser, empowering users to experiment with their own input and observe in real-time how the internal components and parameters of the Transformer work together.
arXiv Detail & Related papers (2024-08-08T17:49:07Z)
- Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
These Transformer Programs automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z)
- Looped Transformers as Programmable Computers [48.00010456819222]
We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop.
Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes.
We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation.
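A conceptual sketch of the loop itself (names and calling convention are illustrative, not from the paper): a fixed transformer is applied repeatedly, with its output sequence fed back in as the next input, so the sequence serves as both instruction tape and read/write memory.

```python
def run_looped_transformer(transformer_step, punchcard, num_cycles):
    """Apply a fixed transformer repeatedly, feeding its output sequence back
    in as the next input; one call corresponds to one 'clock cycle'."""
    state = list(punchcard)  # instructions plus scratch memory, as one sequence
    for _ in range(num_cycles):
        state = transformer_step(state)  # transformer_step: sequence -> sequence
    return state
```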
arXiv Detail & Related papers (2023-01-30T18:57:31Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language, the Restricted Access Sequence Processing language (RASP).
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
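To give a flavor of such programs, here is a sort over unique values sketched in RASP-style notation, written against the rasp module that ships with Tracr; the construction mirrors common RASP examples, and the exact API details are an assumption.

```python
from tracr.rasp import rasp

# RASP-style sort of a sequence with unique values.
# 1. For each position, count how many values are smaller; that count is
#    the position the value should move to.
smaller = rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.LT)
target_position = rasp.SelectorWidth(smaller)

# 2. At output position i, attend to the position whose target index is i
#    and copy its token there.
gather = rasp.Select(target_position, rasp.indices, rasp.Comparison.EQ)
sorted_tokens = rasp.Aggregate(gather, rasp.tokens)
```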
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.