Learning Transformer Programs
- URL: http://arxiv.org/abs/2306.01128v2
- Date: Tue, 31 Oct 2023 00:47:31 GMT
- Title: Learning Transformer Programs
- Authors: Dan Friedman, Alexander Wettig, Danqi Chen
- Abstract summary: We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
- Score: 78.9509560355733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research in mechanistic interpretability has attempted to
reverse-engineer Transformer models by carefully inspecting network weights and
activations. However, these approaches require considerable manual effort and
still fall short of providing complete, faithful descriptions of the underlying
algorithms. In this work, we introduce a procedure for training Transformers
that are mechanistically interpretable by design. We build on RASP [Weiss et
al., 2021], a programming language that can be compiled into Transformer
weights. Instead of compiling human-written programs into Transformers, we
design a modified Transformer that can be trained using gradient-based
optimization and then automatically converted into a discrete, human-readable
program. We refer to these models as Transformer Programs. To validate our
approach, we learn Transformer Programs for a variety of problems, including an
in-context learning task, a suite of algorithmic problems (e.g. sorting,
recognizing Dyck languages), and NLP tasks including named entity recognition
and text classification. The Transformer Programs can automatically find
reasonable solutions, performing on par with standard Transformers of
comparable size; and, more importantly, they are easy to interpret. To
demonstrate these advantages, we convert Transformers into Python programs and
use off-the-shelf code analysis tools to debug model errors and identify the
"circuits" used to solve different sub-problems. We hope that Transformer
Programs open a new path toward the goal of intrinsically interpretable machine
learning.
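For illustration only: the "discrete, human-readable program" described above can be pictured as a small set of RASP-style select/aggregate operations rendered in Python. The sketch below is a hypothetical example in that spirit; the function names and toy tasks are assumptions, not the authors' released code.
```python
# Hypothetical sketch of the RASP-style form a learned Transformer Program
# might take once extracted to Python; not the authors' released code.
from typing import Callable, List, Sequence


def select(keys: Sequence, queries: Sequence,
           predicate: Callable) -> List[List[bool]]:
    # One attention head as a discrete predicate: row q marks the key
    # positions that query position q attends to.
    return [[predicate(k, q) for k in keys] for q in queries]


def aggregate(attn: List[List[bool]], values: Sequence) -> List:
    # Hard attention: each position copies the value at the first position
    # it attends to (None if it attends to nothing).
    return [next((v for v, a in zip(values, row) if a), None) for row in attn]


def histogram(tokens: Sequence[str]) -> List[int]:
    # For each position, count how many tokens equal the token at that position.
    same = select(tokens, tokens, lambda k, q: k == q)
    return [sum(row) for row in same]


def reverse(tokens: Sequence[str]) -> List[str]:
    # Position i attends to position n - 1 - i and copies its token.
    n = len(tokens)
    idx = list(range(n))
    flip = select(idx, idx, lambda k, q: k == n - 1 - q)
    return aggregate(flip, tokens)


print(histogram(list("hello")))  # [1, 1, 2, 2, 1]
print(reverse(list("abc")))      # ['c', 'b', 'a']
```
Because the extracted program is ordinary Python, off-the-shelf code analysis tools (debuggers, linters, tracers) can be applied to it, which is the debugging workflow the abstract alludes to.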
Related papers
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
- On the Expressive Power of a Variant of the Looped Transformer [83.30272757948829]
We design a novel transformer block, dubbed AlgoFormer, to empower transformers with algorithmic capabilities.
The proposed AlgoFormer can achieve significantly higher expressiveness in algorithm representation when using the same number of parameters.
Some theoretical and empirical results are presented to show that the designed transformer has the potential to be smarter than human-designed algorithms.
arXiv Detail & Related papers (2024-02-21T07:07:54Z)
- Looped Transformers as Programmable Computers [48.00010456819222]
We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop.
Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes.
We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation.
arXiv Detail & Related papers (2023-01-30T18:57:31Z)
- Tracr: Compiled Transformers as a Laboratory for Interpretability [15.76027393879609]
We show how to "compile" human-readable programs into decoder-only transformer models.
Our compiler, Tracr, generates models with known structure.
We use it to study "superposition" in transformers that execute multi-step algorithms.
arXiv Detail & Related papers (2023-01-12T14:59:19Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck languages; a minimal sketch in this select-and-count style appears after this list.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
- Scalable Transformers for Neural Machine Translation [86.4530299266897]
The Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallelizable training for sequence generation.
We propose a novel scalable Transformer, which naturally contains sub-Transformers of different scales with shared parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z)
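As flagged in the "Thinking Like Transformers" entry above, the following is a minimal, self-contained sketch of a Dyck-1 balance check in the same select-and-count style. The helper names (`select`, `prefix_count`) are illustrative assumptions, not code from the RASP paper.
```python
# Illustrative RASP-style Dyck-1 recognizer; helper names are assumptions.
from typing import Callable, List, Sequence


def select(keys: Sequence, queries: Sequence,
           predicate: Callable) -> List[List[bool]]:
    # Discrete attention pattern: row q marks the key positions query q attends to.
    return [[predicate(k, q) for k in keys] for q in queries]


def prefix_count(tokens: Sequence[str], target: str) -> List[int]:
    # selector_width-style primitive: at each position, count positions up to
    # and including it that hold `target`.
    idx = list(range(len(tokens)))
    prefix = select(idx, idx, lambda k, q: k <= q)
    return [sum(1 for k, attended in enumerate(row) if attended and tokens[k] == target)
            for row in prefix]


def dyck1_balanced(tokens: Sequence[str]) -> bool:
    # Balanced iff no prefix has more ")" than "(" and the totals match at the end.
    opens, closes = prefix_count(tokens, "("), prefix_count(tokens, ")")
    return all(o >= c for o, c in zip(opens, closes)) and opens[-1] == closes[-1]


print(dyck1_balanced(list("(())()")))  # True
print(dyck1_balanced(list("())(")))    # False
```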