Thinking Like Transformers
- URL: http://arxiv.org/abs/2106.06981v1
- Date: Sun, 13 Jun 2021 13:04:46 GMT
- Title: Thinking Like Transformers
- Authors: Gail Weiss, Yoav Goldberg, Eran Yahav
- Abstract summary: We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
- Score: 64.96770952820691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What is the computational model behind a Transformer? Where recurrent neural
networks have direct parallels in finite state machines, allowing clear
discussion and thought around architecture variants or trained models,
Transformers have no such familiar parallel. In this paper we aim to change
that, proposing a computational model for the transformer-encoder in the form
of a programming language. We map the basic components of a transformer-encoder
-- attention and feed-forward computation -- into simple primitives, around
which we form a programming language: the Restricted Access Sequence Processing
Language (RASP). We show how RASP can be used to program solutions to tasks
that could conceivably be learned by a Transformer, and how a Transformer can
be trained to mimic a RASP solution. In particular, we provide RASP programs
for histograms, sorting, and Dyck-languages. We further use our model to relate
their difficulty in terms of the number of required layers and attention heads:
analyzing a RASP program implies a maximum number of heads and layers necessary
to encode a task in a transformer. Finally, we see how insights gained from our
abstraction might be used to explain phenomena seen in recent works.
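To make the abstraction concrete, below is a minimal plain-Python sketch of the paper's histogram example, emulating RASP-style select, aggregate, and selector_width operations over ordinary lists. The primitive names follow the paper; this toy interpreter is illustrative only and is not the authors' reference implementation.

```python
# Minimal, illustrative sketch of RASP-style primitives in plain Python.
# The primitive names (select, aggregate, selector_width) follow the paper;
# this toy interpreter over Python lists is not the authors' implementation.

def select(keys, queries, predicate):
    """Boolean selection matrix S[q][k] = predicate(keys[k], queries[q]),
    the RASP analogue of an (unweighted) attention pattern."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selector, values):
    """For each query position, average the selected values
    (mirrors attention's uniform weighted averaging)."""
    out = []
    for row in selector:
        picked = [v for sel, v in zip(row, values) if sel]
        out.append(sum(picked) / len(picked) if picked else 0)
    return out

def selector_width(selector):
    """Number of selected positions per query (RASP's counting operation)."""
    return [sum(row) for row in selector]

# Histogram s-op: for every position, count occurrences of its token.
tokens = list("hello")
same_tok = select(tokens, tokens, lambda k, q: k == q)
hist = selector_width(same_tok)
print(hist)  # [1, 1, 2, 2, 1]
```

In RASP itself the histogram corresponds roughly to selector_width(select(tokens, tokens, ==)); analyzing such a program bounds the number of attention heads and layers a transformer would need to realize it.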
Related papers
- Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks [78.54913566111198]
Large Language Models (LLMs) have demonstrated impressive abilities in symbol processing through in-context learning (ICL).
We seek to understand the mechanisms that can enable robust symbol processing in transformer networks.
We develop a high-level language, PSL, that allows us to write symbolic programs to do complex, abstract symbol processing.
arXiv Detail & Related papers (2024-10-23T01:38:10Z)
- Transformers are Efficient Compilers, Provably [11.459397066286822]
Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks.
In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective.
We introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages.
arXiv Detail & Related papers (2024-10-07T20:31:13Z)
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
- Can Transformers Learn $n$-gram Language Models? [77.35809823602307]
We study transformers' ability to learn random $n$-gram LMs of two kinds.
We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers (a sketch of add-$\lambda$ smoothing follows this list).
arXiv Detail & Related papers (2024-10-03T21:21:02Z)
- Transformers meet Neural Algorithmic Reasoners [16.5785372289558]
We propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs).
We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning.
arXiv Detail & Related papers (2024-06-13T16:42:06Z)
- Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z)
- An Introduction to Transformers [23.915718146956355]
The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points.
In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.
arXiv Detail & Related papers (2023-04-20T14:54:19Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
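For reference, the add-$\lambda$ smoothing mentioned in the $n$-gram entry above is the standard Lidstone estimator, which adds a pseudo-count $\lambda$ to every event. A minimal sketch follows; the variable names and toy corpus are illustrative and not taken from that paper.

```python
# Minimal sketch of add-lambda (Lidstone) smoothing for an n-gram LM.
# Standard textbook estimator; names and data here are illustrative only.
from collections import Counter

def add_lambda_prob(word, history, ngram_counts, history_counts, vocab_size, lam=0.1):
    """P(word | history) = (C(history, word) + lambda) / (C(history) + lambda * |V|)."""
    return (ngram_counts[(history, word)] + lam) / (history_counts[history] + lam * vocab_size)

# Toy bigram example over a three-word vocabulary.
corpus = ["a", "b", "a", "c", "a", "b"]
ngram_counts = Counter(zip(corpus, corpus[1:]))   # counts of (history, word) pairs
history_counts = Counter(corpus[:-1])             # counts of each history
vocab_size = len(set(corpus))
print(add_lambda_prob("b", "a", ngram_counts, history_counts, vocab_size))  # ~0.636
```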