Thinking Like Transformers
- URL: http://arxiv.org/abs/2106.06981v1
- Date: Sun, 13 Jun 2021 13:04:46 GMT
- Title: Thinking Like Transformers
- Authors: Gail Weiss, Yoav Goldberg, Eran Yahav
- Abstract summary: We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
- Score: 64.96770952820691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What is the computational model behind a Transformer? Where recurrent neural
networks have direct parallels in finite state machines, allowing clear
discussion and thought around architecture variants or trained models,
Transformers have no such familiar parallel. In this paper we aim to change
that, proposing a computational model for the transformer-encoder in the form
of a programming language. We map the basic components of a transformer-encoder
-- attention and feed-forward computation -- into simple primitives, around
which we form a programming language: the Restricted Access Sequence Processing
Language (RASP). We show how RASP can be used to program solutions to tasks
that could conceivably be learned by a Transformer, and how a Transformer can
be trained to mimic a RASP solution. In particular, we provide RASP programs
for histograms, sorting, and Dyck-languages. We further use our model to relate
their difficulty in terms of the number of required layers and attention heads:
analyzing a RASP program implies a maximum number of heads and layers necessary
to encode a task in a transformer. Finally, we see how insights gained from our
abstraction might be used to explain phenomena seen in recent works.
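To make the abstraction concrete, below is a minimal plain-Python sketch of the paper's histogram example, emulating RASP-style select, aggregate, and selector_width operations over ordinary lists. The primitive names follow the paper; this toy interpreter is illustrative only and is not the authors' reference implementation.

```python
# Minimal, illustrative sketch of RASP-style primitives in plain Python.
# The primitive names (select, aggregate, selector_width) follow the paper;
# this toy interpreter over Python lists is not the authors' implementation.

def select(keys, queries, predicate):
    """Boolean selection matrix S[q][k] = predicate(keys[k], queries[q]),
    the RASP analogue of an (unweighted) attention pattern."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selector, values):
    """For each query position, average the selected values
    (mirrors attention's uniform weighted averaging)."""
    out = []
    for row in selector:
        picked = [v for sel, v in zip(row, values) if sel]
        out.append(sum(picked) / len(picked) if picked else 0)
    return out

def selector_width(selector):
    """Number of selected positions per query (RASP's counting operation)."""
    return [sum(row) for row in selector]

# Histogram s-op: for every position, count occurrences of its token.
tokens = list("hello")
same_tok = select(tokens, tokens, lambda k, q: k == q)
hist = selector_width(same_tok)
print(hist)  # [1, 1, 2, 2, 1]
```

In RASP itself the histogram corresponds roughly to selector_width(select(tokens, tokens, ==)); analyzing such a program bounds the number of attention heads and layers a transformer would need to realize it.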
Related papers
- Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks [78.54913566111198]
Large Language Models (LLMs) have demonstrated impressive abilities in symbol processing through in-context learning (ICL).
We seek to understand the mechanisms that can enable robust symbol processing in transformer networks.
We develop a high-level language, PSL, that allows us to write symbolic programs to do complex, abstract symbol processing.
arXiv Detail & Related papers (2024-10-23T01:38:10Z)
- Transformers are Efficient Compilers, Provably [11.459397066286822]
Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks.
In this paper, we take the first steps towards a formal investigation of using transformers as compilers from an expressive power perspective.
We introduce a representative programming language, Mini-Husky, which encapsulates key features of modern C-like languages.
arXiv Detail & Related papers (2024-10-07T20:31:13Z)
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
- Can Transformers Learn $n$-gram Language Models? [77.35809823602307]
We study transformers' ability to learn random $n$-gram LMs of two kinds.
We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers (a sketch of add-$\lambda$ smoothing follows this list).
arXiv Detail & Related papers (2024-10-03T21:21:02Z)
- Transformers meet Neural Algorithmic Reasoners [16.5785372289558]
We propose a novel approach that combines the Transformer's language understanding with the robustness of graph neural network (GNN)-based neural algorithmic reasoners (NARs).
We evaluate our resulting TransNAR model on CLRS-Text, the text-based version of the CLRS-30 benchmark, and demonstrate significant gains over Transformer-only models for algorithmic reasoning.
arXiv Detail & Related papers (2024-06-13T16:42:06Z)
- Learning Transformer Programs [78.9509560355733]
We introduce a procedure for training Transformers that are mechanistically interpretable by design.
Instead of compiling human-written programs into Transformers, we design a modified Transformer that can be trained using gradient-based optimization.
The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size.
arXiv Detail & Related papers (2023-06-01T20:27:01Z)
- An Introduction to Transformers [23.915718146956355]
The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points.
In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.
arXiv Detail & Related papers (2023-04-20T14:54:19Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
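For reference, the add-$\lambda$ smoothing mentioned in the $n$-gram entry above is the standard Lidstone estimator, which adds a pseudo-count $\lambda$ to every event. A minimal sketch follows; the variable names and toy corpus are illustrative and not taken from that paper.

```python
# Minimal sketch of add-lambda (Lidstone) smoothing for an n-gram LM.
# Standard textbook estimator; names and data here are illustrative only.
from collections import Counter

def add_lambda_prob(word, history, ngram_counts, history_counts, vocab_size, lam=0.1):
    """P(word | history) = (C(history, word) + lambda) / (C(history) + lambda * |V|)."""
    return (ngram_counts[(history, word)] + lam) / (history_counts[history] + lam * vocab_size)

# Toy bigram example over a three-word vocabulary.
corpus = ["a", "b", "a", "c", "a", "b"]
ngram_counts = Counter(zip(corpus, corpus[1:]))   # counts of (history, word) pairs
history_counts = Counter(corpus[:-1])             # counts of each history
vocab_size = len(set(corpus))
print(add_lambda_prob("b", "a", ngram_counts, history_counts, vocab_size))  # ~0.636
```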