Structured World Representations in Maze-Solving Transformers
- URL: http://arxiv.org/abs/2312.02566v1
- Date: Tue, 5 Dec 2023 08:24:26 GMT
- Title: Structured World Representations in Maze-Solving Transformers
- Authors: Michael Igorevich Ivanitskiy, Alex F. Spies, Tilman Räuker,
Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan
Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung
- Abstract summary: This work focuses on the abstractions formed by small transformer models.
We find evidence for the consistent emergence of structured internal representations of maze topology and valid paths.
We also take steps towards deciphering the circuitry of path-following by identifying attention heads.
- Score: 3.75591091941815
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Transformer models underpin many recent advances in practical machine
learning applications, yet understanding their internal behavior continues to
elude researchers. Given the size and complexity of these models, forming a
comprehensive picture of their inner workings remains a significant challenge.
To this end, we set out to understand small transformer models in a more
tractable setting: that of solving mazes. In this work, we focus on the
abstractions formed by these models and find evidence for the consistent
emergence of structured internal representations of maze topology and valid
paths. We demonstrate this by showing that the residual stream of only a single
token can be linearly decoded to faithfully reconstruct the entire maze. We
also find that the learned embeddings of individual tokens have spatial
structure. Furthermore, we take steps towards deciphering the circuitry of
path-following by identifying attention heads (dubbed $\textit{adjacency
heads}$), which are implicated in finding valid subsequent tokens.
Related papers
- Counterfactual Explanations via Riemannian Latent Space Traversal [6.6622532846616505]
Counterfactual explanations form a powerful tool for providing actionable explanations to practitioners.
We introduce counterfactual explanations using a metric pulled back via the decoder and the classifier under scrutiny.
This metric encodes information about the complex geometric structure of the data and the learned representation, enabling us to obtain robust counterfactual trajectories with high fidelity.
arXiv Detail & Related papers (2024-11-04T16:49:39Z) - Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show fast convergence of the regression loss despite the non-convexity of the optimization landscape.
This is the first theoretical analysis for multi-layer Transformer in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z) - Emergence and Function of Abstract Representations in Self-Supervised
Transformers [0.0]
We study the inner workings of small-scale transformers trained to reconstruct partially masked visual scenes.
We show that the network develops intermediate abstract representations, or abstractions, that encode all semantic features of the dataset.
Using precise manipulation experiments, we demonstrate that abstractions are central to the network's decision-making process.
arXiv Detail & Related papers (2023-12-08T20:47:15Z) - Transformers are uninterpretable with myopic methods: a case study with
bounded Dyck grammars [36.780346257061495]
Interpretability methods aim to understand the algorithm implemented by a trained model.
We take a critical view of methods that exclusively focus on individual parts of the model.
arXiv Detail & Related papers (2023-12-03T15:34:46Z) - Curve Your Attention: Mixed-Curvature Transformers for Graph
Representation Learning [77.1421343649344]
We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run with time and memory cost linear in the number of nodes and edges.
arXiv Detail & Related papers (2023-09-08T02:44:37Z) - Unsupervised Learning of Invariance Transformations [105.54048699217668]
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in weighted graphs in general.
arXiv Detail & Related papers (2023-07-24T17:03:28Z) - What Makes for Good Tokenizers in Vision Transformer? [62.44987486771936]
Transformers are capable of extracting pairwise relationships among tokens using self-attention.
What makes for a good tokenizer has not been well understood in computer vision.
Modulation across Tokens (MoTo) incorporates inter-token modeling capability through normalization.
The regularization objective TokenProp is adopted within the standard training regime.
arXiv Detail & Related papers (2022-12-21T15:51:43Z) - Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z) - Robust and Controllable Object-Centric Learning through Energy-based
Models [95.68748828339059]
Ours is a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that ours can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z) - A Compositional Atlas of Tractable Circuit Operations: From Simple
Transformations to Complex Information-Theoretic Queries [44.36335714431731]
We show how complex inference scenarios for machine learning can be represented in terms of tractable modular operations over circuits.
We derive a unified framework for reasoning about tractable models that generalizes several results in the literature and opens up novel tractable inference scenarios.
arXiv Detail & Related papers (2021-02-11T17:26:32Z) - Masked Language Modeling for Proteins via Linearly Scalable Long-Context
Transformers [42.93754828584075]
We present a new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR).
Our mechanism scales linearly rather than quadratically in the number of tokens in the sequence, is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors.
It provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence.
arXiv Detail & Related papers (2020-06-05T17:09:16Z)