Structured World Representations in Maze-Solving Transformers
- URL: http://arxiv.org/abs/2312.02566v1
- Date: Tue, 5 Dec 2023 08:24:26 GMT
- Title: Structured World Representations in Maze-Solving Transformers
- Authors: Michael Igorevich Ivanitskiy, Alex F. Spies, Tilman Räuker,
Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan
Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung
- Abstract summary: This work focuses on the abstractions formed by small transformer models.
We find evidence for the consistent emergence of structured internal representations of maze topology and valid paths.
We also take steps towards deciphering the circuitry of path-following by identifying attention heads.
- Score: 3.75591091941815
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Transformer models underpin many recent advances in practical machine
learning applications, yet understanding their internal behavior continues to
elude researchers. Given the size and complexity of these models, forming a
comprehensive picture of their inner workings remains a significant challenge.
To this end, we set out to understand small transformer models in a more
tractable setting: that of solving mazes. In this work, we focus on the
abstractions formed by these models and find evidence for the consistent
emergence of structured internal representations of maze topology and valid
paths. We demonstrate this by showing that the residual stream of only a single
token can be linearly decoded to faithfully reconstruct the entire maze. We
also find that the learned embeddings of individual tokens have spatial
structure. Furthermore, we take steps towards deciphering the circuitry of
path-following by identifying attention heads (dubbed $\textit{adjacency
heads}$), which are implicated in finding valid subsequent tokens.
Related papers
- Counterfactual Explanations via Riemannian Latent Space Traversal [6.6622532846616505]
Counterfactual explanations form a powerful tool for providing actionable explanations to practitioners.
We introduce counterfactual explanations using a metric pulled back via the decoder and the classifier under scrutiny.
This metric encodes information about the complex geometric structure of the data and the learned representation, enabling us to obtain robust counterfactual trajectories with high fidelity.
arXiv Detail & Related papers (2024-11-04T16:49:39Z) - Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show fast convergence of the regression loss despite the non-convexity of the optimization landscape.
This is the first theoretical analysis for multi-layer Transformer in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z) - Emergence and Function of Abstract Representations in Self-Supervised
Transformers [0.0]
We study the inner workings of small-scale transformers trained to reconstruct partially masked visual scenes.
We show that the network develops intermediate abstract representations, or abstractions, that encode all semantic features of the dataset.
Using precise manipulation experiments, we demonstrate that abstractions are central to the network's decision-making process.
arXiv Detail & Related papers (2023-12-08T20:47:15Z) - Transformers are uninterpretable with myopic methods: a case study with
bounded Dyck grammars [36.780346257061495]
Interpretability methods aim to understand the algorithm implemented by a trained model.
We take a critical view of methods that exclusively focus on individual parts of the model.
arXiv Detail & Related papers (2023-12-03T15:34:46Z) - Curve Your Attention: Mixed-Curvature Transformers for Graph
Representation Learning [77.1421343649344]
We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run with time and memory cost linear in the number of nodes and edges.
arXiv Detail & Related papers (2023-09-08T02:44:37Z) - Unsupervised Learning of Invariance Transformations [105.54048699217668]
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in weighted graphs in general.
arXiv Detail & Related papers (2023-07-24T17:03:28Z) - What Makes for Good Tokenizers in Vision Transformer? [62.44987486771936]
Transformers are capable of extracting pairwise relationships among tokens using self-attention.
What makes for a good tokenizer has not been well understood in computer vision.
Modulation across Tokens (MoTo) incorporates inter-token modeling capability through normalization.
The regularization objective TokenProp is adopted within the standard training regime.
arXiv Detail & Related papers (2022-12-21T15:51:43Z) - Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z) - Robust and Controllable Object-Centric Learning through Energy-based
Models [95.68748828339059]
Ours is a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that ours can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z) - A Compositional Atlas of Tractable Circuit Operations: From Simple
Transformations to Complex Information-Theoretic Queries [44.36335714431731]
We show how complex inference scenarios for machine learning can be represented in terms of tractable modular operations over circuits.
We derive a unified framework for reasoning about tractable models that generalizes several results in the literature and opens up novel tractable inference scenarios.
arXiv Detail & Related papers (2021-02-11T17:26:32Z) - Masked Language Modeling for Proteins via Linearly Scalable Long-Context
Transformers [42.93754828584075]
We present a new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR).
Our mechanism scales linearly rather than quadratically in the number of tokens in the sequence, is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors.
It provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence.
arXiv Detail & Related papers (2020-06-05T17:09:16Z)