Related papers: Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers

Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers

URL: http://arxiv.org/abs/2510.25013v1
Date: Tue, 28 Oct 2025 22:25:19 GMT
Title: Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers
Authors: Rabin Adhikari,
Abstract summary: We train small, attention-only transformers from scratch on a symbolic version of the Indirect Object Identification task.<n>A single-layer model with only two attention heads achieves perfect IOI accuracy, despite lacking residuals and normalization layers.<n>A two-layer, one-head model achieves similar performance by composing information across layers through query-value interactions.
Score: 0.10152838128195467
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mechanistic interpretability aims to reverse-engineer large language models (LLMs) into human-understandable computational circuits. However, the complexity of pretrained models often obscures the minimal mechanisms required for specific reasoning tasks. In this work, we train small, attention-only transformers from scratch on a symbolic version of the Indirect Object Identification (IOI) task -- a benchmark for studying coreference -- like reasoning in transformers. Surprisingly, a single-layer model with only two attention heads achieves perfect IOI accuracy, despite lacking MLPs and normalization layers. Through residual stream decomposition, spectral analysis, and embedding interventions, we find that the two heads specialize into additive and contrastive subcircuits that jointly implement IOI resolution. Furthermore, we show that a two-layer, one-head model achieves similar performance by composing information across layers through query-value interactions. These results demonstrate that task-specific training induces highly interpretable, minimal circuits, offering a controlled testbed for probing the computational foundations of transformer reasoning.

Related papers

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers [67.02076505996284]
We study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other.<n>Our results shed light on the algorithmic biases of pretrained transformers and offer conceptual guidelines for data-driven control of their learned behaviors.
arXiv Detail & Related papers (2025-12-21T08:10:26Z)
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions [32.71332125930795]
Transformer architectures can solve unseen tasks based on input-output pairs in a given prompt due to in-context learning (ICL)<n>We investigate Markovian function learning through a structured ICL setup to reveal underlying optimization behaviors.
arXiv Detail & Related papers (2025-10-21T13:42:48Z)
From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits [5.1877231178075425]
We investigate the ability of GPT-2 small to handle binary truth values by analyzing its behavior with syllogistic prompts.<n>Through our analysis of several syllogism tasks of varying difficulty, we identify multiple circuits that mechanistically explain GPT-2's logical-reasoning capabilities.
arXiv Detail & Related papers (2025-08-22T05:54:11Z)
What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains [64.31313691823088]
In-context learning (ICL) is a capability of transformers, through which trained models learn to adapt to new tasks by leveraging information from the input context.<n>We show that a two-layer transformer with one head per layer can indeed represent any conditional k-gram.
arXiv Detail & Related papers (2025-08-10T07:03:01Z)
Small transformer architectures for task switching [2.7195102129095003]
It is non-trivial to conceive small-scale applications for which attention-based architectures outperform traditional approaches.<n>We show that standard transformers cannot solve a basic task switching reference model.<n>We show that transformers, long short-term memory recurrent networks (LSTM), and plain multi-layer perceptrons (MLPs) achieve similar, but only modest prediction accuracies.
arXiv Detail & Related papers (2025-08-06T14:01:05Z)
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias [48.9399496805422]
We focus on two representative tasks in the category of regular language recognition, known as even pairs' and parity check'<n>Our goal is to explore how a one-layer transformer, consisting of an attention layer followed by a linear layer, learns to solve these tasks.
arXiv Detail & Related papers (2025-05-02T00:07:35Z)
Interpreting Affine Recurrence Learning in GPT-style Transformers [54.01174470722201]
In-context learning allows GPT-style transformers to generalize during inference without modifying their weights. This paper focuses specifically on their ability to learn and predict affine recurrences as an ICL task. We analyze the model's internal operations using both empirical and theoretical approaches.
arXiv Detail & Related papers (2024-10-22T21:30:01Z)
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers [54.20763128054692]
We study how a two-attention-layer transformer is trained to perform ICL on $n$-gram Markov chain data. We prove that the gradient flow with respect to a cross-entropy ICL loss converges to a limiting model.
arXiv Detail & Related papers (2024-09-09T18:10:26Z)
In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent. For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z)
Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence [17.854940064699985]
We propose a transformer architecture with a mitigatory self-attention mechanism. Miti-DETR reserves the inputs of each single attention layer to the outputs of that layer so that the "non-attention" information has participated in attention propagation. Miti-DETR significantly enhances the average detection precision and convergence speed towards existing DETR-based models.
arXiv Detail & Related papers (2021-12-26T03:23:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.