Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
- URL: http://arxiv.org/abs/2507.09875v2
- Date: Sat, 27 Sep 2025 19:24:21 GMT
- Title: Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
- Authors: Qinyuan Ye, Robin Jia, Xiang Ren
- Abstract summary: We show that a function induction mechanism explains the model's generalization from standard addition to off-by-one addition. This mechanism resembles the structure of the induction head mechanism found in prior work and elevates it to a higher level of abstraction. We find that this function induction mechanism is reused in a broader range of tasks, including synthetic tasks such as shifted multiple-choice QA and algorithmic tasks such as base-8 addition.
- Score: 51.26760289602137
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models demonstrate the intriguing ability to perform unseen tasks via in-context learning. However, it remains unclear what mechanisms inside the model drive such task-level generalization. In this work, we approach this question through the lens of off-by-one addition (i.e., 1+1=3, 2+2=5, 3+3=?), a two-step, counterfactual task with an unexpected +1 function as a second step. Leveraging circuit-style interpretability techniques such as path patching, we analyze the models' internal computations behind their performance and present three key findings. First, we uncover a function induction mechanism that explains the model's generalization from standard addition to off-by-one addition. This mechanism resembles the structure of the induction head mechanism found in prior work and elevates it to a higher level of abstraction. Second, we show that the induction of the +1 function is governed by multiple attention heads in parallel, each of which emits a distinct piece of the +1 function. Finally, we find that this function induction mechanism is reused in a broader range of tasks, including synthetic tasks such as shifted multiple-choice QA and algorithmic tasks such as base-8 addition. Overall, our findings offer deeper insights into how reusable and composable structures within language models enable task-level generalization.
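To make the task format concrete, here is a minimal Python sketch of how in-context prompts for off-by-one addition (1+1=3, 2+2=5, 3+3=?) could be assembled, together with a base-8 addition variant. The newline-separated `a+b=c` layout and the helper names `build_prompt`, `off_by_one`, and `base8` are illustrative assumptions, not the authors' prompt template or code.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, int]

# Hypothetical helper: standard addition followed by the unexpected +1 step.
def off_by_one(a: int, b: int) -> str:
    return str(a + b + 1)

# Hypothetical helper: addition whose result is written in base 8 (octal).
def base8(a: int, b: int) -> str:
    return format(a + b, "o")

def build_prompt(demos: List[Pair], query: Pair,
                 answer_fn: Callable[[int, int], str],
                 render: Callable[[int], str] = str) -> Tuple[str, str]:
    """Return (prompt, expected_completion) for an assumed a+b=c prompt layout."""
    # Each demonstration shows the operands and the task-specific answer.
    lines = [f"{render(a)}+{render(b)}={answer_fn(a, b)}" for a, b in demos]
    qa, qb = query
    # The query line is left open for the model to complete.
    lines.append(f"{render(qa)}+{render(qb)}=")
    return "\n".join(lines), answer_fn(qa, qb)

prompt, target = build_prompt([(1, 1), (2, 2)], (3, 3), off_by_one)
print(prompt)
print("expected:", target)

# Base-8 variant: operands are also rendered in octal.
p8, t8 = build_prompt([(5, 6)], (7, 7), base8, render=lambda n: format(n, "o"))
print(p8)
print("expected:", t8)
```

The off-by-one prompt reproduces the example from the abstract, and a model that has induced the +1 step should complete it with `7`. In the base-8 prompt, 5+6 is written as `13` and the expected completion for 7+7 is `16`.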
Related papers
- Beyond Activation Patterns: A Weight-Based Out-of-Context Explanation of Sparse Autoencoder Features [11.463277740376236]
Current interpretation methods infer feature semantics from activation patterns, but overlook that features are trained to reconstruct activations that serve computational roles in the forward pass.
We introduce a novel weight-based interpretation framework that measures functional effects through direct weight interactions, requiring no activation data.
arXiv Detail & Related papers (2026-01-30T01:30:48Z)
- Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning [50.99796659680724]
This work investigates out-of-distribution (OOD) generalization in Transformer networks using a GSM8K-style modular arithmetic on computational graphs task as a testbed.
We introduce and explore a set of four architectural mechanisms aimed at enhancing OOD generalization.
We complement these empirical results with a detailed mechanistic interpretability analysis that reveals how these mechanisms give rise to robust OOD generalization abilities.
arXiv Detail & Related papers (2025-10-15T21:03:59Z)
- How do Transformers Learn Implicit Reasoning? [67.02072851088637]
We study how implicit multi-hop reasoning emerges by training transformers from scratch in a controlled symbolic environment.
We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures.
arXiv Detail & Related papers (2025-05-29T17:02:49Z)
- Interpreting Affine Recurrence Learning in GPT-style Transformers [54.01174470722201]
In-context learning allows GPT-style transformers to generalize during inference without modifying their weights.
This paper focuses specifically on their ability to learn and predict affine recurrences as an ICL task.
We analyze the model's internal operations using both empirical and theoretical approaches.
arXiv Detail & Related papers (2024-10-22T21:30:01Z)
- Continuum Attention for Neural Operators [6.425471760071227]
We study transformers in the function space setting.
We prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator.
For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators.
arXiv Detail & Related papers (2024-06-10T17:25:46Z)
- Bayes Complexity of Learners vs Overfitting [4.873362301533825]
We show that a new notion of complexity of functions governs a PAC Bayes-like generalization bound.
In contrast to previous works, our notion naturally generalizes to neural networks with several layers.
An upper bound we derive allows us to show a separation in the number of samples needed for good generalization between 2-layer and 4-layer neural networks.
arXiv Detail & Related papers (2023-03-13T13:07:02Z)
- Generalization on the Unseen, Logic Reasoning and Degree Curriculum [25.7378861650474]
This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting.
We study how different network architectures trained by (S)GD perform under GOTU.
More specifically, this means an interpolator of the training data that has minimal Fourier mass on the higher degree basis elements.
arXiv Detail & Related papers (2023-01-30T17:44:05Z)
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small [68.879023473838]
We present an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI).
To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model.
arXiv Detail & Related papers (2022-11-01T17:08:44Z)
- A simple probabilistic neural network for machine understanding [0.0]
We discuss probabilistic neural networks with a fixed internal representation as models for machine understanding.
We derive the internal representation by requiring that it satisfies the principles of maximal relevance and of maximal ignorance about how different features are combined.
We argue that learning machines with this architecture enjoy a number of interesting properties, like the continuity of the representation with respect to changes in parameters and data.
arXiv Detail & Related papers (2022-10-24T13:00:15Z)
- Provable General Function Class Representation Learning in Multitask Bandits and MDPs [58.624124220900306]
Multitask representation learning is a popular approach in reinforcement learning to boost sample efficiency.
In this work, we extend the analysis to general function class representations.
We theoretically validate the benefit of multitask representation learning within general function class for bandits and linear MDP.
arXiv Detail & Related papers (2022-05-31T11:36:42Z)
- Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order, memory-augmented extrapolation, stable performance, and faster convergence.
Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification and achieve a 2.22% Top-1 average accuracy and 2.96% Top-5 average accuracy for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z)
- UNIPoint: Universally Approximating Point Processes Intensities [125.08205865536577]
We provide a proof that a class of learnable functions can universally approximate any valid intensity function.
We implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis functions upon each event.
arXiv Detail & Related papers (2020-07-28T09:31:56Z)
- I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths [2.604653544948958]
Self-attention has emerged as a vital component of state-of-the-art sequence-to-sequence models for natural language processing.
We propose I-BERT, a bi-directional Transformer that replaces positional encodings with a recurrent layer.
arXiv Detail & Related papers (2020-06-18T00:56:12Z)