Related papers: Understanding In-context Learning of Addition via Activation Subspaces

Understanding In-context Learning of Addition via Activation Subspaces

URL: http://arxiv.org/abs/2505.05145v2
Date: Thu, 15 May 2025 07:19:33 GMT
Title: Understanding In-context Learning of Addition via Activation Subspaces
Authors: Xinyan Hu, Kayo Yin, Michael I. Jordan, Jacob Steinhardt, Lijie Chen,
Abstract summary: We study a family of few-shot learning tasks for which the true prediction rule is to add an integer $k$ to the input.<n>We find that Llama-3-8B attains high accuracy on this task for a range of $k$, and localizes its few-shot ability to just three attention heads.
Score: 74.8874431046062
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: To perform in-context learning, language models must extract signals from individual few-shot examples, aggregate these into a learned prediction rule, and then apply this rule to new examples. How is this implemented in the forward pass of modern transformer models? To study this, we consider a structured family of few-shot learning tasks for which the true prediction rule is to add an integer $k$ to the input. We find that Llama-3-8B attains high accuracy on this task for a range of $k$, and localize its few-shot ability to just three attention heads via a novel optimization approach. We further show the extracted signals lie in a six-dimensional subspace, where four of the dimensions track the unit digit and the other two dimensions track overall magnitude. We finally examine how these heads extract information from individual few-shot examples, identifying a self-correction mechanism in which mistakes from earlier examples are suppressed by later examples. Our results demonstrate how tracking low-dimensional subspaces across a forward pass can provide insight into fine-grained computational structures.

Related papers

Arithmetic in Transformers Explained [1.8434042562191815]
We analyze 44 autoregressive transformer models trained on addition, subtraction, or both.<n>We show that the addition models converge on a common logical algorithm, with most models achieving >99.999% prediction accuracy.<n>We introduce a reusable library of mechanistic interpretability tools to define, locate, and visualize these algorithmic circuits.
arXiv Detail & Related papers (2024-02-04T21:33:18Z)
Ticketed Learning-Unlearning Schemes [57.89421552780526]
We propose a new ticketed model for learning--unlearning. We provide space-efficient ticketed learning--unlearning schemes for a broad family of concept classes.
arXiv Detail & Related papers (2023-06-27T18:54:40Z)
Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning. The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
Gaussian Switch Sampling: A Second Order Approach to Active Learning [11.775252660867285]
In active learning, acquisition functions define informativeness directly on the representation position within the model manifold. We propose a grounded second-order definition of information content and sample importance within the context of active learning. We show that our definition produces highly accurate importance scores even when the model representations are constrained by the lack of training data.
arXiv Detail & Related papers (2023-02-16T15:24:56Z)
Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features [119.22672589020394]
We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features. Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$2$ improves performance by 5-15% when given limited target data.
arXiv Detail & Related papers (2023-02-10T18:58:03Z)
What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly. We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression. Preliminary evidence that in-context learners share algorithmic features with these predictors.
arXiv Detail & Related papers (2022-11-28T18:59:51Z)
Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems. Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored. We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models. Not much has been studied about the type or extent of information encoded in the pre-trained representations themselves.
arXiv Detail & Related papers (2021-07-10T02:13:25Z)
Deep Learning Through the Lens of Example Difficulty [21.522182447513632]
We introduce a measure of the computational difficulty of making a prediction for a given input: the (effective) prediction depth. Our investigation reveals surprising yet simple relationships between the prediction depth of a given input and the model's uncertainty, confidence, accuracy and speed of learning for that data point.
arXiv Detail & Related papers (2021-06-17T16:48:12Z)
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks. We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.