Structure Development in List-Sorting Transformers
- URL: http://arxiv.org/abs/2501.18666v1
- Date: Thu, 30 Jan 2025 15:56:25 GMT
- Title: Structure Development in List-Sorting Transformers
- Authors: Einar Urdshals, Jasmina Urdshals
- Abstract summary: We study how a one-layer attention-only transformer develops relevant structures while learning to sort lists of numbers.
At the end of training, the model organizes its attention heads in two main modes that we refer to as vocabulary-splitting and copy-suppression.
- Abstract: We study how a one-layer attention-only transformer develops relevant structures while learning to sort lists of numbers. At the end of training, the model organizes its attention heads in two main modes that we refer to as vocabulary-splitting and copy-suppression. Both represent simpler modes than having multiple heads handle overlapping ranges of numbers. Interestingly, vocabulary-splitting is present regardless of whether we use weight decay, a common regularization technique thought to drive simplification, supporting the thesis that neural networks naturally prefer simpler solutions. We relate copy-suppression to a mechanism in GPT-2 and investigate its functional role in our model. Guided by insights from a developmental analysis of the model, we identify features in the training data that drive the model's final acquired solution. This provides a concrete example of how the training data shape the internal organization of transformers, paving the way for future studies that could help us better understand how LLMs develop their internal structures.
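For a concrete picture of the kind of model studied, the sketch below trains a one-layer attention-only transformer to complete sorted lists autoregressively. This is a minimal illustration, not the authors' code: the vocabulary size, list length, number of heads, the "unsorted list, separator, sorted list" sequence format, and all hyperparameters are assumptions chosen for clarity.
```python
# Minimal sketch (not the authors' code) of a one-layer attention-only
# transformer trained to sort short lists of integers. All sizes and the
# sequence format are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEP, LIST_LEN, D_MODEL, N_HEADS = 64, 64, 8, 128, 2
SEQ_LEN = 2 * LIST_LEN + 1  # unsorted list, separator, sorted list

class OneLayerAttnOnly(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB + 1, D_MODEL)   # +1 for the SEP token
        self.pos_emb = nn.Embedding(SEQ_LEN, D_MODEL)
        self.attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.unembed = nn.Linear(D_MODEL, VOCAB, bias=False)

    def forward(self, tokens):                             # tokens: (B, T)
        B, T = tokens.shape
        x = self.tok_emb(tokens) + self.pos_emb(torch.arange(T, device=tokens.device))
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=tokens.device), 1)
        attn_out, _ = self.attn(x, x, x, attn_mask=causal)
        return self.unembed(x + attn_out)                  # residual stream -> logits

def batch(n):
    """Random unsorted lists paired with their sorted versions."""
    unsorted = torch.randint(0, VOCAB, (n, LIST_LEN))
    sorted_, _ = unsorted.sort(dim=-1)
    sep = torch.full((n, 1), SEP)
    return torch.cat([unsorted, sep, sorted_], dim=-1)

model = OneLayerAttnOnly()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.0)
for step in range(1000):
    seq = batch(256)
    logits = model(seq[:, :-1])
    # Only the sorted half of the sequence is used as the training target.
    loss = F.cross_entropy(logits[:, LIST_LEN:].reshape(-1, VOCAB),
                           seq[:, LIST_LEN + 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```
Setting weight_decay=0.0 here mirrors the abstract's observation that vocabulary-splitting reportedly emerges with or without weight decay; runs with and without regularization can be compared by varying that single argument.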
Related papers
- Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment [53.90425382758605]
We show how fine-tuning alters the internal structure of a model to specialize in new multimodal tasks.
Our work sheds light on how multimodal representations evolve through fine-tuning and offers a new perspective for interpreting model adaptation in multimodal tasks.
arXiv Detail & Related papers (2025-01-06T13:37:13Z)
- Re-examining learning linear functions in context [1.8843687952462742]
In-context learning (ICL) has emerged as a powerful paradigm for easily adapting Large Language Models (LLMs) to various tasks.
We explore a simple model of ICL in a controlled setup with synthetic training data.
Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches to learn a linear function in-context.
arXiv Detail & Related papers (2024-11-18T10:58:46Z)
- Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient [0.49478969093606673]
We introduce refined variants of the Local Learning Coefficient (LLC), a measure of model complexity grounded in singular learning theory.
We study the development of internal structure in transformer language models during training.
arXiv Detail & Related papers (2024-10-03T20:51:02Z)
- Talking Heads: Understanding Inter-layer Communication in Transformer Language Models [32.2976613483151]
We analyze a mechanism used in two LMs to selectively inhibit items in a context in one task.
We find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers.
arXiv Detail & Related papers (2024-06-13T18:12:01Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limited interpretability of Transformer-based similarity models by leveraging improved explanation methods.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens [9.590540796223715]
In this paper, we attempt to explore the in-context learning process in Transformers through a lens of representation learning.
We show that the ICL inference process of the attention layer aligns with the training procedure of its dual model, which generates token representation predictions.
We extend our theoretical conclusions to more complicated scenarios, including one Transformer layer and multiple attention layers.
arXiv Detail & Related papers (2023-10-20T01:55:34Z)
- In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics follow a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z)
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Unveiling Transformers with LEGO: a synthetic reasoning task [23.535488809197787]
We study how the transformer architecture learns to follow a chain of reasoning.
In some data regimes, the trained transformer finds "shortcut" solutions to follow the chain of reasoning.
We find that such shortcuts can be prevented with appropriate architecture modifications or careful data preparation.
arXiv Detail & Related papers (2022-06-09T06:30:17Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.