Related papers: Anatomy of an Idiom: Tracing Non-Compositionality in Language Models

URL: http://arxiv.org/abs/2511.16467v1
Date: Thu, 20 Nov 2025 15:35:50 GMT
Title: Anatomy of an Idiom: Tracing Non-Compositionality in Language Models
Authors: Andrew Gomes
Abstract summary: We find that idiom processing exhibits distinct computational patterns. We identify and investigate "Idiom Heads," attention heads that frequently activate across different idioms. These findings provide insights into how transformers handle non-compositional language.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate the processing of idiomatic expressions in transformer-based language models using a novel set of techniques for circuit discovery and analysis. First discovering circuits via a modified path patching algorithm, we find that idiom processing exhibits distinct computational patterns. We identify and investigate "Idiom Heads," attention heads that frequently activate across different idioms, as well as enhanced attention between idiom tokens due to earlier processing, which we term "augmented reception." We analyze these phenomena and the general features of the discovered circuits as mechanisms by which transformers balance computational efficiency and robustness. Finally, these findings provide insights into how transformers handle non-compositional language and suggest pathways for understanding the processing of more complex grammatical constructions.

Related papers

Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer [65.38883376379812]
We propose the Discrete Transformer, an architecture engineered to bridge the gap between continuous representations and discrete symbolic logic. Empirically, the Discrete Transformer not only achieves performance comparable to RNN-based baselines but crucially extends interpretability to continuous variable domains.
arXiv Detail & Related papers (2026-01-09T12:49:41Z)
Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning [50.99796659680724]
This work investigates out-of-distribution (OOD) generalization in Transformer networks using a GSM8K-style modular arithmetic on computational graphs task as a testbed. We introduce and explore a set of four architectural mechanisms aimed at enhancing OOD generalization. We complement these empirical results with a detailed mechanistic interpretability analysis that reveals how these mechanisms give rise to robust OOD generalization abilities.
arXiv Detail & Related papers (2025-10-15T21:03:59Z)
An explainable transformer circuit for compositional generalization [4.446278061385101]
We identify and mechanistically interpret the circuit responsible for compositional induction in a compact transformer. Using causal ablations, we validate the circuit and formalize its operation using a program-like description. Our findings advance the understanding of complex behaviors in transformers and highlight that such insights can provide a direct pathway for model control.
arXiv Detail & Related papers (2025-02-19T02:30:41Z)
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models [22.89563355840371]
We study the modularity of neural networks by analyzing circuits for highly compositional subtasks within a language model. Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness.
arXiv Detail & Related papers (2024-10-02T11:36:45Z)
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models [55.19497659895122]
We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors.
arXiv Detail & Related papers (2024-03-28T17:56:07Z)
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models [9.56229382432426]
This research aims to reverse engineer transformer models into human-readable representations that implement algorithmic functions. By applying circuit interpretability analysis, we identify a key sub-circuit in both GPT-2 Small and Llama-2-7B. We show that this sub-circuit has effects on various math-related prompts, such as on intervaled circuits, Spanish number word and months continuation, and natural language word problems.
arXiv Detail & Related papers (2023-11-07T16:58:51Z)
A Meta-Learning Perspective on Transformers for Causal Language Modeling [17.293733942245154]
The Transformer architecture has become prominent in developing large causal language models. We establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task. Within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models.
arXiv Detail & Related papers (2023-10-09T17:27:36Z)
Mapping of attention mechanisms to a generalized Potts model [50.91742043564049]
We show that training a neural network is exactly equivalent to solving the inverse Potts problem by the so-called pseudo-likelihood method. We also compute the generalization error of self-attention in a model scenario analytically using the replica method.
arXiv Detail & Related papers (2023-04-14T16:32:56Z)
Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure. The second method imbues structure at the data level by segmenting the data with morphological tokenization. We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
Incorporating Residual and Normalization Layers into Analysis of Masked Language Models [29.828669678974983]
We extend the scope of the analysis of Transformers from solely the attention patterns to the whole attention block. Our analysis of Transformer-based masked language models shows that the token-to-token interaction performed via attention has less impact on the intermediate representations than previously assumed.
arXiv Detail & Related papers (2021-09-15T08:32:20Z)
Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language. We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer. We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.