Related papers: Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

URL: http://arxiv.org/abs/2406.09519v1
Date: Thu, 13 Jun 2024 18:12:01 GMT
Title: Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Authors: Jack Merullo, Carsten Eickhoff, Ellie Pavlick,
Abstract summary: We find that transformer language models (LMs) pass features from early layers to later layers. By analyzing particular mechanism LMs use to accomplish this, we find that it is also used to recall items from a list. Our analysis reveals a surprisingly intricate interpretable structure learned from language model pretraining.
Score: 32.2976613483151
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model. By analyzing particular mechanism LMs use to accomplish this, we find that it is also used to recall items from a list, and show that this mechanism can explain an otherwise arbitrary-seeming sensitivity of the model to the order of items in the prompt. Specifically, we find that models write into low-rank subspaces of the residual stream to represent features which are then read out by specific later layers, forming low-rank communication channels between layers. By decomposing attention head weight matrices with the Singular Value Decomposition (SVD), we find that previously described interactions between heads separated by one or more layers can be predicted via analysis of their weight matrices. We show that it is possible to manipulate the internal model representations as well as edit model weights based on the mechanism we discover in order to significantly improve performance on our synthetic Laundry List task, which requires recall from a list, often improving task accuracy by over 20%. Our analysis reveals a surprisingly intricate interpretable structure learned from language model pretraining, and helps us understand why sophisticated LMs sometimes fail in simple domains, facilitating future analysis of more complex behaviors.

Related papers

Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers [9.901842773988946]
We show that fine-tuned language models both extract relation information learned during finetuning while processing entities and (2) recall" this information in later layers while generating predictions.<n>We examine the necessity and sufficiency of these information pathways, examining what layers they occur at, how much redundancy they exhibit, and which model components are involved.
arXiv Detail & Related papers (2025-06-25T18:13:34Z)
Model Internal Sleuthing: Finding Lexical Identity and Inflectional Morphology in Modern Language Models [5.317065202153858]
We investigate how 25 models represent lexical identity and inflectional morphology across six typologically diverse languages.<n>We find that models concentrate lexical information linearly in early layers and increasingly nonlinearly in later layers.<n>Remarkably, these encoding patterns emerge across all models we test, despite differences in architecture, size, and training regime.
arXiv Detail & Related papers (2025-06-02T18:01:56Z)
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [54.59207567677249]
Large language models (LLMs) still struggle across tasks outside of high-resource languages.<n>In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z)
(How) Do Language Models Track State? [50.516691979518164]
Transformer language models (LMs) exhibit behaviors that appear to require tracking the unobserved state of an evolving world. We study state tracking in LMs trained or fine-tuned to compose permutations. We show that LMs consistently learn one of two state tracking mechanisms for this task.
arXiv Detail & Related papers (2025-03-04T18:31:02Z)
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration [49.180693704510006]
Referring Expression (REC) is a cross-modal task that evaluates the interplay of language understanding, image comprehension, and language-to-image grounding.<n>It serves as an essential testing ground for Multimodal Large Language Models (MLLMs)
arXiv Detail & Related papers (2025-02-27T13:58:44Z)
The Complexity of Learning Sparse Superposed Features with Feedback [0.9838799448847586]
We investigate whether the underlying learned features of a model can be efficiently retrieved through feedback from an agent. We analyze the feedback complexity associated with learning a feature matrix in sparse settings. Our results establish tight bounds when the agent is permitted to construct activations and demonstrate strong upper bounds in sparse scenarios.
arXiv Detail & Related papers (2025-02-08T01:54:23Z)
Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
Large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations. We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
arXiv Detail & Related papers (2025-01-02T22:26:54Z)
Two are better than one: Context window extension with multi-grained self-injection [111.1376461868317]
SharedLLM is a novel approach grounded in the design philosophy of multi-grained context compression and query-aware information retrieval. We introduce a specialized tree-style data structure to efficiently encode, store and retrieve multi-grained contextual information for text chunks.
arXiv Detail & Related papers (2024-10-25T06:08:59Z)
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model [3.838217057990932]
Injectable Realignment Model (IRM) is a novel approach to language model interpretability and explainability. Inspired by earlier work on Neural Programming Interfaces, we construct and train a small network -- the IRM -- to induce emotion-based alignments. Analysis of the trained IRM's outputs reveals a curious pattern.
arXiv Detail & Related papers (2024-07-04T04:05:19Z)
Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers. We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in context language learning (ICLL) We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
Language Models Implement Simple Word2Vec-style Vector Arithmetic [32.2976613483151]
A primary criticism towards language models (LMs) is their inscrutability. This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector arithmetic style mechanism to solve some relational tasks.
arXiv Detail & Related papers (2023-05-25T15:04:01Z)
Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers [45.42482446288144]
Recent advances in interpretability suggest we can project weights and hidden states of transformer-based language models to their vocabulary. We investigate LM attention heads and memory values, the vectors the models dynamically create and recall while processing a given input. We create a tool to visualize a forward pass of Generative Pre-trained Transformers (GPTs) as an interactive flow graph.
arXiv Detail & Related papers (2023-05-22T19:04:56Z)
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions. Our findings mark a first step toward faithfully understanding the inner-workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features [23.384134043048807]
We develop an explanation framework that decomposes a model's prediction into two parts. By identifying the latter, we are able to analyze the "unexplained" portion of the model. We show that the set of unlabelled features can generalize to multiple models trained with the same feature space.
arXiv Detail & Related papers (2022-06-15T17:36:55Z)
Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-train data. Based on experimental results, neural semantics that leverage GAP MODEL obtain new state-of-the-art results on both SPIDER and CRITERIA-TO-generative benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.