Inferring Functionality of Attention Heads from their Parameters
- URL: http://arxiv.org/abs/2412.11965v1
- Date: Mon, 16 Dec 2024 16:45:33 GMT
- Title: Inferring Functionality of Attention Heads from their Parameters
- Authors: Amit Elhelo, Mor Geva
- Abstract summary: We propose a framework that infers the functionality of attention heads from their parameters, without any model training or inference.
We evaluate MAPS on 20 operations across 6 popular large language models (LLMs).
Our pipeline produces plausible operation descriptions for most heads, as assessed by human judgment, while revealing diverse operations.
- Score: 12.913172023910203
- Abstract: Attention heads are one of the building blocks of large language models (LLMs). Prior work on investigating their operation mostly focused on analyzing their behavior during inference for specific circuits or tasks. In this work, we seek a comprehensive mapping of the operations they implement in a model. We propose MAPS (Mapping Attention head ParameterS), an efficient framework that infers the functionality of attention heads from their parameters, without any model training or inference. We showcase the utility of MAPS for answering two types of questions: (a) given a predefined operation, mapping how strongly heads across the model implement it, and (b) given an attention head, inferring its salient functionality. Evaluating MAPS on 20 operations across 6 popular LLMs shows its estimations correlate with the head's outputs during inference and are causally linked to the model's predictions. Moreover, its mappings reveal attention heads of certain operations that were overlooked in previous studies, and valuable insights on function universality and architecture biases in LLMs. Next, we present an automatic pipeline and analysis that leverage MAPS to characterize the salient operations of a given head. Our pipeline produces plausible operation descriptions for most heads, as assessed by human judgment, while revealing diverse operations.
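As a rough illustration of parameter-based head analysis in general (a hedged sketch with toy, synthetic matrices, not the MAPS procedure itself), one can read a head's value-output (OV) circuit in vocabulary space and score how strongly it implements a predefined token-level operation such as copying:

```python
import torch

# Minimal sketch (assumed toy setup, not the MAPS implementation): score how
# strongly one attention head's OV circuit implements a "copying" operation
# by inspecting its parameters only, with no forward pass.

torch.manual_seed(0)
d_model, d_head, vocab = 64, 16, 100      # toy dimensions
E = torch.randn(vocab, d_model)           # token embedding matrix
U = torch.randn(d_model, vocab)           # unembedding matrix
W_V = torch.randn(d_model, d_head)        # the head's value projection
W_O = torch.randn(d_head, d_model)        # the head's output projection

# Vocabulary-space view of the OV circuit: entry (i, j) estimates how much
# attending to token i promotes token j in the head's output.
ov_vocab = E @ W_V @ W_O @ U              # shape: (vocab, vocab)

# "Copying" score: weight on the diagonal (token -> same token) relative to the rest.
copy_score = ov_vocab.diag().mean() - ov_vocab.mean()
print(f"copying score for this head: {copy_score.item():.3f}")
```

The same pattern extends to other predefined operations by replacing the diagonal-based score with one that rewards the relevant source-to-target token pairs.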
Related papers
- Do Influence Functions Work on Large Language Models? [10.463762448166714]
Influence functions are important for quantifying the impact of individual training data points on a model's predictions.
We evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings.
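For reference, the standard influence-function estimate (Koh and Liang, 2017) approximates the effect of a training point $z$ on a test point $z_{\mathrm{test}}$ as
$\mathcal{I}(z, z_{\mathrm{test}}) \approx -\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^\top H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)$,
where $H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta)$ is the empirical Hessian at the fitted parameters; the exact estimator evaluated in that paper may differ.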
arXiv Detail & Related papers (2024-09-30T06:50:18Z)
- Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency.
We show that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
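As a rough illustration (a hedged sketch with made-up numbers, not that paper's exact metric), a correlation of this kind can be computed as a rank correlation between per-example model probabilities and pretraining-frequency estimates:

```python
from scipy.stats import spearmanr

# Hypothetical per-example statistics (illustrative values only): the model's
# probability for the gold answer, and an estimated pretraining-corpus
# frequency of related n-grams.
model_probs = [0.91, 0.42, 0.77, 0.10, 0.65]
pretrain_freqs = [12000, 300, 5400, 15, 2100]

# Distributional memorization, read as a rank correlation between the two.
rho, p_value = spearmanr(model_probs, pretrain_freqs)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```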
arXiv Detail & Related papers (2024-07-20T21:24:40Z)
- Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks.
In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types.
These benchmarks allow us to distinguish the ability of LLMs to accurately predict changes resulting from interventions from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z)
- Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY).
We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions.
Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
arXiv Detail & Related papers (2023-05-19T04:46:04Z)
- Learning Operators with Coupled Attention [9.715465024071333]
We propose a novel operator learning method, LOCA, motivated by the recent success of the attention mechanism.
In our architecture the input functions are mapped to a finite set of features which are then averaged with attention weights that depend on the output query locations.
By coupling these attention weights together with an integral transform, LOCA is able to explicitly learn correlations in the target output functions.
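A minimal sketch of this kind of query-dependent attention averaging (toy shapes and a made-up scoring network, not the LOCA architecture itself):

```python
import torch

# The input function is encoded as m feature vectors; the output at each
# query location is an attention-weighted average of those features, with
# weights that depend on the query location (toy setup for illustration).

torch.manual_seed(0)
m, d_feat, n_queries, d_query = 32, 8, 5, 2

features = torch.randn(m, d_feat)            # features extracted from the input function
queries = torch.randn(n_queries, d_query)    # output query locations

score_net = torch.nn.Linear(d_query, m)      # maps a query location to m attention scores
weights = torch.softmax(score_net(queries), dim=-1)   # (n_queries, m), rows sum to 1

outputs = weights @ features                 # (n_queries, d_feat) predicted output features
print(outputs.shape)
```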
arXiv Detail & Related papers (2022-01-04T08:22:03Z)
- Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models [76.48370548802464]
This paper conducts a series of analytical experiments to examine the relations between multi-head self-attention and final MRC system performance.
We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process.
Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can be helpful to understand how these models solve the questions.
arXiv Detail & Related papers (2021-08-26T04:23:57Z)
- The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT [18.13834903235249]
Multi-head attention is a mainstay of transformer-based models.
Different methods have been proposed to classify the role of each attention head based on the relations between tokens which have high pair-wise attention.
We formalize a simple yet effective score that generalizes to all the roles of attention heads and employs hypothesis testing on this score for robust inference.
arXiv Detail & Related papers (2021-01-22T14:10:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.