Attention Lens: A Tool for Mechanistically Interpreting the Attention
Head Information Retrieval Mechanism
- URL: http://arxiv.org/abs/2310.16270v1
- Date: Wed, 25 Oct 2023 01:03:35 GMT
- Title: Attention Lens: A Tool for Mechanistically Interpreting the Attention
Head Information Retrieval Mechanism
- Authors: Mansi Sakarvadia, Arham Khan, Aswathy Ajith, Daniel Grzenda, Nathaniel
Hudson, André Bauer, Kyle Chard, Ian Foster
- Abstract summary: We propose Attention Lens, a tool that enables researchers to translate the outputs of attention heads into vocabulary tokens.
Preliminary findings from our trained lenses indicate that attention heads play highly specialized roles in language models.
- Score: 4.343604069244352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based Large Language Models (LLMs) are the state-of-the-art for
natural language tasks. Recent work has attempted to decode, by reverse
engineering the role of linear layers, the internal mechanisms by which LLMs
arrive at their final predictions for text completion tasks. Yet little is
known about the specific role of attention heads in producing the final token
prediction. We propose Attention Lens, a tool that enables researchers to
translate the outputs of attention heads into vocabulary tokens via learned
attention-head-specific transformations called lenses. Preliminary findings
from our trained lenses indicate that attention heads play highly specialized
roles in language models. The code for Attention Lens is available at
github.com/msakarvadia/AttentionLens.
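To make the idea of a lens concrete, here is a minimal sketch of translating one attention head's output into vocabulary tokens. It is not the authors' implementation: it assumes a lens is a single linear map followed by an unembedding matrix, and the names `apply_lens`, `lens_weight`, `lens_bias`, and `unembed`, along with the toy vocabulary, are hypothetical and chosen only for illustration.

```python
import numpy as np

def apply_lens(head_output, lens_weight, lens_bias, unembed, vocab, k=2):
    """Return the k vocabulary tokens that score highest for one head output.

    Assumption: the lens is a linear map (lens_weight, lens_bias) specific to
    this attention head, followed by a projection into vocabulary space.
    """
    transformed = lens_weight @ head_output + lens_bias  # head-specific transform
    logits = unembed @ transformed                       # one score per vocabulary token
    top = np.argsort(logits)[::-1][:k]                   # indices of the k largest scores
    return [vocab[i] for i in top]

# Toy setup (hypothetical values): 5 tokens, 8-dimensional residual stream.
d_model = 8
vocab = ["cat", "dog", "Paris", "France", "the"]
unembed = np.eye(len(vocab), d_model)  # orthonormal rows keep the example easy to follow
lens_weight = np.eye(d_model)          # an untrained identity lens
lens_bias = np.zeros(d_model)

# A head output that points mostly toward "Paris" and partly toward "France".
head_output = 2.0 * unembed[2] + 1.0 * unembed[3]

tokens = apply_lens(head_output, lens_weight, lens_bias, unembed, vocab)
print(tokens)
```

In a real setting the lens parameters would be learned per head rather than fixed to the identity, and the unembedding matrix would come from the language model under study.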
Related papers
- Disentangling meaning from language in LLM-based machine translation [24.40574806667368]
We study sentence-level Machine Translation from a mechanistic perspective.
We decompose MT into two subtasks: producing text in the target language and preserving the input sentence's meaning.
We show that modifying just 1% of the relevant heads enables instruction-free MT performance comparable to instruction-based prompting.
arXiv Detail & Related papers (2026-02-04T14:40:53Z)
- Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs [69.28193153685893]
Large Language Models (LLMs) frequently exhibit strong translation abilities, even without task-specific fine-tuning.
To demystify this process, we leverage Sparse Autoencoders (SAEs) and introduce a novel framework for identifying task-specific features.
Our work not only decodes a core component of the translation mechanism in LLMs but also provides a blueprint for using internal model mechanisms to create more robust and efficient models.
arXiv Detail & Related papers (2026-01-16T06:29:07Z)
- Head Pursuit: Probing Attention Specialization in Multimodal Transformers [32.218423952797444]
We study how individual attention heads in text-generative models specialize in specific semantic or visual attributes.
Our results show consistent patterns of specialization at the head level across both unimodal and multimodal transformers.
Remarkably, we find that editing as few as 1% of the heads, selected using our method, can reliably suppress or enhance targeted concepts in the model output.
arXiv Detail & Related papers (2025-10-24T14:41:47Z)
- Visual Jigsaw Post-Training Improves MLLMs [58.29961336087896]
We introduce Visual Jigsaw, a generic self-supervised post-training framework designed to strengthen visual understanding in multimodal large language models (MLLMs).
Visual Jigsaw is formulated as a general ordering task: visual inputs are partitioned, shuffled, and the model must reconstruct the visual information by producing the correct permutation in natural language.
Extensive experiments demonstrate substantial improvements in fine-grained perception, temporal reasoning, and 3D spatial understanding.
arXiv Detail & Related papers (2025-09-29T17:59:57Z)
- OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction [95.6266030753644]
Vision-Language-Action (VLA) models aim to predict robotic actions based on visual observations and language instructions.
Existing approaches require fine-tuning pre-trained vision-language models (VLMs) as visual and language features are independently fed into downstream policies.
We propose OTTER, a novel VLA architecture that leverages existing alignments through explicit, text-aware visual feature extraction.
arXiv Detail & Related papers (2025-03-05T18:44:48Z)
- ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding [67.63933036920012]
Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location.
This study presents ClawMachine, offering a new methodology that notates an entity directly using the visual tokens.
ClawMachine unifies visual referring and grounding into an auto-regressive format and learns with a decoder-only architecture.
arXiv Detail & Related papers (2024-06-17T08:39:16Z)
- Picking the Underused Heads: A Network Pruning Perspective of Attention
Head Selection for Fusing Dialogue Coreference Information [50.41829484199252]
Transformer-based models with the multi-head self-attention mechanism are widely used in natural language processing.
We investigate the attention head selection and manipulation strategy for feature injection from a network pruning perspective.
arXiv Detail & Related papers (2023-12-15T05:27:24Z)
- Naturalness of Attention: Revisiting Attention in Code Language Models [3.756550107432323]
Language models for code such as CodeBERT offer the capability to learn advanced source code representation, but their opacity poses barriers to understanding of captured properties.
This study aims to shed some light on the previously ignored factors of the attention mechanism beyond the attention weights.
arXiv Detail & Related papers (2023-11-22T16:34:12Z)
- Frozen Transformers in Language Models Are Effective Visual Encoder Layers [26.759544759745648]
Large language models (LLMs) are surprisingly strong encoders for purely visual tasks in the absence of language.
Our work pushes the boundaries of leveraging LLMs for computer vision tasks.
We propose the information filtering hypothesis to explain the effectiveness of pre-trained LLMs in visual encoding.
arXiv Detail & Related papers (2023-10-19T17:59:05Z)
- Towards Vision-Language Mechanistic Interpretability: A Causal Tracing
Tool for BLIP [27.51318030253248]
We adapt a unimodal causal tracing tool to BLIP to enable the study of the neural mechanisms underlying image-conditioned text generation.
We release our BLIP causal tracing tool as open source to enable further experimentation in vision-language mechanistic interpretability.
arXiv Detail & Related papers (2023-08-27T18:46:47Z)
- VisionLLM: Large Language Model is also an Open-Ended Decoder for
Vision-Centric Tasks [81.32968995346775]
VisionLLM is a framework for vision-centric tasks that can be flexibly defined and managed using language instructions.
Our model can achieve over 60% mAP on COCO, on par with detection-specific models.
arXiv Detail & Related papers (2023-05-18T17:59:42Z)
- Shapley Head Pruning: Identifying and Removing Interference in
Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
- Verb Knowledge Injection for Multilingual Event Processing [50.27826310460763]
We investigate whether injecting explicit information on verbs' semantic-syntactic behaviour improves the performance of LM-pretrained Transformers.
We first demonstrate that injecting verb knowledge leads to performance gains in English event extraction.
We then explore the utility of verb adapters for event extraction in other languages.
arXiv Detail & Related papers (2020-12-31T03:24:34Z)
- Multi-Head Self-Attention with Role-Guided Masks [20.955992710112216]
We propose a method to guide the attention heads towards roles identified in prior work as important.
We do this by defining role-specific masks to constrain the heads to attend to specific parts of the input.
Experiments on text classification and machine translation using 7 different datasets show that our method outperforms competitive attention-based, CNN, and RNN baselines.
arXiv Detail & Related papers (2020-12-22T21:34:02Z)
- Multi-Head Attention: Collaborate Instead of Concatenate [85.71058762269374]
We propose a collaborative multi-head attention layer that enables heads to learn shared projections.
Experiments confirm that sharing key/query dimensions can be exploited in language understanding, machine translation and vision.
arXiv Detail & Related papers (2020-06-29T20:28:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.