Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning
- URL: http://arxiv.org/abs/2411.05037v1
- Date: Wed, 06 Nov 2024 16:30:26 GMT
- Title: Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning
- Authors: Mansi Sakarvadia,
- Abstract summary: Language models (LMs) struggle to perform multi-hop reasoning consistently.
We propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LM attention heads.
- Abstract: Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Language models (LMs) struggle to perform such reasoning consistently. We propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single- and multi-hop prompts. We then propose a mechanism that allows users to inject relevant prompt-specific information, which we refer to as "memories," at critical LM locations during inference. By thus enabling the LM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We empirically show that a simple, efficient, and targeted memory injection into a key attention layer often increases the probability of the desired next token in multi-hop tasks by up to 424%. We observe that small subsets of attention heads can significantly impact the model prediction during multi-hop reasoning. To more faithfully interpret these heads, we develop Attention Lens: an open source tool that translates the outputs of attention heads into vocabulary tokens via learned transformations called lenses. We demonstrate the use of lenses to reveal how a model arrives at its answer and use them to localize sources of model failures such as biased and malicious language generation.
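The core idea of a memory injection can be pictured with a toy sketch (this is illustrative only, not the authors' implementation, which operates on GPT-2 attention heads during inference): adding a "memory" vector aligned with the desired token's unembedding direction to an attention output raises that token's next-token probability. The dimensions, the orthogonal toy unembedding matrix, and the scale factor below are all made-up assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def inject_memory(head_output, memory_embedding, scale=4.0):
    """Add a scaled 'memory' vector to an attention output before
    it is projected to the vocabulary (toy version of the paper's idea)."""
    return head_output + scale * memory_embedding

rng = np.random.default_rng(0)
d_model, vocab = 16, 10
W_U = np.eye(d_model)[:, :vocab]   # toy orthogonal unembedding matrix
head_out = rng.normal(size=d_model)  # stand-in for an attention output
target = 3                           # desired next-token id
memory = W_U[:, target]              # memory aligned with the target token

p_before = softmax(head_out @ W_U)[target]
p_after = softmax(inject_memory(head_out, memory) @ W_U)[target]
```

Because the toy unembedding columns are orthogonal, the injection raises only the target logit, so `p_after` is strictly larger than `p_before`; in a real LM the effect depends on where in the network the memory is added.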
Related papers
- Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers? [6.525065859315515]
We investigate whether Large Language Models (LLMs) are prone to exploiting simplifying cues in multi-hop reasoning benchmarks.
Motivated by this finding, we propose a challenging multi-hop reasoning benchmark constructed by generating seemingly plausible multi-hop reasoning chains.
We find that their ability to perform multi-hop reasoning is degraded, as indicated by an up to 45% relative decrease in F1 score when presented with such seemingly plausible alternatives.
arXiv Detail & Related papers (2024-09-08T19:22:58Z) - LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating across different benchmarks and choosing a proper strategy, even a 2.7B-parameter small-scale model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z) - Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency.
This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
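Distributional memorization, as summarized above, is a correlation between a model's output probabilities and how often the relevant content appears in pretraining data. A minimal sketch of such a measurement, assuming a Spearman rank correlation and entirely made-up per-example frequency and probability values:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation on ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

# Toy data: pretraining n-gram frequency vs. model output probability
# per evaluation example (hypothetical numbers for illustration only).
freq = np.array([1200, 850, 40, 5, 300, 9000])
prob = np.array([0.92, 0.80, 0.33, 0.10, 0.61, 0.97])
rho = spearman(freq, prob)
```

A high `rho` on a task would suggest memorization-driven behavior; the paper's finding is that this correlation is stronger for knowledge-intensive tasks than for reasoning-based ones.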
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - Understanding Information Storage and Transfer in Multi-modal Large Language Models [51.20840103605018]
We study how Multi-modal Large Language Models process information in a factual visual question answering task.
Key findings show that these MLLMs rely on self-attention blocks in much earlier layers for information storage.
We introduce MultEdit, a model-editing algorithm that can correct errors and insert new long-tailed information into MLLMs.
arXiv Detail & Related papers (2024-06-06T16:35:36Z) - Memory Injections: Correcting Multi-Hop Reasoning Failures during
Inference in Transformer-Based Language Models [4.343604069244352]
We propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on attention heads.
We show that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks by up to 424%.
arXiv Detail & Related papers (2023-09-11T16:39:30Z) - Shapley Head Pruning: Identifying and Removing Interference in
Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
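Removing an attention head from a fixed model amounts to zeroing (or masking) its slice of the concatenated multi-head output. A minimal sketch, with made-up dimensions and hypothetical head indices, of what that masking looks like mechanically (the paper's contribution is in *identifying* which heads to prune via Shapley values, which is not shown here):

```python
import numpy as np

def prune_heads(attn_output, head_dim, pruned):
    """Zero out selected heads in a concatenated multi-head output.

    attn_output: array of shape (seq_len, n_heads * head_dim),
    pruned: iterable of head indices to remove.
    """
    out = attn_output.copy()
    for h in pruned:
        out[:, h * head_dim:(h + 1) * head_dim] = 0.0
    return out

x = np.ones((2, 8))            # toy output: 4 heads, head_dim = 2
y = prune_heads(x, 2, [1, 3])  # remove heads 1 and 3
```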
arXiv Detail & Related papers (2022-10-11T18:11:37Z) - Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question
Answering [71.49131159045811]
Multi-hop reasoning requires aggregating multiple documents to answer a complex question.
Existing methods usually decompose the multi-hop question into simpler single-hop questions.
We propose an interpretable stepwise reasoning framework to incorporate both single-hop supporting sentence identification and single-hop question generation.
arXiv Detail & Related papers (2022-08-22T13:24:25Z) - Few-Shot Stance Detection via Target-Aware Prompt Distillation [48.40269795901453]
This paper is inspired by the potential capability of pre-trained language models (PLMs) serving as knowledge bases and few-shot learners.
PLMs can provide essential contextual information for the targets and enable few-shot learning via prompts.
Considering the crucial role of the target in stance detection task, we design target-aware prompts and propose a novel verbalizer.
arXiv Detail & Related papers (2022-06-27T12:04:14Z) - Focus-Constrained Attention Mechanism for CVAE-based Response Generation [27.701626908931267]
The latent variable is supposed to capture discourse-level information and encourage informative target responses.
We transform the coarse-grained discourse-level information into fine-grained word-level information.
Our model can generate more diverse and informative responses compared with several state-of-the-art models.
arXiv Detail & Related papers (2020-09-25T09:38:59Z) - Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering [35.40919477319811]
We propose a novel knowledge-aware approach that equips pre-trained language models with a multi-hop relational reasoning module.
It performs multi-hop, multi-relational reasoning over subgraphs extracted from external knowledge graphs.
It unifies path-based reasoning methods and graph neural networks to achieve better interpretability and scalability.
arXiv Detail & Related papers (2020-05-01T23:10:26Z)
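Multi-hop relational reasoning over extracted subgraphs can be pictured as enumerating relation paths between a question entity and a candidate answer. A minimal breadth-first sketch over a toy knowledge graph (illustrative only; the paper's module combines path-based reasoning with graph neural networks, which this does not reproduce):

```python
from collections import deque

def find_paths(graph, start, goal, max_hops=2):
    """Enumerate relation paths of up to max_hops between two entities
    in a toy knowledge graph {head: [(relation, tail), ...]}."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) < max_hops:
            for rel, nxt in graph.get(node, []):
                queue.append((nxt, path + [(rel, nxt)]))
    return paths

# Hypothetical two-hop question: "Which continent is Paris in?"
kg = {
    "Paris": [("capital_of", "France")],
    "France": [("continent", "Europe")],
}
paths = find_paths(kg, "Paris", "Europe")
# → [[("capital_of", "France"), ("continent", "Europe")]]
```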
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.