Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2502.10835v1
- Date: Sat, 15 Feb 2025 15:36:42 GMT
- Title: Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
- Authors: Zeping Yu, Yonatan Belinkov, Sophia Ananiadou
- Abstract summary: We investigate how large language models perform latent multi-hop reasoning in prompts like "Wolfgang Amadeus Mozart's mother's spouse is".
We find that failures often stem from the relation attribute extraction stage, where conflicting logits reduce prediction accuracy.
We propose back attention, a novel mechanism that enables lower layers to leverage higher-layer hidden states from different positions during attention computation.
- Score: 51.53835083483751
- Abstract: We investigate how large language models perform latent multi-hop reasoning in prompts like "Wolfgang Amadeus Mozart's mother's spouse is". To analyze this process, we introduce logit flow, an interpretability method that traces how logits propagate across layers and positions toward the final prediction. Using logit flow, we identify four distinct stages in single-hop knowledge prediction: (A) entity subject enrichment, (B) entity attribute extraction, (C) relation subject enrichment, and (D) relation attribute extraction. Extending this analysis to multi-hop reasoning, we find that failures often stem from the relation attribute extraction stage, where conflicting logits reduce prediction accuracy. To address this, we propose back attention, a novel mechanism that enables lower layers to leverage higher-layer hidden states from different positions during attention computation. With back attention, a 1-layer transformer achieves the performance of a 2-layer transformer. Applied to four LLMs, back attention improves accuracy on five reasoning datasets, demonstrating its effectiveness in enhancing latent multi-hop reasoning ability.
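The two methods in the abstract can be sketched concretely. Below are minimal, hedged approximations of each in PyTorch; class, variable, and model names (e.g. `BackAttention`, `hi_states`, the use of GPT-2) are illustrative assumptions, not the authors' released code, and the paper's exact formulations may differ.

First, a logit-lens-style reading of logit flow: project each layer's hidden states at every position through the unembedding matrix and trace the logit of the final predicted token across layers and positions.

```python
# Sketch of a logit-lens-style trace, assuming "logit flow" records how strongly
# each (layer, position) hidden state already encodes the final prediction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Wolfgang Amadeus Mozart's mother's spouse is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
    target_id = int(out.logits[0, -1].argmax())     # token the model finally predicts
    unembed = model.get_output_embeddings().weight  # (vocab_size, d_model)
    ln_f = model.transformer.ln_f                   # GPT-2's final layer norm
    for layer, h in enumerate(out.hidden_states):   # one (1, seq, d_model) tensor per layer
        logits = ln_f(h[0]) @ unembed.T             # (seq, vocab) projection
        print(f"layer {layer:2d}:", [round(float(x), 1) for x in logits[:, target_id]])
```

Second, back attention reduced to a single head: queries come from a lower layer's hidden states, while keys and values come from higher-layer hidden states at all positions (e.g. cached from an earlier forward pass), so higher-layer information from other positions can flow back into earlier computation.

```python
# Single-head sketch of back attention under the assumption above.
import math
import torch
import torch.nn as nn

class BackAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, lo_states: torch.Tensor, hi_states: torch.Tensor) -> torch.Tensor:
        # lo_states: (batch, seq, d_model) lower-layer hidden states (queries)
        # hi_states: (batch, seq, d_model) cached higher-layer hidden states (keys/values)
        q = self.q_proj(lo_states)
        k = self.k_proj(hi_states)
        v = self.v_proj(hi_states)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
        # Residual add: fold higher-layer information from other positions
        # back into the lower layer's representation.
        return lo_states + self.out_proj(attn @ v)
```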
Related papers
- Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning [0.0]
Language models (LMs) struggle to perform multi-hop reasoning consistently.
We propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LM attention heads.
arXiv Detail & Related papers (2024-11-06T16:30:26Z)
- Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness.
We show that training VLM on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z)
- Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers? [6.525065859315515]
We investigate whether Large Language Models (LLMs) are prone to exploiting simplifying cues in multi-hop reasoning benchmarks.
Motivated by this finding, we propose a challenging multi-hop reasoning benchmark by generating seemingly plausible multi-hop reasoning chains.
We find that their ability to perform multi-hop reasoning is affected, as indicated by up to a 45% relative decrease in F1 score when presented with such seemingly plausible alternatives.
arXiv Detail & Related papers (2024-09-08T19:22:58Z)
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities.
It is unclear whether LMs perform these tasks by recalling answers memorized from the pretraining corpus or via a multi-step reasoning mechanism.
We show that MechanisticProbe, a probing method over attention patterns, is able to recover the reasoning tree from the model's attentions for most examples.
arXiv Detail & Related papers (2023-10-23T01:47:29Z)
- In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics follow a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z)
- Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models [4.343604069244352]
We propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on attention heads.
We show that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks by up to 424%.
arXiv Detail & Related papers (2023-09-11T16:39:30Z)
- Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs small LMs to assimilate high-quality external knowledge, refining the intermediate rationales they produce.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
- Exposing Attention Glitches with Flip-Flop Language Modeling [55.0688535574859]
This work identifies and analyzes the phenomenon of attention glitches in large language models.
We introduce flip-flop language modeling (FFLM), a family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models.
We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques.
arXiv Detail & Related papers (2023-06-01T17:44:35Z)
- Consistent Multi-Granular Rationale Extraction for Explainable Multi-hop Fact Verification [13.72453491358488]
This paper explores the viability of multi-granular rationale extraction with consistency and faithfulness for explainable multi-hop fact verification.
In particular, given a pretrained veracity prediction model, both the token-level explainer and sentence-level explainer are trained simultaneously to obtain multi-granular rationales.
Experimental results on three multi-hop fact verification datasets show that the proposed approach outperforms some state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-16T12:31:53Z)
- Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering [35.40919477319811]
We propose a novel knowledge-aware approach that equips pre-trained language models with a multi-hop relational reasoning module.
It performs multi-hop, multi-relational reasoning over subgraphs extracted from external knowledge graphs.
It unifies path-based reasoning methods and graph neural networks to achieve better interpretability and scalability.
arXiv Detail & Related papers (2020-05-01T23:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.