Memory Injections: Correcting Multi-Hop Reasoning Failures during
Inference in Transformer-Based Language Models
- URL: http://arxiv.org/abs/2309.05605v3
- Date: Wed, 28 Feb 2024 21:00:13 GMT
- Title: Memory Injections: Correcting Multi-Hop Reasoning Failures during
Inference in Transformer-Based Language Models
- Authors: Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel
Hudson, André Bauer, Kyle Chard, Ian Foster
- Abstract summary: We propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on attention heads.
We show that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks by up to 424%.
- Score: 4.343604069244352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Answering multi-hop reasoning questions requires retrieving and synthesizing
information from diverse sources. Large Language Models (LLMs) struggle to
perform such reasoning consistently. Here we propose an approach to pinpoint
and rectify multi-hop reasoning failures through targeted memory injections on
LLM attention heads. First, we analyze the per-layer activations of GPT-2
models in response to single and multi-hop prompts. We then propose a mechanism
that allows users to inject pertinent prompt-specific information, which we
refer to as "memories," at critical LLM locations during inference. By thus
enabling the LLM to incorporate additional relevant information during
inference, we enhance the quality of multi-hop prompt completions. We show
empirically that a simple, efficient, and targeted memory injection into a key
attention layer can often increase the probability of the desired next token in
multi-hop tasks by up to 424%.
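To make the mechanism above concrete, below is a minimal sketch (not the authors' released code) of a memory injection during GPT-2 inference. The layer index, injection strength, and the prompt/memory pair are all illustrative assumptions; the hook simply adds an embedded "memory" to one attention layer's output at the final token position and lets generation proceed as usual.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 6    # hypothetical "key attention layer"; the paper searches over layers
ALPHA = 4.0  # hypothetical injection strength

def memory_vector(memory_text):
    # Embed the memory tokens with the model's input embeddings and average them.
    ids = tokenizer(memory_text, return_tensors="pt").input_ids
    return model.transformer.wte(ids).mean(dim=1)  # shape: (1, hidden)

def make_hook(mem_vec):
    def hook(module, inputs, output):
        # GPT-2's attention module returns a tuple whose first element is the
        # attention output of shape (batch, seq_len, hidden); add the memory
        # at the last (next-token-predicting) position.
        attn_out = output[0]
        attn_out[:, -1, :] = attn_out[:, -1, :] + ALPHA * mem_vec.to(attn_out.dtype)
        return (attn_out,) + output[1:]
    return hook

# Illustrative 2-hop prompt; the memory supplies the implicit first hop.
prompt = "The currency of the country in which the Eiffel Tower stands is the"
memory = " France"

handle = model.transformer.h[LAYER].attn.register_forward_hook(make_hook(memory_vector(memory)))
with torch.no_grad():
    logits = model(**tokenizer(prompt, return_tensors="pt")).logits
handle.remove()

next_id = logits[0, -1].argmax().item()
print(tokenizer.decode([next_id]))  # compare against the same prompt with the hook removed
```

Comparing the probability of the desired completion with and without the hook, across layers, is the kind of measurement the abstract's 424% figure refers to.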
Related papers
- Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning [38.03304773600225]
Large language models (LLMs) serve as giant information stores, often including personal or copyrighted data, and retraining them from scratch is not a viable option.
We propose MUNCH, a simple uncertainty-based approach that breaks down multi-hop queries into subquestions and leverages the uncertainty of the unlearned model in final decision-making.
arXiv Detail & Related papers (2024-10-17T07:00:15Z)
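As a rough illustration of the decompose-then-check idea summarized above (not the MUNCH implementation), the sketch below scores each subquestion by the unlearned model's next-token entropy and abstains when any hop looks too uncertain. The stand-in GPT-2 model, the entropy measure, and the threshold are all assumptions; the subquestion decomposition is taken as given.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
THRESHOLD = 5.0  # hypothetical entropy threshold (nats)

def next_token_entropy(prompt):
    # Entropy of the model's next-token distribution after `prompt`.
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = model(ids).logits[0, -1].log_softmax(dim=-1)
    return -(logp.exp() * logp).sum().item()

def answer_or_abstain(subquestions):
    # Abstain if the model is too uncertain on any hop of the decomposition.
    scores = [next_token_entropy(q + " Answer:") for q in subquestions]
    if max(scores) > THRESHOLD:
        return "abstain (likely unlearned)", scores
    return "answer with the base pipeline", scores

# Subquestions would come from an upstream query-decomposition step.
decision, scores = answer_or_abstain([
    "Who wrote the novel mentioned in the question?",
    "What city was that author born in?",
])
print(decision, scores)
```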
- Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering [108.2131720470005]
Large language models (LLMs) have demonstrated remarkable performance across various real-world tasks.
They often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.
We propose AutoPASTA, a method that automatically identifies key contextual information and explicitly highlights it by steering an LLM's attention scores.
arXiv Detail & Related papers (2024-09-16T23:52:41Z)
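The toy function below illustrates attention steering in the spirit of the summary above, without claiming to reproduce AutoPASTA: it adds a positive bias to the pre-softmax attention scores of highlighted key positions. The bias value and the highlighted indices are illustrative choices, whereas the real method identifies the key spans and attention heads automatically.

```python
import torch

def steered_attention(q, k, v, highlight, bias=2.0):
    # Scaled dot-product attention with extra weight on highlighted key positions.
    # q, k, v: (seq_len, d) tensors; highlight: list of key indices to emphasize.
    d = q.size(-1)
    scores = q @ k.T / d ** 0.5       # (seq_len, seq_len) pre-softmax scores
    scores[:, highlight] += bias      # steer attention toward the highlighted tokens
    weights = scores.softmax(dim=-1)
    return weights @ v, weights

q, k, v = torch.randn(5, 16), torch.randn(5, 16), torch.randn(5, 16)
out, attn = steered_attention(q, k, v, highlight=[2, 3])
print(attn.sum(dim=0))  # columns 2 and 3 carry inflated attention mass
```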
- Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers? [6.525065859315515]
We investigate whether Large Language Models (LLMs) are prone to exploiting simplifying cues in multi-hop reasoning benchmarks.
Motivated by this finding, we propose a challenging multi-hop reasoning benchmark by generating seemingly plausible multi-hop reasoning chains.
We find that their ability to perform multi-hop reasoning is affected, as indicated by a relative decrease of up to 45% in F1 score when they are presented with such seemingly plausible alternatives.
arXiv Detail & Related papers (2024-09-08T19:22:58Z)
- Understanding Information Storage and Transfer in Multi-modal Large Language Models [51.20840103605018]
We study how Multi-modal Large Language Models (MLLMs) process information in a factual visual question answering task.
Key findings show that these MLLMs rely on self-attention blocks in much earlier layers for information storage.
We introduce MultEdit, a model-editing algorithm that can correct errors and insert new long-tailed information into MLLMs.
arXiv Detail & Related papers (2024-06-06T16:35:36Z)
- Uncertainty Guided Global Memory Improves Multi-Hop Question Answering [3.7013865226473848]
We propose a two-stage method that first collects relevant information from the entire document into a global memory and then combines it with the local context to solve the task.
Our experimental results show that fine-tuning a pre-trained model with memory-augmented input, including the most certain global elements, improves the model's performance.
arXiv Detail & Related papers (2023-11-29T23:45:57Z)
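A hedged sketch of that two-stage idea (not the paper's code): rank passages from the whole document by the model's confidence when each is paired with the question, keep the most certain ones as a global memory, and prepend them to the local context. The confidence measure, the stand-in GPT-2 model, and TOP_K are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
TOP_K = 2

def confidence(passage, question):
    # Max next-token probability given one passage plus the question.
    ids = tokenizer(f"{passage}\nQ: {question}\nA:", return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids).logits[0, -1].softmax(dim=-1).max().item()

def build_memory_augmented_input(passages, local_context, question):
    # Stage 1: pick the most certain global elements; Stage 2: combine with local context.
    ranked = sorted(passages, key=lambda p: confidence(p, question), reverse=True)
    memory = " ".join(ranked[:TOP_K])
    return f"{memory}\n{local_context}\nQ: {question}\nA:"

passages = ["Alice moved to Paris in 1990.", "Bob likes tea.", "Paris is in France."]
print(build_memory_augmented_input(passages, "Alice later met Bob.", "Where does Alice live?"))
```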
- Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning [70.74928578278957]
In open-domain question answering (ODQA), most existing questions require single-hop reasoning over commonsense knowledge.
Large language models (LLMs) have proven useful for ODQA without an external corpus.
We propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high quality CoTs.
arXiv Detail & Related papers (2023-10-20T14:51:10Z)
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
arXiv Detail & Related papers (2023-10-05T00:04:12Z)
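As a mock-up of a search-augmented few-shot prompt in the spirit of FreshPrompt (the exact template is not given here), the snippet below orders retrieved evidence by date and appends the question; the field names and the hard-coded evidence are placeholders for real search-engine results.

```python
from datetime import date

def build_fresh_prompt(question, evidences):
    # Order retrieved evidence chronologically (newest closest to the question),
    # then append the question for the LLM to answer.
    lines = [f"Today is {date.today():%B %d, %Y}.", "Use the search results to answer."]
    for ev in sorted(evidences, key=lambda e: e["date"]):
        lines.append(f"[{ev['date']}] {ev['source']}: {ev['snippet']}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)

evidences = [
    {"date": "2023-09-01", "source": "news", "snippet": "X announced Y."},
    {"date": "2022-01-15", "source": "wiki", "snippet": "Background on X."},
]
print(build_fresh_prompt("What did X announce most recently?", evidences))
```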
- Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks [1.5254598796939924]
We propose techniques that improve language models' ability to chain their encoded knowledge by relying on random walks over structured knowledge graphs.
Specifically, we use soft prompts to guide LMs to chain together their encoded knowledge by learning to map multi-hop questions to random walk paths that lead to the answer.
Applying our methods on two T5 LMs shows substantial improvements over standard tuning approaches in answering questions that require 2-hop reasoning.
arXiv Detail & Related papers (2023-06-06T20:45:18Z)
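A small sketch of the random-walk component described above (the soft-prompt tuning itself is omitted): sample multi-hop paths from a toy knowledge graph, which could serve as supervision when learning to map multi-hop questions to paths that lead to the answer. The graph and relation names are invented for illustration.

```python
import random

# (head, relation, tail) triples of a toy knowledge graph
TRIPLES = [
    ("Danielle Darrieux", "mother_tongue", "French"),
    ("French", "official_language_of", "France"),
    ("France", "capital", "Paris"),
]
GRAPH = {}
for h, r, t in TRIPLES:
    GRAPH.setdefault(h, []).append((r, t))

def random_walk(start, hops=2):
    # Sample a path of `hops` edges starting from `start`, if one exists.
    path, node = [start], start
    for _ in range(hops):
        if node not in GRAPH:
            return None
        relation, node = random.choice(GRAPH[node])
        path += [relation, node]
    return path

print(random_walk("Danielle Darrieux"))
# e.g. ['Danielle Darrieux', 'mother_tongue', 'French', 'official_language_of', 'France']
```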
- Rethinking Label Smoothing on Multi-hop Question Answering [87.68071401870283]
Multi-Hop Question Answering (MHQA) is a significant area in question answering.
In this work, we analyze the primary factors limiting the performance of multi-hop reasoning.
We propose a novel label smoothing technique, F1 Smoothing, which incorporates uncertainty into the learning process.
arXiv Detail & Related papers (2022-12-19T14:48:08Z)
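Since the summary does not give the exact form of F1 Smoothing, the snippet below shows only plain label smoothing over answer-span start positions as a simplified stand-in, to illustrate how a smoothed target distribution replaces a one-hot span label; epsilon and the toy logits are illustrative.

```python
import torch

def smoothed_span_loss(start_logits, gold_start, epsilon=0.1):
    # Cross-entropy against a one-hot target mixed with a uniform distribution.
    num_positions = start_logits.size(-1)
    target = torch.full_like(start_logits, epsilon / num_positions)
    target[..., gold_start] += 1.0 - epsilon
    return -(target * start_logits.log_softmax(dim=-1)).sum(dim=-1).mean()

# Example: logits over 8 candidate start positions for a batch of 2 questions,
# with the gold answer span starting at index 3.
logits = torch.randn(2, 8)
print(smoothed_span_loss(logits, gold_start=3).item())
```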
- Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering [71.49131159045811]
Multi-hop reasoning requires aggregating multiple documents to answer a complex question.
Existing methods usually decompose the multi-hop question into simpler single-hop questions.
We propose an interpretable stepwise reasoning framework to incorporate both single-hop supporting sentence identification and single-hop question generation.
arXiv Detail & Related papers (2022-08-22T13:24:25Z)
- KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering [28.18555591429343]
We propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP).
Instead of adding pointer heads to PLMs, we transform the task into a non-autoregressive Masked Language Modeling (MLM) generation problem.
Our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.
arXiv Detail & Related papers (2022-05-06T08:31:02Z)
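To illustrate the "non-autoregressive MLM generation" formulation mentioned above (omitting KECP's knowledge enhancement and contrastive prompt-tuning), the sketch below appends [MASK] slots after a question-answer template and fills them with a BERT MLM head in a single forward pass; the template and the assumed answer length are illustrative.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

context = "The Eiffel Tower is located in Paris."
question = "Where is the Eiffel Tower located?"
num_slots = 1  # assumed length of the answer in tokens

masks = " ".join([tokenizer.mask_token] * num_slots)
prompt = f"{context} Question: {question} Answer: {masks}."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Fill every [MASK] position in a single, non-autoregressive forward pass.
mask_positions = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```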
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.