Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
- URL: http://arxiv.org/abs/2406.12775v2
- Date: Mon, 14 Oct 2024 09:55:12 GMT
- Title: Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
- Authors: Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson
- Abstract summary: We study how large language models (LLMs) solve complex multi-step problems.
Understanding how the latent step is computed internally is key to understanding the overall computation.
We propose a novel "back-patching" analysis method whereby a hidden representation from a later layer is patched back to an earlier layer.
- Score: 39.438904598467154
- Abstract: Large language models (LLMs) can solve complex multi-step problems, but little is known about how these computations are implemented internally. Motivated by this, we study how LLMs answer multi-hop queries such as "The spouse of the performer of Imagine is". These queries require two information extraction steps: a latent one for resolving the first hop ("the performer of Imagine") into the bridge entity (John Lennon), and another for resolving the second hop ("the spouse of John Lennon") into the target entity (Yoko Ono). Understanding how the latent step is computed internally is key to understanding the overall computation. By carefully analyzing the internal computations of transformer-based LLMs, we discover that the bridge entity is resolved in the early layers of the model. Then, only after this resolution, the two-hop query is solved in the later layers. Because the second hop commences in later layers, there could be cases where these layers no longer encode the necessary knowledge for correctly predicting the answer. Motivated by this, we propose a novel "back-patching" analysis method whereby a hidden representation from a later layer is patched back to an earlier layer. We find that in up to 66% of previously incorrect cases there exists a back-patch that results in the correct generation of the answer, showing that the later layers indeed sometimes lack the needed functionality. Overall, our methods and findings open further opportunities for understanding and improving latent reasoning in transformer-based LLMs.
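The core of the proposed analysis is straightforward to prototype with forward hooks: capture the last-token hidden state at a later layer, substitute it into an earlier layer on a second forward pass, and read off the new top prediction. The sketch below assumes a GPT-2-style Hugging Face model; the layer pair and the query are illustrative choices, not the paper's exact models or search procedure.

```python
# Minimal sketch of "back-patching": record the hidden state of the last
# token at a later layer, then re-run the model with that vector patched
# into an earlier layer. Assumes a GPT-2-style model from Hugging Face
# transformers; illustrates the general idea, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

query = 'The spouse of the performer of "Imagine" is'
inputs = tok(query, return_tensors="pt")

src_layer, dst_layer = 9, 3  # illustrative: patch layer 9 back into layer 3
captured = {}

def capture_hook(module, args, output):
    # output[0] is the block's hidden states: (batch, seq, hidden)
    captured["h"] = output[0][:, -1, :].detach().clone()

def patch_hook(module, args, output):
    hidden = output[0].clone()
    hidden[:, -1, :] = captured["h"]  # overwrite last-token representation
    return (hidden,) + output[1:]

# Pass 1: record the later layer's last-token hidden state.
h1 = model.transformer.h[src_layer].register_forward_hook(capture_hook)
with torch.no_grad():
    model(**inputs)
h1.remove()

# Pass 2: patch it back into the earlier layer and read the prediction.
h2 = model.transformer.h[dst_layer].register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(**inputs).logits
h2.remove()

print(tok.decode(logits[0, -1].argmax().item()))
```

In the paper's terms, a back-patch "succeeds" when the patched run ranks the correct target entity first while the unpatched run does not; sweeping (src_layer, dst_layer) pairs is what yields the reported up-to-66% recovery rate.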
Related papers
- GenSco: Can Question Decomposition based Passage Alignment improve Question Answering? [1.5776201492893507]
"GenSco" is a novel approach of selecting passages based on the predicted decomposition of the multi-hop questions.
We evaluate on three broadly established multi-hop question answering datasets.
arXiv Detail & Related papers (2024-07-14T15:25:08Z)
- Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering [45.82437926569949]
Multi-Hop Question Answering tasks present a significant challenge for large language models.
We introduce a novel generate-then-ground (GenGround) framework to solve a multi-hop question.
arXiv Detail & Related papers (2024-06-21T06:26:38Z)
- Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering [18.94220625114711]
Large language models (LLMs) perform surprisingly well and outperform human experts on many tasks.
This paper integrates and optimizes a pipeline for selecting reasoning paths from a knowledge graph (KG) based on an LLM.
We also propose a simple and effective subgraph retrieval method based on chain of thought (CoT) and PageRank; a generic sketch of the PageRank idea appears after this list.
arXiv Detail & Related papers (2024-04-16T08:28:16Z)
- Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models [58.57279229066477]
We study how language models (LMs) solve retrieval tasks in diverse situations.
We introduce ORION, a collection of structured retrieval tasks spanning six domains.
We find that LMs internally decompose retrieval tasks in a modular way.
arXiv Detail & Related papers (2023-12-13T18:36:43Z)
- Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers [73.28459749681879]
This paper focuses on LLaMA, a prominent open-source foundational model in natural language processing.
Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding.
We uncover several key and uncommon findings through these probing tasks.
arXiv Detail & Related papers (2023-12-07T14:50:41Z)
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities.
It is unclear whether LMs answer such tasks by recalling answers memorized from the pretraining corpus or via a genuine multi-step reasoning mechanism.
We show that the proposed MechanisticProbe can recover the reasoning tree from the model's attention patterns for most examples.
arXiv Detail & Related papers (2023-10-23T01:47:29Z)
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
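As a rough illustration of the PageRank-based subgraph retrieval idea from the "Reasoning on Efficient Knowledge Paths" entry above, the sketch below seeds personalized PageRank at the question entities of a toy knowledge graph and keeps the top-scoring nodes. It is a generic sketch with hypothetical data, not that paper's implementation.

```python
# Generic sketch of PageRank-based subgraph retrieval over a knowledge graph:
# seed personalized PageRank at the entities mentioned in the question and
# keep the highest-scoring nodes as the retrieved subgraph. Hypothetical
# toy graph; not the implementation from the paper cited above.
import networkx as nx

# Toy knowledge graph (edges carry relation labels).
kg = nx.DiGraph()
kg.add_edge("Imagine", "John Lennon", relation="performer")
kg.add_edge("John Lennon", "Yoko Ono", relation="spouse")
kg.add_edge("John Lennon", "The Beatles", relation="member_of")
kg.add_edge("The Beatles", "Liverpool", relation="origin")

def retrieve_subgraph(graph, question_entities, top_k=3):
    # Personalized PageRank: restart probability mass only on seed entities.
    personalization = {n: (1.0 if n in question_entities else 0.0) for n in graph}
    scores = nx.pagerank(graph, alpha=0.85, personalization=personalization)
    top_nodes = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return graph.subgraph(top_nodes)

sub = retrieve_subgraph(kg, {"Imagine"})
for u, v, data in sub.edges(data=True):
    print(f"{u} --{data['relation']}--> {v}")
```

The retrieved subgraph (here, the path from "Imagine" through "John Lennon" to "Yoko Ono") is what would then be handed to the LLM as a compact reasoning context.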
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.