Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2601.03542v1
- Date: Wed, 07 Jan 2026 03:13:03 GMT
- Title: Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models
- Authors: Xukai Liu, Ye Liu, Jipeng Zhang, Yanghai Zhang, Kai Zhang, Qi Liu
- Abstract summary: We show that the hop-aligned assumption that bridge entities are computed sequentially across layers before later-hop answers does not generalize. We propose a framework that models multi-hop reasoning as broad recall in shallow layers followed by selective extraction in deeper attention layers.
- Score: 26.603700269575025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) perform well on multi-hop reasoning, yet how they internally compose multiple facts remains unclear. Recent work proposes \emph{hop-aligned circuit hypothesis}, suggesting that bridge entities are computed sequentially across layers before later-hop answers. Through systematic analyses on real-world multi-hop queries, we show that this hop-aligned assumption does not generalize: later-hop answer entities can become decodable earlier than bridge entities, a phenomenon we call \emph{layer-order inversion}, which strengthens with total hops. To explain this behavior, we propose a \emph{probabilistic recall-and-extract} framework that models multi-hop reasoning as broad probabilistic recall in shallow MLP layers followed by selective extraction in deeper attention layers. This framework is empirically validated through systematic probing analyses, reinterpreting prior layer-wise decoding evidence, explaining chain-of-thought gains, and providing a mechanistic diagnosis of multi-hop failures despite correct single-hop knowledge. Code is available at https://github.com/laquabe/Layer-Order-Inversion.
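To make the layer-wise decoding setup concrete, here is a minimal logit-lens-style probing sketch (our illustration under stated assumptions, not the authors' released code; the model name, readout path, and entity strings are placeholders) that checks at which layer a bridge entity versus the final answer first becomes decodable:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model: any decoder-only LM with accessible hidden states works.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def first_decodable_layer(prompt: str, target: str):
    """Earliest layer at which `target`'s first token is the top-1 prediction
    when that layer's last-position hidden state is read out through the
    final layer norm and unembedding matrix (the "logit lens")."""
    target_id = tok(" " + target, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"))
    for layer, h in enumerate(out.hidden_states):  # index 0 = embeddings
        logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
        if logits.argmax().item() == target_id:
            return layer
    return None

# Hypothetical two-hop probe: bridge = "Vienna", answer = "Austria".
# Layer-order inversion would mean the answer becomes decodable at an
# *earlier* layer than the bridge entity.
prompt = "The country of the city where Mozart was born is"
print("bridge:", first_decodable_layer(prompt, "Vienna"))
print("answer:", first_decodable_layer(prompt, "Austria"))
```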
Related papers
- Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation [2.8890464940342873]
Large language models (LLMs) excel at language understanding but often hallucinate and struggle with multi-hop reasoning.
We propose ParallaxRAG, a framework that symmetrically decouples queries and graph triples into multi-view spaces.
Our results highlight multi-view head specialization as a principled direction for knowledge-grounded multi-hop reasoning.
arXiv Detail & Related papers (2025-10-17T11:34:27Z)
- GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval [52.47514434103737]
We introduce GRITHopper-7B, a novel multi-hop dense retrieval model that achieves state-of-the-art performance.
GRITHopper combines generative and representational instruction tuning by integrating causal language modeling with dense retrieval training.
We find that incorporating additional context after the retrieval process, referred to as post-retrieval language modeling, enhances dense retrieval performance.
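As a rough illustration of what integrating causal language modeling with dense retrieval training can look like (a minimal sketch under our own assumptions; the loss weighting, temperature, and batching are hypothetical, not GRITHopper's actual recipe):

```python
import torch
import torch.nn.functional as F

def joint_loss(lm_logits, lm_labels, q_emb, pos_emb, neg_emb,
               alpha=0.5, tau=0.05):
    """Weighted sum of a causal-LM loss and an InfoNCE retrieval loss.
    alpha and tau are assumed hyperparameters for this sketch."""
    # Next-token cross-entropy over the generative head.
    lm = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                         lm_labels.view(-1), ignore_index=-100)
    # Contrastive retrieval: query i should match positive i; all other
    # passages in the batch (others' positives + explicit negatives)
    # act as negatives.
    q = F.normalize(q_emb, dim=-1)                             # [B, d]
    docs = F.normalize(torch.cat([pos_emb, neg_emb]), dim=-1)  # [B+N, d]
    sims = q @ docs.T / tau                                    # [B, B+N]
    retr = F.cross_entropy(sims, torch.arange(q.size(0)))
    return alpha * lm + (1 - alpha) * retr
```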
arXiv Detail & Related papers (2025-03-10T16:42:48Z)
- Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models [51.53835083483751]
We investigate how large language models perform latent multi-hop reasoning in prompts like "Wolfgang Amadeus Mozart's mother's spouse is".
We find that failures often stem from the relation attribute extraction stage, where conflicting logits reduce prediction accuracy.
We propose back attention, a novel mechanism that enables lower layers to leverage higher-layer hidden states from different positions during attention computation.
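Our reading of that mechanism as a toy single-head sketch (an assumption-laden illustration, not the paper's implementation): queries are formed from a lower layer's hidden states while keys and values come from a higher layer's states at all positions:

```python
import torch
import torch.nn.functional as F

def back_attention(lower_h, higher_h, W_q, W_k, W_v):
    """Toy single-head 'back attention': the lower layer queries
    higher-layer hidden states across positions.
    lower_h, higher_h: [seq, d]; W_q/W_k/W_v: [d, d]."""
    q = lower_h @ W_q
    k = higher_h @ W_k
    v = higher_h @ W_v
    scores = q @ k.T / (k.size(-1) ** 0.5)   # scaled dot-product
    return F.softmax(scores, dim=-1) @ v     # lower layer reads higher-layer info

# Tiny smoke test with random states.
d, seq = 16, 5
out = back_attention(torch.randn(seq, d), torch.randn(seq, d),
                     torch.randn(d, d), torch.randn(d, d), torch.randn(d, d))
assert out.shape == (seq, d)
```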
arXiv Detail & Related papers (2025-02-15T15:36:42Z)
- Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers? [6.525065859315515]
We investigate whether Large Language Models (LLMs) are prone to exploiting simplifying cues in multi-hop reasoning benchmarks.
Motivated by this finding, we propose a challenging multi-hop reasoning benchmark, by generating seemingly plausible multi-hop reasoning chains.
We find that their ability to perform multi-hop reasoning degrades, with up to a 45% relative decrease in F1 score when presented with such seemingly plausible alternatives.
arXiv Detail & Related papers (2024-09-08T19:22:58Z)
- Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries [39.438904598467154]
We study how large language models (LLMs) solve complex multi-step problems.
Understanding how the latent intermediate step is computed internally is key to understanding the overall computation.
We propose a novel "back-patching" analysis method whereby a hidden representation from a later layer is patched back to an earlier layer.
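A minimal back-patching sketch in the spirit of that method (our illustration only; the model and layer indices are placeholders, and the real analysis involves more careful controls):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder decoder-only LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def back_patch(prompt: str, src_layer: int, dst_layer: int) -> str:
    """Cache the last-position hidden state after `src_layer`, then rerun
    the prompt while overwriting the input of the earlier `dst_layer`
    at that position with the cached (later-layer) state."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        cached = model(**inputs).hidden_states[src_layer][:, -1, :].clone()

    def pre_hook(module, args):
        hidden = args[0]
        hidden[:, -1, :] = cached          # patch the later-layer state back
        return (hidden,) + args[1:]

    handle = model.transformer.h[dst_layer].register_forward_pre_hook(pre_hook)
    try:
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]
    finally:
        handle.remove()
    return tok.decode(logits.argmax())

# Hypothetical layer choice for a 12-layer model.
print(back_patch("Wolfgang Amadeus Mozart's mother's spouse is",
                 src_layer=10, dst_layer=4))
```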
arXiv Detail & Related papers (2024-06-18T16:44:13Z)
- HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision [118.0818807474809]
This work proposes a principled, probabilistic approach for training explainable multi-hop QA systems without rationale supervision.
Our approach performs multi-hop reasoning by explicitly modeling rationales as sets, enabling the model to capture interactions between documents and sentences within a document.
arXiv Detail & Related papers (2023-05-23T16:53:49Z)
- Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z)
- Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering [71.49131159045811]
Multi-hop reasoning requires aggregating multiple documents to answer a complex question.
Existing methods usually decompose the multi-hop question into simpler single-hop questions.
We propose an interpretable stepwise reasoning framework to incorporate both single-hop supporting sentence identification and single-hop question generation.
arXiv Detail & Related papers (2022-08-22T13:24:25Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)