PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering
- URL: http://arxiv.org/abs/2510.14278v1
- Date: Thu, 16 Oct 2025 04:02:29 GMT
- Title: PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering
- Authors: Md Mahadi Hasan Nahid, Davood Rafiei
- Abstract summary: We introduce an Agentic Retrieval System that leverages large language models (LLMs) in a structured loop to retrieve relevant evidence with high precision and recall. Our framework consists of three specialized agents: a Question Analyzer that decomposes a multi-hop question into sub-questions, a Selector that identifies the most relevant context for each sub-question, and an Adder that brings in any missing evidence.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval plays a central role in multi-hop question answering (QA), where answering complex questions requires gathering multiple pieces of evidence. We introduce an Agentic Retrieval System that leverages large language models (LLMs) in a structured loop to retrieve relevant evidence with high precision and recall. Our framework consists of three specialized agents: a Question Analyzer that decomposes a multi-hop question into sub-questions, a Selector that identifies the most relevant context for each sub-question (focusing on precision), and an Adder that brings in any missing evidence (focusing on recall). The iterative interaction between Selector and Adder yields a compact yet comprehensive set of supporting passages. In particular, it achieves higher retrieval accuracy while filtering out distracting content, enabling downstream QA models to surpass full-context answer accuracy while relying on significantly less irrelevant information. Experiments on four multi-hop QA benchmarks -- HotpotQA, 2WikiMultiHopQA, MuSiQue, and MultiHopRAG -- demonstrate that our approach consistently outperforms strong baselines.
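The Selector/Adder interaction described in the abstract can be sketched as a simple loop. This is an illustrative toy, not the paper's implementation: the three agents are stubbed with lexical heuristics in place of LLM calls, and all function names are hypothetical.

```python
# Toy sketch of the Analyzer -> Selector -> Adder retrieval loop.
# Lexical overlap stands in for LLM judgments; names are hypothetical.

def analyze(question):
    # Question Analyzer stub: split on " and " as a stand-in for
    # LLM-based decomposition into sub-questions.
    return [q.strip() for q in question.split(" and ")]

def overlap(text, query):
    # Crude relevance score: number of shared lowercase tokens.
    return len(set(text.lower().split()) & set(query.lower().split()))

def select(corpus, sub_questions, k=1):
    # Selector stub (precision-oriented): keep only the top-k
    # passage(s) per sub-question.
    chosen = set()
    for sq in sub_questions:
        ranked = sorted(corpus, key=lambda p: overlap(p, sq), reverse=True)
        chosen.update(ranked[:k])
    return chosen

def add_missing(corpus, sub_questions, chosen):
    # Adder stub (recall-oriented): for any sub-question with no
    # supporting passage yet, pull in the best remaining candidate.
    added = set()
    for sq in sub_questions:
        if not any(overlap(p, sq) > 0 for p in chosen):
            remaining = [p for p in corpus if p not in chosen]
            if remaining:
                added.add(max(remaining, key=lambda p: overlap(p, sq)))
    return added

def agentic_retrieve(corpus, question, max_rounds=3):
    subs = analyze(question)
    evidence = select(corpus, subs)
    for _ in range(max_rounds):  # iterate Selector/Adder until stable
        extra = add_missing(corpus, subs, evidence)
        if not extra:
            break
        evidence |= extra
    return evidence

corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Mount Fuji is in Japan.",
]
result = agentic_retrieve(
    corpus, "Where is the Eiffel Tower and what is the capital of France?"
)
```

The loop terminates either when the Adder finds nothing missing or after a fixed round budget, which mirrors the "compact yet comprehensive" goal: the Selector keeps the set small, the Adder guarantees coverage.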
Related papers
- The benefits of query-based KGQA systems for complex and temporal questions in LLM era [55.20230501807337]
Large language models excel in question answering (QA) yet still struggle with multi-hop reasoning and temporal questions. Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers. We explore a multi-stage query-based framework for WikiData QA that enhances performance on challenging multi-hop and temporal benchmarks.
arXiv Detail & Related papers (2025-07-16T06:41:03Z) - Optimizing Question Semantic Space for Dynamic Retrieval-Augmented Multi-hop Question Answering [28.09833765246606]
Q-DREAM consists of three key modules: (1) the Question Decomposition Module (QDM), which decomposes interdependent subquestions; (2) the Subquestion Dependency Module (SDOM), which models the relations of subquestions for better understanding; and (3) the Dynamic Passage Retrieval Module (DPRM), which aligns subquestions with relevant passages by optimizing the semantic embeddings. Experimental results across various benchmarks demonstrate that Q-DREAM significantly outperforms existing RAG methods, achieving state-of-the-art performance in both in-domain and out-of-domain settings.
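The DPRM step above boils down to matching each subquestion to passages in a shared embedding space. A minimal sketch, assuming bag-of-words vectors in place of learned embeddings (everything here is illustrative, not from the paper):

```python
# Illustrative subquestion-to-passage alignment via cosine similarity.
# Counter-based bag-of-words vectors stand in for learned embeddings.
import math
from collections import Counter

def embed(text):
    # Hypothetical embedder: sparse token-count vector.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def align(sub_questions, passages):
    # Map each subquestion to its nearest passage in embedding space.
    return {
        sq: max(passages, key=lambda p: cosine(embed(sq), embed(p)))
        for sq in sub_questions
    }

subs = ["who directed Inception", "when was the director born"]
passages = [
    "Christopher Nolan directed Inception.",
    "Christopher Nolan was born in 1970.",
]
matches = align(subs, passages)
```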
arXiv Detail & Related papers (2025-05-31T09:57:07Z) - GenDec: A robust generative Question-decomposition method for Multi-hop reasoning [32.12904215053187]
Multi-hop QA involves step-by-step reasoning to answer complex questions.
The reasoning ability of existing large language models (LLMs) in multi-hop question answering remains underexplored.
It is unclear whether LLMs follow a desired reasoning chain to reach the right final answer.
arXiv Detail & Related papers (2024-02-17T02:21:44Z) - Improving Question Generation with Multi-level Content Planning [70.37285816596527]
This paper addresses the problem of generating questions from a given context and an answer, specifically focusing on questions that require multi-hop reasoning across an extended context.
We propose MultiFactor, a novel QG framework based on multi-level content planning. Specifically, MultiFactor includes two components: FA-model, which simultaneously selects key phrases and generates full answers, and Q-model which takes the generated full answer as an additional input to generate questions.
arXiv Detail & Related papers (2023-10-20T13:57:01Z) - End-to-End Beam Retrieval for Multi-Hop Question Answering [37.13580394608824]
Multi-hop question answering involves finding multiple relevant passages and step-by-step reasoning to answer complex questions.
Previous retrievers were customized for two-hop questions, and most of them were trained separately across different hops.
We introduce Beam Retrieval, an end-to-end beam retrieval framework for multi-hop QA.
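The core idea of beam retrieval is to search over chains of passages rather than single passages: at each hop, every partial chain is extended with every unused passage, the chains are scored jointly, and only the top-B survive. A toy sketch, assuming a lexical-overlap scorer where a real system would use a learned encoder:

```python
# Illustrative beam search over passage chains for multi-hop retrieval.
# The scorer is a toy token-overlap heuristic, not the paper's model.
import re

def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))

def score(chain, question):
    # Joint chain score: question tokens covered by the whole chain.
    return len(tokenize(" ".join(chain)) & tokenize(question))

def beam_retrieve(passages, question, hops=2, beam=2):
    beams = [[]]  # start with one empty chain
    for _ in range(hops):
        candidates = [
            chain + [p]
            for chain in beams
            for p in passages
            if p not in chain
        ]
        # Keep only the B best-scoring chains at this hop.
        candidates.sort(key=lambda c: score(c, question), reverse=True)
        beams = candidates[:beam]
    return beams[0]

passages = [
    "Christopher Nolan directed Inception.",
    "Christopher Nolan was born in London.",
    "Tokyo hosts many film festivals.",
]
question = "Where was the director of Inception born?"
chain = beam_retrieve(passages, question)
```

Scoring whole chains rather than individual hops is what makes the search "end-to-end": a passage that looks weak on its own can survive if it completes a strong chain.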
arXiv Detail & Related papers (2023-08-17T13:24:14Z) - Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering [85.79940770146557]
We decompose multi-hop questions into multiple corresponding single-hop questions.
We find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains.
When trained only on single-hop questions, models generalize poorly to multi-hop questions.
arXiv Detail & Related papers (2022-10-09T11:48:07Z) - Prompt-based Conservation Learning for Multi-hop Question Answering [11.516763652013005]
Multi-hop question answering requires reasoning over multiple documents to answer a complex question.
Most existing multi-hop QA methods fail to answer a large fraction of sub-questions.
We propose the Prompt-based Conservation Learning framework for multi-hop QA.
arXiv Detail & Related papers (2022-09-14T20:50:46Z) - Few-shot Reranking for Multi-hop QA via Language Model Prompting [56.454088569241534]
We study few-shot reranking for multi-hop QA with open-domain questions.
We propose PromptRank, which relies on large language models prompting for multi-hop path reranking.
PromptRank yields strong retrieval performance on HotpotQA with only 128 training examples.
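Prompt-based reranking of this kind has a simple shape: each candidate retrieval path is rendered into a prompt, and the language model's score for that prompt decides the ranking. A hedged sketch, with a stand-in heuristic where PromptRank would use the LLM's likelihood:

```python
# Illustrative prompt-based path reranking. `llm_score` is a
# hypothetical stand-in for an LLM likelihood, not PromptRank's model.

def build_prompt(question, path):
    # Render a candidate evidence path into a scoring prompt.
    return f"Question: {question}\nEvidence: {' '.join(path)}\nRelevant?"

def llm_score(prompt):
    # Stand-in heuristic: count question tokens echoed in the evidence.
    lines = prompt.splitlines()
    q = set(lines[0].lower().split())
    ev = set(lines[1].lower().split())
    return len(q & ev)

def rerank(question, paths):
    # Higher-scoring paths first.
    return sorted(
        paths,
        key=lambda p: llm_score(build_prompt(question, p)),
        reverse=True,
    )

paths = [
    ["The Globe is a theatre in London."],
    ["Shakespeare wrote Hamlet."],
]
ranked = rerank("Who wrote Hamlet?", paths)
```

Because only the prompt template and a handful of demonstrations need to be designed, this style of reranker can work from very few labeled examples, which matches the 128-example setting reported above.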
arXiv Detail & Related papers (2022-05-25T10:45:55Z) - Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z) - Ask to Understand: Question Generation for Multi-hop Question Answering [11.626390908264872]
Multi-hop Question Answering (QA) requires the machine to answer complex questions by finding scattering clues and reasoning from multiple documents.
We propose a novel method to complete multi-hop QA from the perspective of Question Generation (QG)
arXiv Detail & Related papers (2022-03-17T04:02:29Z) - Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered evidence from different paragraphs.
We propose the Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which performs context encoding in multiple hops.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z) - Answering Any-hop Open-domain Questions with Iterative Document
Reranking [62.76025579681472]
We propose a unified QA framework to answer any-hop open-domain questions.
Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets.
arXiv Detail & Related papers (2020-09-16T04:31:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.