PathFinder: MCTS and LLM Feedback-based Path Selection for Multi-Hop Question Answering
- URL: http://arxiv.org/abs/2512.05336v1
- Date: Fri, 05 Dec 2025 00:33:31 GMT
- Title: PathFinder: MCTS and LLM Feedback-based Path Selection for Multi-Hop Question Answering
- Authors: Durga Prasad Maram, Kalpa Gunaratna, Vijay Srinivasan, Haris Jeelani, Srinivas Chappidi
- Abstract summary: Multi-hop question answering is a challenging task in which language models must reason over multiple steps to reach the correct answer. We propose PATHFINDER, an approach that: (i) uses Monte Carlo Tree Search to generate training path traces, (ii) improves training data quality by filtering erroneous and lengthy traces, and (iii) reformulates sub-queries to handle failed retrieval cases.
- Score: 7.982446458726334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-hop question answering is a challenging task in which language models must reason over multiple steps to reach the correct answer. With the help of Large Language Models and their reasoning capabilities, existing systems are able to think and decompose an input question over multiple steps to analyze, retrieve, and reason. However, training-based approaches for this problem still suffer from LLM hallucinations and incorrect reasoning paths that hinder performance. Hence, we propose PATHFINDER, an approach that: (i) uses Monte Carlo Tree Search to generate training path traces, (ii) improves training data quality by filtering erroneous and lengthy traces using sub-answer recall and LLM-as-a-judge verification, and (iii) reformulates sub-queries to handle failed retrieval cases. By following these steps, we demonstrate that PATHFINDER improves the performance of multi-hop QA over public benchmark datasets.
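The filtering step (ii) in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the `Trace` structure, the definition of sub-answer recall as the fraction of gold sub-answers found among a trace's intermediate answers, the `max_steps` length cutoff, and the `judge` callable that stands in for an LLM-as-a-judge call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trace:
    """One candidate reasoning path (hypothetical structure, not from the paper)."""
    sub_answers: List[str]  # intermediate answers produced along the path
    final_answer: str


def sub_answer_recall(trace: Trace, gold_sub_answers: List[str]) -> float:
    """Fraction of gold sub-answers found among the trace's sub-answers (assumed definition)."""
    if not gold_sub_answers:
        return 1.0
    found = {a.strip().lower() for a in trace.sub_answers}
    hits = sum(1 for g in gold_sub_answers if g.strip().lower() in found)
    return hits / len(gold_sub_answers)


def filter_traces(
    traces: List[Trace],
    gold_sub_answers: List[str],
    judge: Callable[[Trace], bool],  # placeholder for an LLM-as-a-judge verdict
    min_recall: float = 1.0,         # assumed threshold
    max_steps: int = 4,              # assumed length cutoff
) -> List[Trace]:
    """Keep traces that are short enough, recall the gold sub-answers, and pass the judge."""
    kept = []
    for t in traces:
        if len(t.sub_answers) > max_steps:
            continue  # drop lengthy traces
        if sub_answer_recall(t, gold_sub_answers) < min_recall:
            continue  # drop traces that miss gold sub-answers
        if not judge(t):
            continue  # drop traces the judge rejects
        kept.append(t)
    return kept
```

In practice the `judge` argument would wrap a model call; passing `lambda t: True` disables that check so the recall and length filters can be inspected on their own.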
Related papers
- Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework [3.433214967077916]
This paper builds upon research within the MQUAKE framework to propose a multi-hop question decomposition method for complex questions. We investigate the impact of multi-hop question decomposition within knowledge graphs on model comprehension and reasoning accuracy, both before and after model training.
arXiv Detail & Related papers (2025-09-05T02:58:45Z)
- DAGR: Decomposition Augmented Graph Retrieval with LLMs [1.034893617526558]
DAGR is a retrieval method that leverages both complex questions and their decomposition in subquestions to extract relevant, linked subgraphs. The resulting Graph-RAG pipeline is suited to handle complex multi-hop questions and effectively reason over graph-structured data. We evaluate DAGR on standard multi-hop QA benchmarks and show that it achieves comparable or superior performance to competitive existing methods.
arXiv Detail & Related papers (2025-06-16T11:44:28Z)
- Reinforcing Question Answering Agents with Minimalist Policy Gradient Optimization [80.09112808413133]
Mujica comprises a planner that decomposes questions into an acyclic graph of subquestions and a worker that resolves questions via retrieval and reasoning. MyGO is a novel reinforcement learning method that replaces traditional policy gradient updates with Maximum Likelihood Estimation. Empirical results across multiple datasets demonstrate the effectiveness of Mujica-MyGO in enhancing multi-hop QA performance.
arXiv Detail & Related papers (2025-05-20T18:33:03Z)
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning [74.65632662894086]
We propose ReSearch, a framework that trains LLMs to Reason with Search via reinforcement learning. Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking. Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction.
arXiv Detail & Related papers (2025-03-25T09:00:58Z)
- GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval [52.47514434103737]
We introduce GRITHopper-7B, a novel multi-hop dense retrieval model that achieves state-of-the-art performance. GRITHopper combines generative and representational instruction tuning by integrating causal language modeling with dense retrieval training. We find that incorporating additional context after the retrieval process, referred to as post-retrieval language modeling, enhances dense retrieval performance.
arXiv Detail & Related papers (2025-03-10T16:42:48Z)
- Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation [68.58373854950294]
We focus on causal reasoning and address the task of establishing causal relationships based on correlation information. We introduce a prompting strategy for this problem that breaks the original task into fixed subquestions. We evaluate our approach on an existing causal benchmark, Corr2Cause.
arXiv Detail & Related papers (2024-12-18T15:32:27Z)
- An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism [14.479060028732803]
We argue that the current methods of multi-modal multi-hop question answering still mainly face two challenges. The retrieved evidence containing a large amount of redundant information leads to a significant drop in performance. The reasoning process without interpretable reasoning steps makes the model difficult to discover the logical errors for handling complex questions.
arXiv Detail & Related papers (2024-12-08T05:47:55Z)
- Zero-Shot Multi-Hop Question Answering via Monte-Carlo Tree Search with Large Language Models [19.214387260667348]
This paper introduces Monte-Carlo tree search for Zero-shot multi-hop Question Answering (MZQA), a framework based on Monte-Carlo tree search (MCTS).
Unlike previous works, we propose a zero-shot prompting method, which relies solely on instructions without the support of hand-crafted few-shot examples that typically require domain expertise.
We also introduce a behavioral cloning approach (MZQA-BC) trained on self-generated MCTS inference trajectories, achieving an over 10-fold increase in reasoning speed with barely any compromise in performance.
arXiv Detail & Related papers (2024-09-28T15:13:04Z)
- FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering [26.398873686905063]
Large Language Models (LLMs) with chain-of-thought (CoT) prompting have demonstrated impressive abilities on simple natural language inference tasks.
We propose a prompting method, Finite State Machine (FSM) to enhance the reasoning capabilities of LLM for complex tasks.
arXiv Detail & Related papers (2024-07-03T10:01:01Z)
- Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering [18.94220625114711]
Large language models (LLMs) perform surprisingly well and outperform human experts on many tasks.
This paper integrates and optimizes a pipeline for selecting reasoning paths from a KG based on an LLM.
We also propose a simple and effective subgraph retrieval method based on chain of thought (CoT) and page rank.
arXiv Detail & Related papers (2024-04-16T08:28:16Z)
- PathFinder: Guided Search over Multi-Step Reasoning Paths [80.56102301441899]
We propose PathFinder, a tree-search-based reasoning path generation approach.
It enhances diverse branching and multi-hop reasoning through the integration of dynamic decoding.
Our model generalizes well to longer, unseen reasoning chains, reflecting similar complexities to beam search with large branching factors.
arXiv Detail & Related papers (2023-12-08T17:05:47Z)
- Few-shot Reranking for Multi-hop QA via Language Model Prompting [56.454088569241534]
We study few-shot reranking for multi-hop QA with open-domain questions.
We propose PromptRank, which relies on large language models prompting for multi-hop path reranking.
PromptRank yields strong retrieval performance on HotpotQA with only 128 training examples.
arXiv Detail & Related papers (2022-05-25T10:45:55Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)