Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework
- URL: http://arxiv.org/abs/2509.04770v1
- Date: Fri, 05 Sep 2025 02:58:45 GMT
- Title: Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework
- Authors: Zucheng Liang, Wenxin Wei, Kaijie Zhang, Hongyi Chen,
- Abstract summary: This paper builds upon research within the MQUAKE framework to propose a multi-hop question decomposition method for complex questions.<n>We investigate the impact of multi-hop question decomposition within knowledge graphs on model comprehension and reasoning accuracy, both before and after model training.
- Score: 3.433214967077916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurately answering complex questions has consistently been a significant challenge for Large Language Models (LLMs). To address this, this paper proposes a multi-hop question decomposition method for complex questions, building upon research within the MQUAKE framework. Utilizing the LLAMA3 model, we systematically investigate the impact of multi-hop question decomposition within knowledge graphs on model comprehension and reasoning accuracy, both before and after model training. In our experiments, we systematically partitioned and converted the MQUAKE-T dataset into two distinct formats: a single-hop dataset designed for directly answering complex questions, and a multi-hop dataset constructed using the multi-hop question decomposition method. We then fine-tuned the LLAMA3 model on these datasets and conducted inference tests. Our results demonstrate that, without fine-tuning the LLM, the prediction performance based on the multi-hop question decomposition method significantly outperforms the method of directly answering complex questions. After fine-tuning using the LoRA (Low-Rank Adaptation) method, the performance of both approaches improved compared to the untrained baseline. Crucially, the method utilizing multi-hop decomposition consistently maintained its superiority. These findings validate the effectiveness of the multi-hop decomposition method both before and after training, demonstrating its capability to effectively enhance the LLM's ability to answer complex questions.
Related papers
- PathFinder: MCTS and LLM Feedback-based Path Selection for Multi-Hop Question Answering [7.982446458726334]
Multi-hop question answering is a challenging task in which language models must reason over multiple steps to reach the correct answer.<n>We propose PATHFINDER, an approach that: (i) uses Monte Carlo Tree Search to generate training path traces, (ii) improves training data quality by filtering erroneous and lengthy traces, and (iii) reformulates sub-queries to handle failed retrieval cases.
arXiv Detail & Related papers (2025-12-05T00:33:31Z) - Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models [64.49342399229529]
We argue that post-hoc attribution can be reframed as a reasoning problem, where answers are decomposed into constituent units, each tied to specific context.<n>We introduce DecompTune, a post-training method that teaches models to produce answer decompositions as intermediate reasoning steps.<n>Across extensive experiments and ablations, DecompTune substantially improves attribution quality, outperforming prior methods and matching or exceeding state-of-the-art frontier models.
arXiv Detail & Related papers (2025-10-29T17:58:59Z) - DAGR: Decomposition Augmented Graph Retrieval with LLMs [1.034893617526558]
DAGR is a retrieval method that leverages both complex questions and their decomposition in subquestions to extract relevant, linked subgraphs.<n>The resulting Graph-RAG pipeline is suited to handle complex multi-hop questions and effectively reason over graph-structured data.<n>We evaluate DAGR on standard multi-hop QA benchmarks and show that it achieves comparable or superior performance to competitive existing methods.
arXiv Detail & Related papers (2025-06-16T11:44:28Z) - Reinforcing Question Answering Agents with Minimalist Policy Gradient Optimization [80.09112808413133]
Mujica is a planner that decomposes questions into acyclic graph of subquestions and a worker that resolves questions via retrieval and reasoning.<n>MyGO is a novel reinforcement learning method that replaces traditional policy updates with gradient Likelihood Maximum Estimation.<n> Empirical results across multiple datasets demonstrate the effectiveness of MujicaMyGO in enhancing multi-hop QA performance.
arXiv Detail & Related papers (2025-05-20T18:33:03Z) - Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning [69.64809103333839]
We investigate how explicitly modeling problem's difficulty prior information shapes the effectiveness of reinforcement learning based fine-tuning for multimodal reasoning.<n>Our approach demonstrates significant performances across various multi-modal mathematical reasoning benchmarks with only 2K+0.6K two-stage training data.
arXiv Detail & Related papers (2025-05-19T15:43:10Z) - GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval [52.47514434103737]
We introduce GRITHopper-7B, a novel multi-hop dense retrieval model that achieves state-of-the-art performance.<n> GRITHopper combines generative and representational instruction tuning by integrating causal language modeling with dense retrieval training.<n>We find that incorporating additional context after the retrieval process, referred to as post-retrieval language modeling, enhances dense retrieval performance.
arXiv Detail & Related papers (2025-03-10T16:42:48Z) - EfficientRAG: Efficient Retriever for Multi-Hop Question Answering [52.64500643247252]
We introduce EfficientRAG, an efficient retriever for multi-hop question answering.
Experimental results demonstrate that EfficientRAG surpasses existing RAG methods on three open-domain multi-hop question-answering datasets.
arXiv Detail & Related papers (2024-08-08T06:57:49Z) - Enhancing Large Language Model Performance To Answer Questions and
Extract Information More Accurately [2.1715455600756646]
Large Language Models (LLMs) generate responses to questions.
Their effectiveness is often hindered by sub-optimal quality of answers and occasional failures to provide accurate responses to questions.
To address these challenges, a fine-tuning process is employed, involving feedback and examples to refine models.
arXiv Detail & Related papers (2024-01-27T00:18:07Z) - Small Language Models Fine-tuned to Coordinate Larger Language Models
improve Complex Reasoning [41.03267013352519]
Large Language Models (LLMs) prompted to generate chain-of-thought exhibit impressive reasoning capabilities.
We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps.
We show that DaSLaM is not limited by the solver's capabilities as a function of scale.
arXiv Detail & Related papers (2023-10-21T15:23:20Z) - Multimodal Multi-Hop Question Answering Through a Conversation Between
Tools and Efficiently Finetuned Large Language Models [20.52053559484399]
We employ a tool-interacting divide-and-conquer strategy to answer complex multi-hop questions.
To increase the reasoning ability of LLMs, we prompt chatGPT to generate a tool-interacting divide-and-conquer dataset.
To assess the effectiveness of this approach, we conduct an evaluation on two recently introduced complex question-answering datasets.
arXiv Detail & Related papers (2023-09-16T08:22:22Z) - Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce Successive Prompting'', where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution.
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.