Self-Critique Guided Iterative Reasoning for Multi-hop Question Answering
- URL: http://arxiv.org/abs/2505.19112v1
- Date: Sun, 25 May 2025 12:10:24 GMT
- Title: Self-Critique Guided Iterative Reasoning for Multi-hop Question Answering
- Authors: Zheng Chu, Huiming Fan, Jingchang Chen, Qianyu Wang, Mingda Yang, Jiafeng Liang, Zhongjie Wang, Hao Li, Guo Tang, Ming Liu, Bing Qin,
- Abstract summary: Large language models (LLMs) face challenges in knowledge-intensive multi-hop reasoning.<n>We propose Self-Critique Guided Iterative Reasoning (SiGIR)<n>SiGIR uses self-critique feedback to guide the iterative reasoning process.
- Score: 24.446222685949227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although large language models (LLMs) have demonstrated remarkable reasoning capabilities, they still face challenges in knowledge-intensive multi-hop reasoning. Recent work explores iterative retrieval to address complex problems. However, the lack of intermediate guidance often results in inaccurate retrieval and flawed intermediate reasoning, leading to incorrect reasoning. To address these, we propose Self-Critique Guided Iterative Reasoning (SiGIR), which uses self-critique feedback to guide the iterative reasoning process. Specifically, through end-to-end training, we enable the model to iteratively address complex problems via question decomposition. Additionally, the model is able to self-evaluate its intermediate reasoning steps. During iterative reasoning, the model engages in branching exploration and employs self-evaluation to guide the selection of promising reasoning trajectories. Extensive experiments on three multi-hop reasoning datasets demonstrate the effectiveness of our proposed method, surpassing the previous SOTA by $8.6\%$. Furthermore, our thorough analysis offers insights for future research. Our code, data, and models are available at Github: https://github.com/zchuz/SiGIR-MHQA.
Related papers
- Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis [3.711555701154055]
Reasoning models and their integration into practical AI chat bots have led to breakthroughs in solving advanced math, deep search, and extractive question answering problems.<n>Yet, a complete understanding of why these models hallucinate more than general purpose language models is missing.<n>In this study, we systematicallyexplore reasoning failures of contemporary language models on multi-hop question answering tasks.
arXiv Detail & Related papers (2025-08-06T17:58:36Z) - Improving Rationality in the Reasoning Process of Language Models through Self-playing Game [25.193698725021108]
We design a Critic-Discernment Game (CDG) in which a prover first provides a solution to a given problem and is subsequently challenged by critiques of its solution.<n>The objective of the prover is to maintain the correct answer when faced with misleading comments, while correcting errors in response to constructive feedback.<n>Our experiments on tasks involving mathematical reasoning, stepwise error detection, self-correction, and long-chain reasoning demonstrate that CDG training can significantly improve the ability of well-aligned LLMs to comprehend their reasoning process.
arXiv Detail & Related papers (2025-06-28T15:11:23Z) - Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions [100.41062461003389]
We show that framing reasoning as a search process helps the model "connect the dots" between fragmented knowledge and produce extended reasoning traces in non-reasoning models.<n>We evaluate our method across three benchmarks and observe consistent improvements.
arXiv Detail & Related papers (2025-06-10T15:51:16Z) - Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt [74.35891434097053]
Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks.<n>They often exhibit overthinking -- performing unnecessary reasoning steps even after arriving at the correct answer.<n>We present a quantitative analysis of overthinking from the perspective of self-doubt.<n>We introduce a simple and effective prompting method to reduce the model's over-reliance on input questions.
arXiv Detail & Related papers (2025-05-29T14:30:02Z) - Interleaved Reasoning for Large Language Models via Reinforcement Learning [22.403928213802036]
Long chain-of-thought (CoT) enhances large language models' (LLM) reasoning capabilities.<n>We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions.
arXiv Detail & Related papers (2025-05-26T07:58:17Z) - Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think [51.0691253204425]
We analyze intermediate reasoning steps, termed subthoughts, to answer two questions: Does the final answer reliably represent the model's optimal conclusion?<n>Our approach involves segmenting a reasoning trace into sequential subthoughts based on linguistic cues.<n>We find that aggregating these answers by selecting the most frequent one (the mode) often yields significantly higher accuracy compared to relying solely on the answer derived from the original complete trace.
arXiv Detail & Related papers (2025-04-29T12:39:07Z) - Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models.<n>We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models.<n>For reasoning models, majority voting proves to be a robust inference strategy, generally competitive or outperforming other more sophisticated ITC methods.
arXiv Detail & Related papers (2025-04-18T19:32:55Z) - Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z) - Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths [69.39559168050923]
We introduce Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths.
Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance.
We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions.
arXiv Detail & Related papers (2024-10-07T06:37:25Z) - Leveraging Structured Information for Explainable Multi-hop Question
Answering and Reasoning [14.219239732584368]
In this work, we investigate constructing and leveraging extracted semantic structures (graphs) for multi-hop question answering.
Empirical results and human evaluations show that our framework: generates more faithful reasoning chains and substantially improves the QA performance on two benchmark datasets.
arXiv Detail & Related papers (2023-11-07T05:32:39Z) - HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale
Supervision [118.0818807474809]
This work proposes a principled, probabilistic approach for training explainable multi-hop QA systems without rationale supervision.
Our approach performs multi-hop reasoning by explicitly modeling rationales as sets, enabling the model to capture interactions between documents and sentences within a document.
arXiv Detail & Related papers (2023-05-23T16:53:49Z) - REFINER: Reasoning Feedback on Intermediate Representations [47.36251998678097]
We introduce REFINER, a framework for finetuning language models to generate intermediate inferences.
REFINER works by interacting with a critic model that provides automated feedback on the reasoning.
Empirical evaluations show significant improvements over baseline LMs of comparable scale.
arXiv Detail & Related papers (2023-04-04T15:57:28Z) - Faithful Reasoning Using Large Language Models [12.132449274592668]
We show how LMs can be made to perform faithful multi-step reasoning via a process whose causal structure mirrors the underlying logical structure of the problem.
Our approach works by chaining together reasoning steps, where each step results from calls to two fine-tuned LMs.
We demonstrate the effectiveness of our model on multi-step logical deduction and scientific question-answering, showing that it outperforms baselines on final answer accuracy.
arXiv Detail & Related papers (2022-08-30T13:44:41Z) - Robustifying Multi-hop QA through Pseudo-Evidentiality Training [28.584236042324896]
We study the bias problem of multi-hop question answering models, of answering correctly without correct reasoning.
We propose a new approach to learn evidentiality, deciding whether the answer prediction is supported by correct evidences.
arXiv Detail & Related papers (2021-07-07T14:15:14Z) - Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model has a better performance (4.9% higher than baseline) on adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.