ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA
- URL: http://arxiv.org/abs/2503.06951v2
- Date: Thu, 29 May 2025 17:37:26 GMT
- Title: ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA
- Authors: Xinjie Zhao, Fan Gao, Xingyu Song, Yingjian Chen, Rui Yang, Yanran Fu, Yuyang Wang, Yusuke Iwasawa, Yutaka Matsuo, Irene Li,
- Abstract summary: ReAgent is a reversible multi-Agent collaborative framework augmented with explicit backtracking mechanisms.<n>Our system can detect and correct errors mid-reasoning, leading to more robust and interpretable QA outcomes.
- Score: 29.578079759428014
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in large language models (LLMs) have significantly improved multi-hop question answering (QA) through direct Chain-of-Thought (CoT) reasoning. However, the irreversible nature of CoT leads to error accumulation, making it challenging to correct mistakes in multi-hop reasoning. This paper introduces ReAgent: a Reversible multi-Agent collaborative framework augmented with explicit backtracking mechanisms, enabling reversible multi-hop reasoning. By incorporating text-based retrieval, information aggregation and validation, our system can detect and correct errors mid-reasoning, leading to more robust and interpretable QA outcomes. The framework and experiments serve as a foundation for future work on error-tolerant QA systems. Empirical evaluations across three benchmarks indicate ReAgent's efficacy, yielding average about 6\% improvements against baseline models.
Related papers
- Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering [14.456873356080186]
Reasoning Tree Guided RAG (RT-RAG) is a novel hierarchical framework for complex multi-hop QA.<n>RT-RAG systematically decomposes multi-hop questions into explicit reasoning trees, minimizing inaccurate decomposition.
arXiv Detail & Related papers (2026-01-16T13:02:25Z) - Talk Less, Verify More: Improving LLM Assistants with Semantic Checks and Execution Feedback [14.593478824805542]
This paper introduces two complementary verification techniques: Q*, which performs reverse translation and semantic matching between code and user intent, and Feedback+, which incorporates execution feedback to guide code refinement.<n> Evaluations on three benchmark datasets, Spider, Bird, and GSM8K, demonstrate that both Q* and Feedback+ reduce error rates and task completion time.
arXiv Detail & Related papers (2026-01-01T06:10:06Z) - ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages.<n>ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z) - Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction [58.51530390018909]
Large Language Model based multi-agent systems excel at collaborative problem solving but remain brittle to cascading errors.<n>We present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction.
arXiv Detail & Related papers (2025-10-16T05:35:37Z) - AgentAsk: Multi-Agent Systems Need to Ask [26.13279490836716]
Multi-agent systems built on large language models (LLMs) promise enhanced problem-solving capabilities through collaborative division of labor.<n>We propose AgentAsk, a lightweight and plug-and-play clarification module that treats every inter-agent message as a potential failure point and inserts minimally necessary questions to arrest error propagation.<n>AgentAsk consistently improves accuracy and robustness over public multi-agent implementations while keeping overhead minimal, with latency and extra cost all less than 5%.
arXiv Detail & Related papers (2025-10-08T22:36:05Z) - Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration.<n>On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy.<n>Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z) - ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering [42.238086712267396]
ComposeRAG is a novel modular abstraction that decomposes RAG pipelines into atomic, composable modules.<n>It consistently outperforms strong baselines in both accuracy and grounding fidelity.<n>Its verification-first design reduces ungrounded answers by over 10% in low-quality retrieval settings.
arXiv Detail & Related papers (2025-05-30T21:10:30Z) - MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration [63.31211701741323]
We extend multi-agent multi-model reasoning to generation, specifically to improving faithfulness through refinement.
We design intrinsic evaluations for each subtask, with our findings indicating that both multi-agent (multiple instances) and multi-model (diverse LLM types) approaches benefit error detection and critiquing.
We consolidate these insights into a final "recipe" called Multi-Agent Multi-Model Refinement (MAMM-Refine), where multi-agent and multi-model collaboration significantly boosts performance.
arXiv Detail & Related papers (2025-03-19T14:46:53Z) - Can we repurpose multiple-choice question-answering models to rerank retrieved documents? [0.0]
R* is a proof-of-concept model that harmonizes multiple-choice question-answering (MCQA) models for document reranking.
Through experimental validation, R* proves to improve retrieval accuracy and contribute to the field's advancement.
arXiv Detail & Related papers (2025-03-06T17:53:24Z) - Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer [62.01554688056335]
Overestimation in the multiagent setting has received comparatively little attention.<n>We propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of online global Q-network to prevent overestimation accumulation.
arXiv Detail & Related papers (2025-02-04T05:14:58Z) - Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training [18.896813839389893]
We propose an iterative self-training framework, Agent-R, that enables language Agent to Reflect on the fly.<n>Unlike traditional methods that reward or penalize actions based on correctness, Agent-R leverages MCTS to construct training data that recover correct trajectories from erroneous ones.<n>Our findings demonstrate that Agent-R continuously improves the model's ability to recover from errors and enables timely error correction.
arXiv Detail & Related papers (2025-01-20T11:46:04Z) - MALT: Improving Reasoning with Multi-Agent LLM Training [66.9481561915524]
MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps.<n>On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [49.362750475706235]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.<n>We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.<n> Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent [9.439315294704368]
Tree of Thoughts (ToT) methods have shown potential in improving reasoning for complex question-answering tasks.
A critical limitation in multi-agent reasoning is the 'Reasoner' agent's shallow exploration of reasoning paths.
We introduce a novel approach combining ToT-based Reasoner agents with a Thought Validator agent.
Our method demonstrates superior performance compared to existing techniques when evaluated on the GSM8K dataset.
arXiv Detail & Related papers (2024-09-17T19:54:37Z) - Derailer-Rerailer: Adaptive Verification for Efficient and Reliable Language Model Reasoning [11.765298236504155]
Derailer-Rerailer is a novel framework that balances reasoning accuracy and computational efficiency.<n>Our framework achieves significant accuracy improvements (8-11% across various reasoning tasks) while maintaining 2-3 times better efficiency than existing verification methods.
arXiv Detail & Related papers (2024-08-25T21:20:17Z) - TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation [30.485127201645437]
We propose TRACE to enhance the multi-hop reasoning ability of RAG models.
TRACE constructs knowledge-grounded reasoning chains, which are a series of logically connected knowledge triples.
TRACE achieves an average performance improvement of up to 14.03% compared to using all the retrieved documents.
arXiv Detail & Related papers (2024-06-17T12:23:32Z) - SQUARE: Automatic Question Answering Evaluation using Multiple Positive
and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation)
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z) - Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z) - Counterfactual Variable Control for Robust and Interpretable Question
Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.