ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA
- URL: http://arxiv.org/abs/2503.06951v1
- Date: Mon, 10 Mar 2025 05:56:46 GMT
- Title: ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA
- Authors: Zhao Xinjie, Fan Gao, Rui Yang, Yingjian Chen, Yuyang Wang, Ying Zhu, Jiacheng Tang, Irene Li
- Abstract summary: ReAgent is a reversible multi-Agent collaborative framework augmented with explicit backtracking mechanisms. Our system can detect and correct errors mid-reasoning, leading to more robust and interpretable QA outcomes.
- Score: 13.386562087058596
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advances in large language models (LLMs) have significantly improved multi-hop question answering (QA) through direct Chain-of-Thought (CoT) reasoning. However, the irreversible nature of CoT leads to error accumulation, making it challenging to correct mistakes in multi-hop reasoning. This paper introduces ReAgent: a Reversible multi-Agent collaborative framework augmented with explicit backtracking mechanisms, enabling reversible multi-hop reasoning. By incorporating text-based retrieval, information aggregation, and validation, our system can detect and correct errors mid-reasoning, leading to more robust and interpretable QA outcomes. The framework and experiments serve as a foundation for future work on error-tolerant QA systems. Empirical evaluations across three benchmarks indicate ReAgent's efficacy, yielding an average improvement of about 6% over baseline models.
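As a reading aid, the following is a minimal, hypothetical sketch (not the authors' implementation) of the core idea described in the abstract: keeping the multi-hop reasoning trace as an explicit, reversible structure so that a validation step can trigger backtracking instead of letting an error propagate. The agent roles (`propose`, `validate`, `finalize`), function names, and termination rule are assumptions made purely for illustration.

```python
# Minimal, illustrative sketch of reversible multi-hop reasoning with explicit
# backtracking, in the spirit of the ReAgent abstract. Function names, agent
# roles, and the validation/termination rules are assumptions for illustration,
# not the authors' implementation.
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One reasoning hop: the sub-question asked and the evidence/answer found."""
    sub_question: str
    answer: str


@dataclass
class ReasoningState:
    """Trace kept as an explicit stack so accepted steps can later be undone."""
    question: str
    steps: List[Step] = field(default_factory=list)

    def push(self, step: Step) -> None:
        self.steps.append(step)

    def backtrack(self, n: int = 1) -> None:
        # Reversibility: discard the most recent step(s) instead of letting an
        # error propagate through the rest of the chain.
        del self.steps[-n:]


def solve(question: str,
          propose: Callable[[ReasoningState], Step],
          validate: Callable[[ReasoningState, Step], bool],
          finalize: Callable[[ReasoningState], Optional[str]],
          max_iters: int = 10) -> Optional[str]:
    """Propose a hop, validate it, and backtrack when validation fails."""
    state = ReasoningState(question)
    for _ in range(max_iters):
        answer = finalize(state)      # e.g. an answer agent, once evidence suffices
        if answer is not None:
            return answer
        step = propose(state)         # e.g. a retrieval + aggregation agent
        if validate(state, step):     # e.g. a validation agent
            state.push(step)
        elif state.steps:
            state.backtrack()         # undo the last accepted hop and retry
    return None
```

In a full system the three callables would presumably be LLM-backed agents handling retrieval, aggregation, and validation; the sketch only illustrates that storing the trace as a stack makes mid-reasoning correction a cheap pop rather than a restart of the whole chain.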
Related papers
- MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration [63.31211701741323]
We extend multi-agent multi-model reasoning to generation, specifically to improving faithfulness through refinement.
We design intrinsic evaluations for each subtask, with our findings indicating that both multi-agent (multiple instances) and multi-model (diverse LLM types) approaches benefit error detection and critiquing.
We consolidate these insights into a final "recipe" called Multi-Agent Multi-Model Refinement (MAMM-Refine), where multi-agent and multi-model collaboration significantly boosts performance.
arXiv Detail & Related papers (2025-03-19T14:46:53Z)
- Can we repurpose multiple-choice question-answering models to rerank retrieved documents? [0.0]
R* is a proof-of-concept model that harmonizes multiple-choice question-answering (MCQA) models for document reranking.
Through experimental validation, R* proves to improve retrieval accuracy and contribute to the field's advancement.
arXiv Detail & Related papers (2025-03-06T17:53:24Z)
- Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer [62.01554688056335]
Overestimation in the multiagent setting has received comparatively little attention. We propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of online global Q-network to prevent overestimation accumulation.
arXiv Detail & Related papers (2025-02-04T05:14:58Z)
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training [18.896813839389893]
We propose an iterative self-training framework, Agent-R, that enables language Agents to Reflect on the fly. Unlike traditional methods that reward or penalize actions based on correctness, Agent-R leverages MCTS to construct training data that recovers correct trajectories from erroneous ones. Our findings demonstrate that Agent-R continuously improves the model's ability to recover from errors and enables timely error correction.
arXiv Detail & Related papers (2025-01-20T11:46:04Z)
- MALT: Improving Reasoning with Multi-Agent LLM Training [66.9481561915524]
MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps. On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with relative improvements of 15.66%, 7.42%, and 9.40%, respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [49.362750475706235]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks. We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model. Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
- Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent [9.439315294704368]
Tree of Thoughts (ToT) methods have shown potential in improving reasoning for complex question-answering tasks.
A critical limitation in multi-agent reasoning is the 'Reasoner' agent's shallow exploration of reasoning paths.
We introduce a novel approach combining ToT-based Reasoner agents with a Thought Validator agent.
Our method demonstrates superior performance compared to existing techniques when evaluated on the GSM8K dataset.
arXiv Detail & Related papers (2024-09-17T19:54:37Z)
- Derailer-Rerailer: Adaptive Verification for Efficient and Reliable Language Model Reasoning [11.765298236504155]
Derailer-Rerailer is a novel framework that balances reasoning accuracy and computational efficiency. Our framework achieves significant accuracy improvements (8-11% across various reasoning tasks) while maintaining 2-3 times better efficiency than existing verification methods.
arXiv Detail & Related papers (2024-08-25T21:20:17Z)
- TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation [30.485127201645437]
We propose TRACE to enhance the multi-hop reasoning ability of RAG models.
TRACE constructs knowledge-grounded reasoning chains, which are a series of logically connected knowledge triples.
TRACE achieves an average performance improvement of up to 14.03% compared to using all the retrieved documents.
arXiv Detail & Related papers (2024-06-17T12:23:32Z)
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
- Modeling Multi-hop Question Answering as Single Sequence Prediction [88.72621430714985]
We propose a simple generative approach (PathFid) that extends the task beyond just answer generation.
PathFid explicitly models the reasoning process to resolve the answer for multi-hop questions.
Our experiments demonstrate that PathFid leads to strong performance gains on two multi-hop QA datasets.
arXiv Detail & Related papers (2022-05-18T21:57:59Z)
- Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect this spurious "capability" of QA models, i.e., their reliance on shortcut correlations, using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.