Related papers: MARS: toward more efficient multi-agent collaboration for LLM reasoning

MARS: toward more efficient multi-agent collaboration for LLM reasoning

URL: http://arxiv.org/abs/2509.20502v1
Date: Wed, 24 Sep 2025 19:24:33 GMT
Title: MARS: toward more efficient multi-agent collaboration for LLM reasoning
Authors: Xiao Wang, Jia Wang, Yijie Wang, Pengtao Dang, Sha Cao, Chi Zhang,
Abstract summary: Multi-Agent Review System (MARS) is a role-based collaboration framework inspired by the review process.<n>We show that MARS matches the accuracy of Multi-Agent Debate (MAD) while reducing both token usage and inference time by approximately 50%.
Score: 12.889395413072696
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have achieved impressive results in natural language understanding, yet their reasoning capabilities remain limited when operating as single agents. Multi-Agent Debate (MAD) has been proposed to address this limitation by enabling collaborative reasoning among multiple models in a round-table debate manner. While effective, MAD introduces substantial computational overhead due to the number of agents involved and the frequent communication required. In this paper, we propose MARS (Multi-Agent Review System), a role-based collaboration framework inspired by the review process. In MARS, an author agent generates an initial solution, reviewer agents provide decisions and comments independently, and a meta-reviewer integrates the feedback to make the final decision and guide further revision. This design enhances reasoning quality while avoiding costly reviewer-to-reviewer interactions, thereby controlling token consumption and inference time. We compared MARS with both MAD and other state-of-the-art reasoning strategies across multiple benchmarks. Extensive experiments with different LLMs show that MARS matches the accuracy of MAD while reducing both token usage and inference time by approximately 50\%. Code is available at https://github.com/xwang97/MARS.

Related papers

Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning [49.99694105650486]
Self-Debate Reinforcement Learning (SDRL) is a training framework that equips a single large language model with strong problem-solving ability.<n>We show that SDRL improves overall Multi-Agent Debate (MAD) performance while simultaneously strengthening single model reasoning.
arXiv Detail & Related papers (2026-01-29T20:21:44Z)
M3MAD-Bench: Are Multi-Agent Debates Really Effective Across Domains and Modalities? [37.902089112579]
Multi-Agent Debate (MAD) orchestrates multiple agents through structured debate to improve answer quality and support complex reasoning.<n>Existing research on MAD suffers from two fundamental limitations: evaluations are conducted under fragmented and inconsistent settings, hindering fair comparison.<n>We introduce M3MAD-Bench, a unified benchmark for evaluating MAD methods across Multi-domain tasks, Multi-modal inputs, and Multi-dimensional metrics.
arXiv Detail & Related papers (2026-01-06T09:33:48Z)
iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference [11.86992814928132]
Multi-Agent Debate (MAD) has emerged as a promising framework that engages multiple agents in structured debates.<n>We propose intelligent Multi-Agent Debate (iMAD), a token-efficient framework that selectively triggers MAD only when it is likely to be beneficial.<n>We show that iMAD significantly reduces token usage (by up to 92%) while also improving final answer accuracy (by up to 13.5%)
arXiv Detail & Related papers (2025-11-14T13:50:51Z)
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress [71.02263260394261]
Large language models (LLMs) still encounter challenges in multi-turn decision-making tasks.<n>We build process reward models (PRMs) to evaluate each decision and guide the agent's decision-making process.<n>AgentPRM captures both the interdependence between sequential decisions and their contribution to the final goal.
arXiv Detail & Related papers (2025-11-11T14:57:54Z)
Free-MAD: Consensus-Free Multi-Agent Debate [17.384699873512464]
Multi-agent debate (MAD) is an emerging approach to improving the reasoning capabilities of large language models (LLMs)<n>Existing MAD methods rely on multiple rounds of interaction among agents to reach consensus, and the final output is selected by majority voting in the last round.<n>We propose textscFree-MAD, a novel MAD framework that eliminates the need for consensus among agents.
arXiv Detail & Related papers (2025-09-14T01:55:01Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Reasoning of Large Language Models (LLMs)<n>ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions.<n> Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
Stop Overvaluing Multi-Agent Debate -- We Must Rethink Evaluation and Embrace Model Heterogeneity [20.408720462383158]
Multi-agent debate (MAD) has gained significant attention as a promising line of research to improve the factual accuracy and reasoning capabilities of large language models (LLMs)<n>Despite its conceptual appeal, current MAD research suffers from critical limitations in evaluation practices.<n>This paper presents a systematic evaluation of 5 representative MAD methods across 9 benchmarks using 4 foundational models.
arXiv Detail & Related papers (2025-02-12T21:01:10Z)
Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents. There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs [7.7433783185451075]
We benchmark a range of debating and prompting strategies to explore the trade-offs between cost, time, and accuracy. We find that multi-agent debating systems, in their current form, do not reliably outperform other proposed prompting strategies. We build on these results to offer insights into improving debating strategies, such as adjusting agent agreement levels.
arXiv Detail & Related papers (2023-11-29T05:54:41Z)
Sentiment Analysis through LLM Negotiations [58.67939611291001]
A standard paradigm for sentiment analysis is to rely on a singular LLM and makes the decision in a single round. This paper introduces a multi-LLM negotiation framework for sentiment analysis.
arXiv Detail & Related papers (2023-11-03T12:35:29Z)
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution. Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.