Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?
- URL: http://arxiv.org/abs/2511.11040v1
- Date: Fri, 14 Nov 2025 07:47:56 GMT
- Title: Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?
- Authors: Qian Zhang, Yan Zheng, Jinyi Liu, Hebin Liang, Lanjun Wang,
- Abstract summary: We demonstrate that allocating roles with differing viewpoints to specific positions significantly impacts Multi-Agent Debate (MAD) performance. We find a novel role allocation strategy, "Truth Last", which can improve MAD performance by up to 22% in reasoning tasks. To address the issue of unknown truth in practical applications, we propose the Multi-Agent Debate Consistency (MADC) strategy.
- Score: 21.065994966720226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies on LLM agent scaling have highlighted the potential of Multi-Agent Debate (MAD) to enhance reasoning abilities. However, the critical aspect of role allocation strategies remains underexplored. In this study, we demonstrate that allocating roles with differing viewpoints to specific positions significantly impacts MAD's performance in reasoning tasks. Specifically, we find a novel role allocation strategy, "Truth Last", which can improve MAD performance by up to 22% in reasoning tasks. To address the issue of unknown truth in practical applications, we propose the Multi-Agent Debate Consistency (MADC) strategy, which systematically simulates and optimizes its core mechanisms. MADC incorporates path consistency to assess agreement among independent roles, simulating the role with the highest consistency score as the truth. We validated MADC across a range of LLMs (9 models), including the DeepSeek-R1 Distilled Models, on challenging reasoning tasks. MADC consistently demonstrated advanced performance, effectively overcoming MAD's performance bottlenecks and providing a crucial pathway for further improvements in LLM agent scaling.
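Because the abstract describes MADC only at a high level, the following minimal Python sketch illustrates one way the described mechanism could be wired together: each role answers independently, a path-consistency score measures agreement with the other roles, the highest-scoring role is simulated as the "truth", and that role is placed last in the debate order ("Truth Last"). The function names and the exact-match agreement measure are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of consistency-based role selection
# and "Truth Last" ordering, as outlined in the abstract above.
from typing import Callable, List


def consistency_score(answer: str, peers: List[str]) -> int:
    """Count how many peer answers exactly match this answer (assumed agreement measure)."""
    return sum(a == answer for a in peers)


def truth_last_order(roles: List[str], answer_fn: Callable[[str], str]) -> List[str]:
    """Query every role once, then reorder so the most consistent role speaks last."""
    answers = {role: answer_fn(role) for role in roles}
    scores = {
        role: consistency_score(ans, [a for r, a in answers.items() if r != role])
        for role, ans in answers.items()
    }
    simulated_truth = max(scores, key=scores.get)  # role with the highest consistency score
    return [r for r in roles if r != simulated_truth] + [simulated_truth]


# Toy usage: three roles answer a question; a role holding the majority answer goes last.
if __name__ == "__main__":
    toy_answers = {"role_a": "42", "role_b": "42", "role_c": "17"}
    order = truth_last_order(list(toy_answers), lambda r: toy_answers[r])
    print(order)  # e.g. ['role_b', 'role_c', 'role_a'], with a high-consistency role last
```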
Related papers
- RUMAD: Reinforcement-Unifying Multi-Agent Debate [15.05837715937063]
Multi-agent debate (MAD) systems leverage collective intelligence to enhance reasoning capabilities. Existing approaches struggle to simultaneously optimize accuracy, consensus formation, and computational efficiency. This work presents RUMAD, a novel framework that formulates dynamic communication topology control in MAD as a reinforcement learning (RL) problem.
arXiv Detail & Related papers (2026-02-27T10:04:26Z) - Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models [3.4829662575293585]
Large Language Model (LLM) agents are becoming widely used for various Requirements Engineering (RE) tasks. Research on improving their accuracy mainly focuses on prompt engineering, model fine-tuning, and retrieval augmented generation. We propose that, just as human debates enhance accuracy and reduce bias in RE tasks by incorporating diverse perspectives, different LLM agents debating and collaborating may achieve similar improvements.
arXiv Detail & Related papers (2025-07-08T13:37:59Z) - Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness [50.29739337771454]
Multi-agent debate (MAD) approaches offer improved reasoning, robustness, and diverse perspectives over monolithic models. This paper conceptualizes MAD as a test-time computational scaling technique, distinguished by collaborative refinement and diverse exploration capabilities. We conduct a comprehensive empirical investigation comparing MAD with strong self-agent test-time scaling baselines on mathematical reasoning and safety-related tasks.
arXiv Detail & Related papers (2025-05-29T01:02:55Z) - Is Multi-Agent Debate (MAD) the Silver Bullet? An Empirical Analysis of MAD in Code Summarization and Translation [10.038721196640864]
Multi-Agent Debate (MAD) systems enable structured debates among Large Language Models (LLMs). MAD promotes divergent thinking through role-specific agents, dynamic interactions, and structured decision-making. This study investigates MAD's effectiveness on two Software Engineering (SE) tasks.
arXiv Detail & Related papers (2025-03-15T07:30:37Z) - ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from the reasoning of Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z) - Stop Overvaluing Multi-Agent Debate -- We Must Rethink Evaluation and Embrace Model Heterogeneity [20.408720462383158]
Multi-agent debate (MAD) has gained significant attention as a promising line of research to improve the factual accuracy and reasoning capabilities of large language models (LLMs). Despite its conceptual appeal, current MAD research suffers from critical limitations in evaluation practices. This paper presents a systematic evaluation of 5 representative MAD methods across 9 benchmarks using 4 foundational models.
arXiv Detail & Related papers (2025-02-12T21:01:10Z) - Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in Multimodal Large Language Models (MLLMs) evaluation.
Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs.
Key findings reveal that some benchmarks allow high performance even without visual inputs, and that up to 50% of error rates can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z) - Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs).
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z) - Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z) - Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs [7.7433783185451075]
We benchmark a range of debating and prompting strategies to explore the trade-offs between cost, time, and accuracy.
We find that multi-agent debating systems, in their current form, do not reliably outperform other proposed prompting strategies.
We build on these results to offer insights into improving debating strategies, such as adjusting agent agreement levels.
arXiv Detail & Related papers (2023-11-29T05:54:41Z) - Mimicking Better by Matching the Approximate Action Distribution [48.95048003354255]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z)