Towards Scalable Oversight with Collaborative Multi-Agent Debate in Error Detection
- URL: http://arxiv.org/abs/2510.20963v1
- Date: Thu, 23 Oct 2025 19:46:00 GMT
- Title: Towards Scalable Oversight with Collaborative Multi-Agent Debate in Error Detection
- Authors: Yongqiang Chen, Gang Niu, James Cheng, Bo Han, Masashi Sugiyama
- Abstract summary: Self-diagnosis is unreliable on complex tasks unless aided by reliable external feedback. We introduce a new collaborative MAD protocol, termed ColMAD, that reframes MAD as a non-zero-sum game. We show that ColMAD significantly outperforms previous competitive MAD by 19%.
- Score: 81.52796950244705
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Accurate detection of errors in large language model (LLM) responses is central to the success of scalable oversight, i.e., providing effective supervision to superhuman intelligence. Yet self-diagnosis is often unreliable on complex tasks unless aided by reliable external feedback. Multi-agent debate (MAD) seems a natural source of such feedback: multiple LLMs provide complementary perspectives and cross-checks for error detection. However, prior MAD protocols frame the debate as a zero-sum game, in which the debaters compete to win rather than to seek the truth. This leads to debate hacking: debaters tend to mislead the judge by misinterpreting the task or presenting overconfident claims, which introduces additional mistakes and causes MAD to underperform single-agent methods. To mitigate this issue, we introduce a new collaborative MAD protocol, termed ColMAD, that reframes MAD as a non-zero-sum game. Specifically, ColMAD encourages multiple agents to criticize each other in a supportive way, so that they complement each other's missing points. The judge agent can then draw a more informed conclusion from more comprehensive evidence. Empirically, we show that ColMAD significantly outperforms previous competitive MAD by 19% and brings non-trivial improvements over single-agent methods in error detection.
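The collaborative protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration assuming a generic `query(agent, prompt)` LLM interface with toy stand-in agents; none of these function or agent names come from the paper's actual implementation.

```python
# Sketch of a collaborative MAD round: agents pool critiques instead of
# competing, and the judge decides from the combined evidence.

def collaborative_mad(task, response, agents, query, rounds=2):
    """Each agent critiques the response; critiques are shared across rounds
    so agents can complement each other's missing points (non-zero-sum framing)."""
    evidence = []
    for _ in range(rounds):
        new_points = []
        for agent in agents:
            prompt = (
                f"Task: {task}\nResponse: {response}\n"
                f"Peer critiques so far: {evidence}\n"
                "Add any error or missing point the peers have not covered."
            )
            new_points.append(query(agent, prompt))
        evidence.extend(new_points)
    # The judge sees the pooled evidence rather than picking a debate winner.
    verdict_prompt = (
        f"Task: {task}\nResponse: {response}\nEvidence: {evidence}\n"
        "Decide: does the response contain an error? Answer 'error' or 'correct'."
    )
    return query("judge", verdict_prompt), evidence

# Toy deterministic stand-in for an LLM, just to make the sketch runnable.
def toy_query(agent, prompt):
    if agent == "judge":
        return "error" if "off-by-one" in prompt else "correct"
    return "off-by-one in the loop bound" if agent == "A" else "units look fine"

verdict, evidence = collaborative_mad(
    "sum 1..10", "the answer is 54", agents=["A", "B"], query=toy_query, rounds=1
)
print(verdict)  # prints "error"
```

The key design point, per the abstract, is that agents add to a shared evidence pool rather than arguing for a side, so the judge aggregates complementary critiques instead of scoring a win.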
Related papers
- iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference [11.86992814928132]
Multi-Agent Debate (MAD) has emerged as a promising framework that engages multiple agents in structured debates. We propose intelligent Multi-Agent Debate (iMAD), a token-efficient framework that selectively triggers MAD only when it is likely to be beneficial. We show that iMAD significantly reduces token usage (by up to 92%) while also improving final answer accuracy (by up to 13.5%).
arXiv Detail & Related papers (2025-11-14T13:50:51Z) - Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning [12.06050648342985]
Hallucination poses a major obstacle to the reasoning capabilities of large language models. We introduce the Multi-agent Undercover Gaming (MUG) protocol, inspired by social deduction games like "Who is Undercover?". MUG reframes MAD as a process of detecting "undercover" agents (those suffering from hallucinations) by employing multimodal counterfactual tests.
arXiv Detail & Related papers (2025-11-14T11:27:55Z) - Enhancing Multi-Agent Debate System Performance via Confidence Expression [55.34012400580016]
Multi-Agent Debate (MAD) systems simulate human debate and thereby improve task performance. Some Large Language Models (LLMs) possess superior knowledge or reasoning capabilities for specific tasks, but struggle to clearly communicate this advantage during debates. Inappropriate confidence expression can cause agents in MAD systems to either stubbornly maintain incorrect beliefs or converge prematurely on suboptimal answers. We develop ConfMAD, a MAD framework that integrates confidence expression throughout the debate process.
arXiv Detail & Related papers (2025-09-17T14:34:27Z) - Free-MAD: Consensus-Free Multi-Agent Debate [17.384699873512464]
Multi-agent debate (MAD) is an emerging approach to improving the reasoning capabilities of large language models (LLMs). Existing MAD methods rely on multiple rounds of interaction among agents to reach consensus, and the final output is selected by majority voting in the last round. We propose Free-MAD, a novel MAD framework that eliminates the need for consensus among agents.
arXiv Detail & Related papers (2025-09-14T01:55:01Z) - Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models? [13.569822165805851]
Multi-Agent Debate (MAD) has emerged as a promising paradigm for improving the performance of large language models. Despite recent advances, the key factors driving MAD's effectiveness remain unclear. We disentangle MAD into two key components, Majority Voting and inter-agent Debate, and assess their respective contributions.
arXiv Detail & Related papers (2025-08-24T22:14:32Z) - Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness [50.29739337771454]
Multi-agent debate (MAD) approaches offer improved reasoning, robustness, and diverse perspectives over monolithic models. This paper conceptualizes MAD as a test-time computational scaling technique, distinguished by collaborative refinement and diverse exploration capabilities. We conduct a comprehensive empirical investigation comparing MAD with strong self-agent test-time scaling baselines on mathematical reasoning and safety-related tasks.
arXiv Detail & Related papers (2025-05-29T01:02:55Z) - Stop Overvaluing Multi-Agent Debate -- We Must Rethink Evaluation and Embrace Model Heterogeneity [20.408720462383158]
Multi-agent debate (MAD) has gained significant attention as a promising line of research to improve the factual accuracy and reasoning capabilities of large language models (LLMs). Despite its conceptual appeal, current MAD research suffers from critical limitations in evaluation practices. This paper presents a systematic evaluation of 5 representative MAD methods across 9 benchmarks using 4 foundational models.
arXiv Detail & Related papers (2025-02-12T21:01:10Z) - MAD-Sherlock: Multi-Agent Debate for Visual Misinformation Detection [36.12673167913763]
We introduce MAD-Sherlock, a multi-agent debate system for out-of-context misinformation detection. MAD-Sherlock frames detection as a multi-agent debate, reflecting the diverse and conflicting discourse found online. Our framework is domain- and time-agnostic, requiring no finetuning, yet achieves state-of-the-art accuracy with in-depth explanations.
arXiv Detail & Related papers (2024-10-26T10:34:22Z) - DebUnc: Improving Large Language Model Agent Communication With Uncertainty Metrics [52.242449026151846]
Multi-agent debates have been introduced to improve the accuracy of Large Language Models (LLMs). We propose DebUnc, a debate framework that uses uncertainty metrics to assess agent confidence.
arXiv Detail & Related papers (2024-07-08T22:15:01Z) - Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z) - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.