CortexDebate: Debating Sparsely and Equally for Multi-Agent Debate
- URL: http://arxiv.org/abs/2507.03928v1
- Date: Sat, 05 Jul 2025 07:23:15 GMT
- Title: CortexDebate: Debating Sparsely and Equally for Multi-Agent Debate
- Authors: Yiliu Sun, Zicheng Zhao, Sheng Wan, Chen Gong
- Abstract summary: Multi-Agent Debate (MAD) has emerged as an effective strategy to mitigate issues with a single Large Language Model (LLM). Existing MAD methods face two major issues: (a) overly lengthy input contexts, which cause LLM agents to get lost in the abundance of input information and suffer performance drops; and (b) the overconfidence dilemma, where self-assured LLM agents dominate the debate, leading to low debating effectiveness. We propose a novel MAD method called "CortexDebate", inspired by the human brain's tendency to establish a sparse and dynamically optimized network among cortical areas governed by white matter.
- Score: 11.155092859033784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, a single Large Language Model (LLM) struggles with critical issues such as hallucination and inadequate reasoning abilities. To mitigate these issues, Multi-Agent Debate (MAD) has emerged as an effective strategy, where LLM agents engage in in-depth debates with others on tasks. However, existing MAD methods face two major issues: (a) overly lengthy input contexts, which cause LLM agents to get lost in the abundance of input information and suffer performance drops; and (b) the overconfidence dilemma, where self-assured LLM agents dominate the debate, leading to low debating effectiveness. To address these limitations, we propose a novel MAD method called "CortexDebate". Inspired by the human brain's tendency to establish a sparse and dynamically optimized network among cortical areas governed by white matter, CortexDebate constructs a sparse debating graph among LLM agents, where each LLM agent debates only with those agents that are helpful to it. To optimize the graph, we propose a module named McKinsey-based Debate Matter (MDM), which acts as an artificial analog to white matter. By integrating the McKinsey Trust Formula, a well-established measure of trustworthiness from sociology, MDM enables credible evaluations that guide graph optimization. The effectiveness of our CortexDebate has been well demonstrated by extensive experimental results across eight datasets from four task types.
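The sparse debating graph described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trust matrix and threshold below are hypothetical stand-ins for the MDM trust evaluation, shown only to make the "each agent debates only with helpful peers" idea concrete.

```python
# Minimal sketch of a sparse debate graph (illustrative, not the paper's code).
# trust[i][j] is a stand-in for an MDM-style score of how helpful agent j's
# responses are to agent i; edges are kept only above a cutoff, so each
# agent receives input from a small subset of peers rather than all of them.

def build_sparse_graph(trust, threshold=0.5):
    """Return, for each agent i, the list of peers j it will debate with."""
    n = len(trust)
    return {
        i: [j for j in range(n) if j != i and trust[i][j] >= threshold]
        for i in range(n)
    }

# Toy example with three agents.
trust = [
    [0.0, 0.2, 0.9],
    [0.8, 0.0, 0.4],
    [0.6, 0.7, 0.0],
]
graph = build_sparse_graph(trust, threshold=0.5)
print(graph)  # {0: [2], 1: [0], 2: [0, 1]}
```

Re-estimating the trust matrix after each debate round and rebuilding the graph would give the dynamic optimization the abstract describes.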
Related papers
- The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News [39.260570381498866]
TruEDebate (TED) is a novel multi-agent system for detecting fake news on social networks. TED employs a rigorous debate process inspired by formal debate settings. The DebateFlow Agents organize agents into two teams, where one supports and the other challenges the truth of the news.
arXiv Detail & Related papers (2025-05-13T13:03:20Z) - Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning [8.800516398660069]
Multiagent collaboration has emerged as a promising framework for enhancing the reasoning capabilities of large language models (LLMs). We propose Debate Only When Necessary (DOWN), an adaptive multiagent debate framework that selectively activates debate based on the confidence score of the agent's initial response. DOWN improves efficiency by up to six times while preserving or even outperforming the performance of existing methods.
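The confidence-gated mechanism this summary describes can be sketched in a few lines. The function names and the threshold are illustrative assumptions, not DOWN's actual API: the point is only that debate runs when the initial answer is uncertain and is skipped otherwise.

```python
# Illustrative sketch of confidence-gated debate (names are hypothetical,
# not taken from the DOWN paper): the costly multi-agent debate step runs
# only when the first answer's confidence falls below a threshold.

def answer_with_optional_debate(initial_answer, confidence,
                                debate_fn, threshold=0.8):
    """Return the initial answer if confident; otherwise refine via debate."""
    if confidence >= threshold:
        return initial_answer          # confident: skip debate, save cost
    return debate_fn(initial_answer)   # uncertain: run multi-agent debate

# Toy usage: the "debate" just tags the answer as revised.
revise = lambda a: a + " (revised by debate)"
print(answer_with_optional_debate("42", 0.95, revise))  # 42
print(answer_with_optional_debate("42", 0.30, revise))  # 42 (revised by debate)
```

Skipping debate on high-confidence responses is what yields the efficiency gain the summary reports.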
arXiv Detail & Related papers (2025-04-07T13:17:52Z) - CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation [53.452699232071495]
CrossWordBench is a benchmark designed to evaluate the reasoning capabilities of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) through the medium of crossword puzzles. Our evaluation reveals that reasoning LLMs outperform non-reasoning models substantially by effectively leveraging crossing-letter constraints. Our findings offer insights into the limitations of the reasoning capabilities of current LLMs and LVLMs, and provide an effective approach for creating multimodal constrained tasks for future evaluations.
arXiv Detail & Related papers (2025-03-30T20:03:36Z) - Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark [73.27104042215207]
We introduce EMMA, a benchmark targeting organic multimodal reasoning across mathematics, physics, chemistry, and coding. EMMA tasks demand advanced cross-modal reasoning that cannot be addressed by reasoning independently in each modality. Our evaluation of state-of-the-art MLLMs on EMMA reveals significant limitations in handling complex multimodal and multi-step reasoning tasks.
arXiv Detail & Related papers (2025-01-09T18:55:52Z) - Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate [22.813887723656023]
Agent for Debate (Agent4Debate) is a dynamic multi-agent framework based on Large Language Models (LLMs).
The evaluation employs the Debatrix automatic scoring system and professional human reviewers based on the established Debatrix-Elo and Human-Elo rankings.
Experimental results indicate that the state-of-the-art Agent4Debate exhibits capabilities comparable to those of humans.
arXiv Detail & Related papers (2024-08-08T14:02:45Z) - Cutting Through the Noise: Boosting LLM Performance on Math Word Problems [52.99006895757801]
Large Language Models excel at solving math word problems, but struggle with real-world problems containing irrelevant information. We propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables. Fine-tuning on adversarial training instances improves performance on adversarial MWPs by 8%.
arXiv Detail & Related papers (2024-05-30T18:07:13Z) - Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM [51.43102092480804]
Debatrix is an automated debate judge based on Large Language Models (LLMs).
To align with real-world debate scenarios, we introduced the PanelBench benchmark, comparing our system's performance to actual debate outcomes.
The findings indicate a notable enhancement over directly using LLMs for debate evaluation.
arXiv Detail & Related papers (2024-03-12T18:19:47Z) - Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? [84.36332588191623]
We propose a novel group discussion framework to enrich the set of discussion mechanisms.
We observe that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt.
arXiv Detail & Related papers (2024-02-28T12:04:05Z) - Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs [7.7433783185451075]
We benchmark a range of debating and prompting strategies to explore the trade-offs between cost, time, and accuracy.
We find that multi-agent debating systems, in their current form, do not reliably outperform other proposed prompting strategies.
We build on these results to offer insights into improving debating strategies, such as adjusting agent agreement levels.
arXiv Detail & Related papers (2023-11-29T05:54:41Z) - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in a "tit for tat" manner and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.