Related papers: Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning

URL: http://arxiv.org/abs/2504.05047v2
Date: Tue, 20 May 2025 12:17:15 GMT
Title: Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning
Authors: Sugyeong Eo, Hyeonseok Moon, Evelyn Hayoon Zi, Chanjun Park, Heuiseok Lim
Abstract summary: Multiagent collaboration has emerged as a promising framework for enhancing the reasoning capabilities of large language models (LLMs). We propose Debate Only When Necessary (DOWN), an adaptive multiagent debate framework that selectively activates debate based on the confidence score of the agent's initial response. DOWN improves efficiency by up to six times while preserving or even outperforming the performance of existing methods.
Score: 8.800516398660069
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multiagent collaboration has emerged as a promising framework for enhancing the reasoning capabilities of large language models (LLMs). Despite improvements in reasoning, the approach introduces substantial computational overhead resulting from iterative agent interactions. Furthermore, engaging in unnecessary debates increases the risk of generating erroneous responses. To address these challenges, we propose Debate Only When Necessary (DOWN), an adaptive multiagent debate framework that selectively activates debate based on the confidence score of the agent's initial response. Debate is activated only for queries requiring further deliberation, during which agents refine their outputs by referencing peer responses and associated confidence scores. Evaluations on benchmarks show that DOWN improves efficiency by up to six times while preserving or even outperforming the performance of existing methods. Further analysis indicates that DOWN effectively mitigates the risk of error propagation stemming from the unnecessary debate process. These findings demonstrate the effectiveness of our approach in delivering high-performance LLM solutions at a lower computational cost.
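The confidence gate described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Response` type, the `Agent` callable interface, the `threshold` and `max_rounds` parameters, and the final "pick the most confident response" rule are all assumptions made for illustration. The abstract only states that debate is triggered by the confidence of the initial response and that agents then refine their outputs using peer responses and confidence scores.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Response:
    answer: str
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical agent interface: (query, peer responses) -> Response.
Agent = Callable[[str, Sequence[Response]], Response]


def down_answer(
    query: str,
    agents: List[Agent],
    threshold: float = 0.8,  # assumed value, not taken from the paper
    max_rounds: int = 2,     # assumed value, not taken from the paper
) -> Tuple[Response, int]:
    """Return the final response and the number of debate rounds run."""
    # Step 1: a single agent answers on its own.
    initial = agents[0](query, [])
    if initial.confidence >= threshold:
        # Confident enough: skip the debate entirely.
        return initial, 0

    # Step 2: low confidence, so the remaining agents also answer.
    responses = [initial] + [agent(query, []) for agent in agents[1:]]

    # Step 3: each round, every agent revises its answer after seeing
    # the answers and confidence scores of its peers.
    for _ in range(max_rounds):
        responses = [
            agent(query, [r for j, r in enumerate(responses) if j != i])
            for i, agent in enumerate(agents)
        ]

    # Assumed aggregation rule: keep the most confident final response.
    return max(responses, key=lambda r: r.confidence), max_rounds
```

In this sketch the cost saving comes from the early return: a query whose initial response clears the threshold costs one model call instead of one call per agent per round.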

Related papers

SAND: Boosting LLM Agents with Self-Taught Action Deliberation [53.732649189709285]
Large Language Model (LLM) agents are commonly tuned with supervised finetuning on ReAct-style expert trajectories or preference optimization over pairwise rollouts. We propose the Self-taught ActioN Deliberation (SAND) framework, enabling LLM agents to explicitly deliberate over candidate actions before committing to one. SAND achieves an average 20% improvement over initial supervised finetuning and also outperforms state-of-the-art agent tuning approaches.
arXiv Detail & Related papers (2025-07-10T05:38:15Z)
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [62.579951798437115]
This work investigates iterative approximate evaluation for arbitrary prompts. It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework. MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
arXiv Detail & Related papers (2025-07-07T03:20:52Z)
CortexDebate: Debating Sparsely and Equally for Multi-Agent Debate [11.155092859033784]
Multi-Agent Debate (MAD) has emerged as an effective strategy to mitigate issues with a single Large Language Model (LLM). Existing MAD methods face two major issues: (a) overly long input contexts, which cause LLM agents to get lost in the input information and suffer a performance drop; and (b) the overconfidence dilemma, where self-assured LLM agents dominate the debate, leading to low debating effectiveness. We propose a novel MAD method called "CortexDebate", inspired by the human brain's tendency to establish a sparse and dynamically optimized network among cortical areas governed by white matter.
arXiv Detail & Related papers (2025-07-05T07:23:15Z)
Is Multi-Agent Debate (MAD) the Silver Bullet? An Empirical Analysis of MAD in Code Summarization and Translation [10.038721196640864]
Multi-Agent Debate (MAD) systems enable structured debates among Large Language Models (LLMs). MAD promotes divergent thinking through role-specific agents, dynamic interactions, and structured decision-making. This study investigates MAD's effectiveness on two Software Engineering (SE) tasks.
arXiv Detail & Related papers (2025-03-15T07:30:37Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [54.787341008881036]
We introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents [26.645038049346255]
We propose the Reactive and Reflection agents with Multi-Path Reasoning (RR-MP) Framework. Our approach improves scientific reasoning accuracy by employing a multi-path reasoning mechanism. We conducted zero-shot and few-shot evaluations on tasks involving moral scenarios, college-level physics, and mathematics.
arXiv Detail & Related papers (2024-12-31T13:11:20Z)
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, an LLM-based autonomous agent framework. At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence. We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z)
Improving Multi-Agent Debate with Sparse Communication Topology [9.041025703879905]
Multi-agent debate has proven effective in improving the quality of large language models on reasoning and factuality tasks. In this paper, we investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance.
arXiv Detail & Related papers (2024-06-17T17:33:09Z)
Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent. We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents. Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models [16.701242561345786]
The performance of Large Language Models (LLMs) is driven by the similarity between input example tasks and queries. Our investigation shows that the perceived reasoning abilities of LLMs stem from the exemplar-query similarity and approximate retrieval.
arXiv Detail & Related papers (2024-05-22T20:05:49Z)
Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents. There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System [16.830182915504555]
Multi-agent debate system (MAD) imitates the process of human discussion in pursuit of truth. It is challenging to make various agents perform right and highly consistent cognition due to their limited and different knowledge backgrounds. We propose a novel Multi-Agent Debate with Knowledge-Enhanced framework to promote the system to find the solution.
arXiv Detail & Related papers (2023-12-08T06:22:12Z)
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs [7.7433783185451075]
We benchmark a range of debating and prompting strategies to explore the trade-offs between cost, time, and accuracy. We find that multi-agent debating systems, in their current form, do not reliably outperform other proposed prompting strategies. We build on these results to offer insights into improving debating strategies, such as adjusting agent agreement levels.
arXiv Detail & Related papers (2023-11-29T05:54:41Z)
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models. Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution. Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.