SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
- URL: http://arxiv.org/abs/2507.23348v1
- Date: Thu, 31 Jul 2025 08:54:46 GMT
- Title: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
- Authors: Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, Qianxiang Wang,
- Abstract summary: SWE-Debate is a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization.<n>It organizes a three-round debate among specialized agents, each embodying distinct reasoning perspectives along the fault propagation trace.<n> Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results in open-source agent frameworks.
- Score: 29.035097855780858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous, tool-using agents to tackle complex software engineering tasks. While existing agent-based issue resolution approaches are primarily based on agents' independent explorations, they often get stuck in local solutions and fail to identify issue patterns that span across different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. Then, it organizes a three-round debate among specialized agents, each embodying distinct reasoning perspectives along the fault propagation trace. This structured competition enables agents to collaboratively converge on a consolidated fix plan. Finally, this consolidated fix plan is integrated into an MCTS-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results in open-source agent frameworks and outperforms baselines by a large margin.
Related papers
- SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution [9.339988760379915]
Large language models (LLMs) exhibit strong performance on self-contained tasks.<n>SWE-Adept is a framework where a localization agent identifies issue-relevant code locations and a resolution agent implements the corresponding fixes.<n> Experiments on SWE-Bench Lite and SWE-Bench Pro demonstrate that SWE-Adept consistently outperforms prior approaches in both issue localization and resolution.
arXiv Detail & Related papers (2026-03-01T23:52:30Z) - DynaDebate: Breaking Homogeneity in Multi-Agent Debate with Dynamic Path Generation [47.62978918069135]
We introduce Dynamic Multi-Agent Debate (DynaDebate), which enhances the effectiveness of multi-agent debate through three key mechanisms.<n>Extensive experiments demonstrate that DynaDebate achieves superior performance across various benchmarks, surpassing existing state-of-the-art MAD methods.
arXiv Detail & Related papers (2026-01-09T12:01:33Z) - BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization [41.08366028094234]
Large language models (LLMs) struggle to generalize to real-world software engineering problems.<n>Existing systems often rely on a single agent to handle the entire workflow-interpreting issues.<n>Motivated by how human engineers decompose complex problems, we propose structuring SWE agents as orchestrators coordinating specialized sub-agents.
arXiv Detail & Related papers (2025-12-29T17:41:11Z) - Benefits and Limitations of Communication in Multi-Agent Reasoning [11.788489289062312]
We propose a theoretical framework to analyze the expressivity of multi-agent systems.<n>We derive bounds on (i) the number of agents required to solve the task exactly, (ii) the quantity and structure of inter-agent communication, and (iii) the achievable speedups as problem size and context scale.<n>Our results identify regimes where communication is provably beneficial, delineate tradeoffs between agent count and bandwidth, and expose intrinsic limitations when either resource is constrained.
arXiv Detail & Related papers (2025-10-14T20:04:27Z) - AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
tAgent is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals.<n>By leveraging soft supervision and weighted aggregation of agent outputs, Agent learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z) - MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning [43.66966457772646]
MA-RAG orchestrates a collaborative set of specialized AI agents to tackle each stage of the RAG pipeline with task-aware reasoning.<n>Our design allows fine-grained control over information flow without any model fine-tuning.<n>This modular and reasoning-driven architecture enables MA-RAG to deliver robust, interpretable results.
arXiv Detail & Related papers (2025-05-26T15:05:18Z) - QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? [4.429189958406034]
Credit assignment has remained a fundamental challenge in multi-agent reinforcement learning (MARL)<n>We propose a novel algorithm, textbfQLLM, which facilitates the automatic construction of credit assignment functions using large language models (LLMs)<n>Extensive experiments conducted on several standard MARL benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-17T14:07:11Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Why Do Multi-Agent LLM Systems Fail? [91.39266556855513]
We present MAST (Multi-Agent System Failure taxonomy), the first empirically grounded taxonomy designed to understand MAS failures.<n>We analyze seven popular MAS frameworks across over 200 tasks, involving six expert human annotators.<n>We identify 14 unique failure modes, organized into 3 overarching categories, (i) specification issues, (ii) inter-agent misalignment, and (iii) task verification.
arXiv Detail & Related papers (2025-03-17T19:04:38Z) - Privacy-Enhancing Paradigms within Federated Multi-Agent Systems [47.76990892943637]
LLM-based Multi-Agent Systems (MAS) have proven highly effective in solving complex problems by integrating multiple agents, each performing different roles.<n>In this paper, we introduce the concept of Federated MAS, highlighting the fundamental differences between Federated MAS and traditional FL.<n>We then identify key challenges in developing Federated MAS, including: 1) heterogeneous privacy protocols among agents, 2) structural differences in multi-party conversations, and 3) dynamic conversational network structures.<n>To address these challenges, we propose Embedded Privacy-Enhancing Agents (EPEAgent), an innovative solution that integrates seamlessly into the Retrieval-Augmented Generation phase and the
arXiv Detail & Related papers (2025-03-11T08:38:45Z) - Multi-LLM Collaborative Search for Complex Problem Solving [54.194370845153784]
We propose the Mixture-of-Search-Agents (MoSA) paradigm to enhance search-based reasoning.<n>MoSA integrates diverse reasoning pathways by combining independent exploration with iterative refinement among LLMs.<n>Using Monte Carlo Tree Search (MCTS) as a backbone, MoSA enables multiple agents to propose and aggregate reasoning steps, resulting in improved accuracy.
arXiv Detail & Related papers (2025-02-26T06:31:04Z) - PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving [89.60370366013142]
We propose PlanGEN, a model-agnostic and easily scalable agent framework with three key components: constraint, verification, and selection agents.<n>Specifically, our approach proposes constraint-guided iterative verification to enhance performance of inference-time algorithms.
arXiv Detail & Related papers (2025-02-22T06:21:56Z) - When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements [56.29265568399648]
We argue that disagreements prevent premature consensus and expand the explored solution space.<n>Disagreements on task-critical steps can derail collaboration depending on the topology of solution paths.
arXiv Detail & Related papers (2025-02-21T02:24:43Z) - Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, a llm-based autonomous agent framework.
At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence.
We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z) - Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering [0.0]
In the chemical and process industries, Process Flow Diagrams (PFDs) and Piping and Instrumentation Diagrams (P&IDs) are critical for design, construction, and maintenance.
Recent advancements in Generative AI have shown promise in understanding and interpreting process diagrams for Visual Question Answering (VQA)
We propose a secure, on-premises enterprise solution using a hierarchical, multi-agent Retrieval Augmented Generation (RAG) framework.
arXiv Detail & Related papers (2024-08-24T19:34:04Z) - Towards Rationality in Language and Multimodal Agents: A Survey [23.451887560567602]
This work discusses how to build more rational language and multimodal agents.<n> Rationality is quality of being guided by reason, characterized by decision-making that aligns with evidence and logical principles.
arXiv Detail & Related papers (2024-06-01T01:17:25Z) - Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models [62.96551299003463]
We propose textbftextitThought Propagation (TP) to enhance the complex reasoning ability of Large Language Models.
TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one.
TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch.
arXiv Detail & Related papers (2023-10-06T01:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.