Fugu-MT Paper Translation (Overview): BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization

Paper overview: BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization

arxiv url: http://arxiv.org/abs/2512.23631v2
Date: Thu, 01 Jan 2026 00:11:46 GMT
Status: Translation complete
Last updated in system: 2026-01-05 13:15:27.679631
Title: BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization
Authors: Iris Xu, Guangtao Zeng, Zexue He, Charles Jin, Aldo Pareja, Dan Gutfreund, Chuang Gan, Zhang-Wei Hong
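Abstract summary: Large language models (LLMs) struggle to generalize to real-world software engineering problems. Existing systems often rely on a single agent to handle the entire workflow. Inspired by how human engineers decompose complex problems, we propose structuring SWE agents as orchestrators that coordinate specialized sub-agents.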
Reference score (in-house attention score): 41.08366028094234
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have shown strong reasoning and coding capabilities, yet they struggle to generalize to real-world software engineering (SWE) problems that are long-horizon and out of distribution. Existing systems often rely on a single agent to handle the entire workflow (interpreting issues, navigating large codebases, and implementing fixes) within one reasoning chain. Such monolithic designs force the model to retain irrelevant context, leading to spurious correlations and poor generalization. Motivated by how human engineers decompose complex problems, we propose structuring SWE agents as orchestrators coordinating specialized sub-agents for sub-tasks such as localization, editing, and validation. The challenge lies in discovering effective hierarchies automatically: as the number of sub-agents grows, the search space becomes combinatorial, and it is difficult to attribute credit to individual sub-agents within a team. We address these challenges by formulating hierarchy discovery as a multi-armed bandit (MAB) problem, where each arm represents a candidate sub-agent and the reward measures its helpfulness when collaborating with others. This framework, termed Bandit Optimization for Agent Design (BOAD), enables efficient exploration of sub-agent designs under limited evaluation budgets. On SWE-bench-Verified, BOAD outperforms single-agent and manually designed multi-agent systems. On SWE-bench-Live, featuring more recent and out-of-distribution issues, our 36B system ranks second on the leaderboard at the time of evaluation, surpassing larger models such as GPT-4 and Claude. These results demonstrate that automatically discovered hierarchical multi-agent systems significantly improve generalization on challenging long-horizon SWE tasks. Code is available at https://github.com/iamxjy/BOAD-SWE-Agent.
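The abstract frames hierarchy discovery as a multi-armed bandit in which each arm is a candidate sub-agent and the reward reflects how helpful that sub-agent is inside a team. The Python sketch below illustrates that framing with a standard UCB1 selection rule; it is not BOAD's implementation (see the linked repository for that). Everything concrete here is a hypothetical assumption: the arm names, the fixed team size, the shared-credit rule that gives every team member the team's reward, and `simulated_team_reward`, which stands in for actually running a team on SWE-bench issues.

```python
import math
import random
from typing import Dict, List, Sequence


class SubAgentBandit:
    """UCB1-style bandit where each arm is a candidate sub-agent design.

    Hypothetical sketch: each round, the top-k arms by UCB score form a team,
    the team is evaluated once, and the team's reward is credited to every
    member (a simple shared-credit assumption, not necessarily BOAD's rule).
    """

    def __init__(self, arms: Sequence[str], team_size: int, c: float = 2.0):
        if not 0 < team_size <= len(arms):
            raise ValueError("team_size must be in [1, len(arms)]")
        self.arms = list(arms)
        self.team_size = team_size
        self.c = c  # exploration strength; c = 2 gives the classic UCB1 bonus
        self.counts: Dict[str, int] = {a: 0 for a in self.arms}
        self.totals: Dict[str, float] = {a: 0.0 for a in self.arms}
        self.rounds = 0

    def ucb(self, arm: str) -> float:
        n = self.counts[arm]
        if n == 0:
            return math.inf  # untried sub-agents are explored first
        mean = self.totals[arm] / n
        return mean + math.sqrt(self.c * math.log(self.rounds) / n)

    def select_team(self) -> List[str]:
        # Stable sort: ties (e.g. several untried arms) keep their input order.
        ranked = sorted(self.arms, key=self.ucb, reverse=True)
        return ranked[: self.team_size]

    def update(self, team: Sequence[str], reward: float) -> None:
        # Shared credit: every member of the evaluated team receives the reward.
        self.rounds += 1
        for arm in team:
            self.counts[arm] += 1
            self.totals[arm] += reward

    def mean(self, arm: str) -> float:
        n = self.counts[arm]
        return self.totals[arm] / n if n else 0.0


# Hypothetical per-sub-agent helpfulness, used only to simulate rewards.
HIDDEN_HELPFULNESS = {
    "localizer_v1": 0.9,
    "localizer_v2": 0.2,
    "editor_v1": 0.85,
    "editor_v2": 0.1,
    "validator_v1": 0.3,
}


def simulated_team_reward(team: Sequence[str], rng: random.Random) -> float:
    """Stand-in for running the team on one SWE task: 1.0 if 'resolved', else 0.0."""
    p = sum(HIDDEN_HELPFULNESS[a] for a in team) / len(team)
    return 1.0 if rng.random() < p else 0.0


def run(rounds: int = 2000, seed: int = 0) -> SubAgentBandit:
    rng = random.Random(seed)
    bandit = SubAgentBandit(list(HIDDEN_HELPFULNESS), team_size=2)
    for _ in range(rounds):
        team = bandit.select_team()
        bandit.update(team, simulated_team_reward(team, rng))
    return bandit


if __name__ == "__main__":
    b = run()
    for arm in b.arms:
        print(f"{arm:14s} pulls={b.counts[arm]:5d} mean={b.mean(arm):.3f}")
```

Crediting each member with the full team reward is the simplest credit-assignment choice: an arm's running mean then approximates the average outcome of the teams it joined, at the cost of noise from weak partners. Under this toy simulation the two most helpful sub-agents end up being selected most often; the paper's actual reward definition and team construction may differ.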

Related papers