Fugu-MT Paper Translation (Abstract): SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding

Paper overview: SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding

arxiv url: http://arxiv.org/abs/2601.22956v1
Date: Fri, 30 Jan 2026 13:17:14 GMT
Status: Translation complete
Last updated in system: 2026-02-02 18:28:15.465228
Title: SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding
Title (reference translation): SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding
Authors: Boyin Tan, Haoning Deng, Junyuan Zhang, Junjielong Xu, Pinjia He, Youcheng Sun
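Abstract summary: In software engineering, teams draft multiple candidate proposals for fixing an issue and then deliberate on one golden proposal for implementation. This selection requires not only assessing the issue's scope, impact, and urgency, but also a clear understanding of each proposal's strengths and weaknesses. SWE-Manager is a joint selection and synthesis approach that selects the best proposal and synthesizes a golden proposal.
Reference score (independently computed attention measure): 17.083968760174507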
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) research in software engineering has largely focused on tasks such as code generation and bug repair. In practice, teams often draft multiple candidate proposals for fixing an issue and then deliberate on one golden proposal for implementation. This selection requires not only assessing the issue's scope, impact, and urgency, but also a clear understanding of each proposal's strengths and weaknesses. A good selection could make issue resolution more reliable while reducing regression and operational risk, whereas a poor choice can increase risk and even cause unpredictable failures. We first conduct a manual study of real-world issues to characterize the rationales maintainers use when selecting among competing proposals. Motivated by these findings, we introduce SWE-Manager, a joint selection and synthesis approach that selects the best proposal and synthesizes a golden proposal. SWE-Manager is an 8B model trained via reinforcement learning (RL) to compare proposals, justify its choice, and synthesize a golden proposal for implementation. We view proposal selection as a reasoning task, mirroring how technical managers review competing proposals by weighing issue context and each proposal's solution without executing code or running tests. On the SWE-Lancer Manager benchmark, SWE-Manager achieves 53.21 selection accuracy and 57.75 earn rate, earning 152,750 dollars and outperforming strong baselines including GPT-5. To further evaluate the effectiveness of SWE-Manager in real-world issue resolution, we design the P2A framework, which simulates a real-world workflow where multiple proposals are drafted, reviewed, and a golden proposal is selected for implementation ...
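Abstract (reference translation): Research on large language models (LLMs) in software engineering has focused on tasks such as code generation and bug repair. In practice, teams often draft multiple candidate proposals for fixing an issue and then deliberate on a golden proposal for implementation. This selection requires not only assessing the issue's scope, impact, and urgency, but also a clear understanding of each proposal's strengths and weaknesses. A good selection can make issue resolution more reliable while reducing regression and operational risk. We first manually study real-world issues to characterize the rationales maintainers use when selecting among competing proposals. Motivated by these findings, we introduce SWE-Manager, a joint selection and synthesis approach that selects the best proposal and synthesizes a golden proposal. SWE-Manager is an 8B model trained via reinforcement learning (RL) to compare proposals, justify its choice, and synthesize a golden proposal for implementation. We view proposal selection as a reasoning task, mirroring how technical managers review competing proposals by weighing the issue context and each proposal's solution without executing code or running tests. On the SWE-Lancer Manager benchmark, SWE-Manager achieves 53.21 selection accuracy and a 57.75 earn rate, earning 152,750 dollars and outperforming strong baselines including GPT-5. To further evaluate the effectiveness of SWE-Manager in real-world issue resolution, we design the P2A framework, which simulates a real-world workflow in which multiple proposals are drafted and reviewed and a golden proposal is selected.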

Related papers