Fugu-MT Paper Translation (Abstract): Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

Paper overview: Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

arxiv url: http://arxiv.org/abs/2603.26233v1
Date: Fri, 27 Mar 2026 09:56:26 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.432047
Title: Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
Title (reference translation): Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
Authors: Nicholas Edwards, Sebastian Schuster
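Abstract summary: Large Language Model (LLM) agents are increasingly deployed in open-ended domains such as software engineering. The authors evaluate the clarification-seeking abilities of LLM agents on an underspecified variant of SWE-bench Verified, and propose an uncertainty-aware multi-agent scaffold that explicitly decouples underspecification detection from code execution.
Reference score (independently computed attention score): 4.301199871195023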
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimized for autonomous execution. In this work, we systematically evaluate the clarification-seeking abilities of LLM agents on an underspecified variant of SWE-bench Verified. We propose an uncertainty-aware multi-agent scaffold that explicitly decouples underspecification detection from code execution. Our results demonstrate that this multi-agent system using OpenHands + Claude Sonnet 4.5 achieves a 69.40% task resolve rate, significantly outperforming a standard single-agent setup (61.20%) and closing the performance gap with agents operating on fully specified instructions. Furthermore, we find that the multi-agent system exhibits well-calibrated uncertainty, conserving queries on simple tasks while proactively seeking information on more complex issues. These findings indicate that current models can be turned into proactive collaborators, where agents independently recognize when to ask questions to elicit missing information in real-world, underspecified tasks.
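Abstract (reference translation): As Large Language Model (LLM) agents are deployed more and more in open-ended domains such as software engineering, they often meet underspecified instructions that are missing important context. Human developers resolve underspecification naturally by asking clarifying questions, whereas current agents are mostly optimized for autonomous execution. This study systematically evaluates the clarification-seeking abilities of LLM agents on an underspecified variant of SWE-bench Verified. It proposes an uncertainty-aware multi-agent scaffold that explicitly decouples underspecification detection from code execution. The multi-agent system, using OpenHands + Claude Sonnet 4.5, reaches a 69.40% task resolve rate, significantly outperforming a standard single-agent setup (61.20%) and closing the performance gap with agents that operate on fully specified instructions. The multi-agent system also shows well-calibrated uncertainty: it conserves queries on simple tasks and proactively seeks information on more complex issues. These results indicate that current models can be turned into proactive collaborators, with agents independently recognizing when to ask questions to elicit missing information in real-world, underspecified tasks.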
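The abstract's central design idea, keeping underspecification detection separate from code execution, can be illustrated with a small Python sketch. This is not the authors' scaffold: the names `Decision`, `detect_underspecification`, and `run_task` are hypothetical, and the keyword check is only a stand-in for whatever LLM-based detector the paper actually uses, which this page does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    ask: bool                 # True if clarification is needed before executing
    question: Optional[str]   # question for the user, or None when ask is False


def detect_underspecification(task: str, required_details: List[str]) -> Decision:
    """Hypothetical detector: flag the task if a required detail is not mentioned.

    A toy keyword check stands in for an uncertainty-aware LLM agent.
    """
    missing = [d for d in required_details if d.lower() not in task.lower()]
    if missing:
        return Decision(True, f"Could you specify: {', '.join(missing)}?")
    return Decision(False, None)


def run_task(
    task: str,
    required_details: List[str],
    ask_user: Callable[[str], str],
    execute: Callable[[str], str],
) -> str:
    """Detection runs first; the executor only sees the (possibly clarified) task."""
    decision = detect_underspecification(task, required_details)
    if decision.ask:
        answer = ask_user(decision.question)
        task = f"{task}\nClarification: {answer}"
    return execute(task)
```

The point of the split is that the executor never has to decide whether to ask: a query is issued only when the detector flags missing information, so a fully specified task passes straight through without a question.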

Related papers