Fugu-MT Paper Translation (Summary): TOM-SWE: User Mental Modeling For Software Engineering Agents

Paper summary: TOM-SWE: User Mental Modeling For Software Engineering Agents

arxiv url: http://arxiv.org/abs/2510.21903v1
Date: Fri, 24 Oct 2025 16:09:51 GMT
Status: Translation complete
Last updated in system: 2025-10-28 15:28:14.700111
Title: TOM-SWE: User Mental Modeling For Software Engineering Agents
Authors: Xuhui Zhou, Valerie Chen, Zora Zhiruo Wang, Graham Neubig, Maarten Sap, Xingyao Wang
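Abstract summary: ToM-SWE is a dual-agent architecture that pairs a primary software engineering (SWE) agent with a lightweight theory-of-mind (ToM) partner agent. The ToM agent infers user goals, constraints, and preferences from instructions and interaction history. On two software engineering benchmarks, ToM-SWE improves task success rates and user satisfaction.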
Reference score (in-house attention metric): 75.28749912645127
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in coding agents have made them capable of planning, editing, running, and testing complex code bases. Despite their growing ability in coding tasks, these systems still struggle to infer and track user intent, especially when instructions are underspecified or context-dependent. To bridge this gap, we introduce ToM-SWE, a dual-agent architecture that pairs a primary software-engineering (SWE) agent with a lightweight theory-of-mind (ToM) partner agent dedicated to modeling the user's mental state. The ToM agent infers user goals, constraints, and preferences from instructions and interaction history, maintains a persistent memory of the user, and provides user-related suggestions to the SWE agent. In two software engineering benchmarks (ambiguous SWE-bench and stateful SWE-bench), ToM-SWE improves task success rates and user satisfaction. Notably, on the stateful SWE benchmark, a newly introduced evaluation that provides agents with a user simulator along with previous interaction histories, ToM-SWE achieves a substantially higher task success rate of 59.7% compared to 18.1% for OpenHands, a state-of-the-art SWE agent. Furthermore, in a three-week study with professional developers using ToM-SWE in their daily work, participants found it useful 86% of the time, underscoring the value of stateful user modeling for practical coding agents.
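The abstract describes the architecture only at a high level. Below is a minimal Python sketch of that dual-agent loop under stated assumptions, not the authors' implementation: the names `UserModel`, `ToMAgent`, and `SWEAgent`, the JSON file used as persistent memory, and the keyword cues that stand in for model-based inference are all hypothetical. It shows a ToM agent sorting each instruction into goals, constraints, or preferences, saving that user model to disk so it survives across sessions, and handing user-related suggestions to the primary agent.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class UserModel:
    """Hypothetical user mental state: goals, constraints, preferences."""
    goals: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    preferences: list[str] = field(default_factory=list)


class ToMAgent:
    """Hypothetical ToM partner agent backed by a JSON persistent memory."""

    # Keyword cues are a placeholder for LLM-based inference (assumption).
    CONSTRAINT_CUES = ("don't", "do not", "without", "never")
    PREFERENCE_CUES = ("prefer", "i like", "please use")

    def __init__(self, memory_path: Path) -> None:
        self.memory_path = memory_path
        self.model = self._load()

    def _load(self) -> UserModel:
        # Restore the user model saved by earlier sessions, if one exists.
        if self.memory_path.exists():
            return UserModel(**json.loads(self.memory_path.read_text()))
        return UserModel()

    def _save(self) -> None:
        self.memory_path.write_text(json.dumps(asdict(self.model)))

    def observe(self, instruction: str) -> None:
        """Classify one instruction into the user model, then persist it."""
        text = instruction.lower()
        if any(cue in text for cue in self.CONSTRAINT_CUES):
            bucket = self.model.constraints
        elif any(cue in text for cue in self.PREFERENCE_CUES):
            bucket = self.model.preferences
        else:
            bucket = self.model.goals
        if instruction not in bucket:
            bucket.append(instruction)
        self._save()

    def suggest(self) -> str:
        """Summarize user-related guidance for the SWE agent."""
        hints = [f"constraint: {c}" for c in self.model.constraints]
        hints += [f"preference: {p}" for p in self.model.preferences]
        return "\n".join(hints)


class SWEAgent:
    """Hypothetical primary coding agent that consults the ToM agent."""

    def __init__(self, tom: ToMAgent) -> None:
        self.tom = tom

    def handle(self, instruction: str) -> str:
        self.tom.observe(instruction)
        # A real agent would pass this context to an LLM; here we only
        # return the prompt text it would receive.
        hints = self.tom.suggest()
        return f"task: {instruction}\n{hints}" if hints else f"task: {instruction}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = Path(tmp) / "user_memory.json"
        session1 = SWEAgent(ToMAgent(memory))
        session1.handle("I prefer pytest over unittest")
        session1.handle("Do not modify the public API")
        # A new session reloads the same user model from disk.
        session2 = SWEAgent(ToMAgent(memory))
        print(session2.handle("Fix the failing date parser"))
```

Running the script prints the new task followed by the constraint and preference remembered from the first session, which is the "stateful" behavior the benchmark targets. Keyword matching is only a stand-in; a real ToM agent would presumably use a language model for both `observe` and `suggest`.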

Related papers