Fugu-MT Paper Translation (Abstract): MapSatisfyBench: Benchmarking Satisfaction-Aware Map Agents through Behavior-Grounded Implicit Decision Factors

Paper abstract: MapSatisfyBench: Benchmarking Satisfaction-Aware Map Agents through Behavior-Grounded Implicit Decision Factors

arxiv url: http://arxiv.org/abs/2606.17453v2
Date: Wed, 17 Jun 2026 07:21:04 GMT
Status: Translation complete
Last updated in system: 2026-06-18 13:57:35.215199
Title: MapSatisfyBench: Benchmarking Satisfaction-Aware Map Agents through Behavior-Grounded Implicit Decision Factors
Authors: Lubin Bai, Mengyu Cao, Sixue Wang, Zhongwei Wan, Yue Pan, Jiale Hou, Xiang Li, Xiuyuan Zhang,
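Abstract summary: Map services are embedded in everyday scenarios rather than professional task settings, so users often express their needs informally and leave implicit decision factors unstated. A capable agent should first proactively recover these factors from available information sources. A factor is evaluable only if it affects user acceptance and can be recovered from information available to the agent before it responds.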
Reference score (independently calculated attention level): 18.24708264824734
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model agents are increasingly integrated into map services. Since map services are embedded in everyday-life scenarios rather than professional task settings, users often express their needs informally, resulting in underspecified queries with many unspoken needs, namely, implicit decision factors that are critical for user satisfaction. Although clarification is an effective way to mitigate this issue, it increases user burden in daily interaction, and a capable agent should first proactively recover such factors from available information sources. However, evaluating this ability is challenging. The first challenge is to determine which implicit decision factors are suitable for evaluation. A factor is evaluable only if it affects user acceptance and can be recovered from information available to the agent before it responds. Second, user satisfaction cannot be reliably represented by a single reference answer, requiring a benchmark that converts satisfaction-relevant factors into objective and quantifiable evaluation targets. To address these challenges, we propose a restore-identify-filter framework that reconstructs complete user needs from behavior-chain evidence, identifies implicit decision factors, and retains only those supported by pre-query evidence. Building on this methodology, we construct MapSatisfyBench from large-scale, real-world anonymized user data and annotate ground truth along five dimensions, enabling full-chain evaluation of satisfaction-aware map agents. Experiments show that current agents generally perform well on explicit task completion, but remain limited in satisfying implicit decision factors and proactively acquiring the evidence needed for satisfaction-aware decisions. These findings establish MapSatisfyBench as a benchmark for shifting map-agent evaluation from task completion toward satisfaction-aware spatial decision making.
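The evaluability rule stated in the abstract (a factor is kept only if it affects user acceptance and is supported by pre-query evidence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `CandidateFactor` record, its fields, the numeric timestamps, and the function names are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CandidateFactor:
    """Hypothetical record for one candidate implicit decision factor."""
    name: str
    # Assumed label: does this factor change whether the user accepts a result?
    affects_acceptance: bool
    # Assumed timestamps of the behavior-chain evidence supporting the factor.
    evidence_times: Tuple[float, ...]


def is_evaluable(factor: CandidateFactor, query_time: float) -> bool:
    """Apply both conditions: acceptance impact and pre-query evidence."""
    has_pre_query_evidence = any(t < query_time for t in factor.evidence_times)
    return factor.affects_acceptance and has_pre_query_evidence


def filter_factors(factors: List[CandidateFactor], query_time: float) -> List[CandidateFactor]:
    """Filter step: retain only the factors that pass the evaluability rule."""
    return [f for f in factors if is_evaluable(f, query_time)]


# Illustrative, made-up candidates; the query is issued at time 10.0.
candidates = [
    CandidateFactor("parking availability", True, (5.0,)),   # kept
    CandidateFactor("quiet seating", True, (15.0,)),         # evidence only after the query
    CandidateFactor("sign color", False, (1.0,)),            # no effect on acceptance
]
kept = filter_factors(candidates, query_time=10.0)
print([f.name for f in kept])
```

Under these assumptions, a factor whose only evidence appears after the query is dropped even if it matters to the user, since the agent could not have recovered it before responding.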
