Fugu-MT Paper Translation (Summary): InteractComp: Evaluating Search Agents With Ambiguous Queries

Paper summary: InteractComp: Evaluating Search Agents With Ambiguous Queries

arxiv url: http://arxiv.org/abs/2510.24668v1
Date: Tue, 28 Oct 2025 17:35:54 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:37.309994
Title: InteractComp: Evaluating Search Agents With Ambiguous Queries
Title (reference translation): InteractComp: Evaluation of Search Agents Using Ambiguous Queries
Authors: Mingyi Deng, Lijun Huang, Yani Fan, Jiayi Zhang, Fashen Ren, Jinyi Bai, Fuzhen Yang, Dayi Miao, Zhaoyang Yu, Yifan Wu, Yanfei Zhang, Fengwei Teng, Yingjia Wan, Song Hu, Yude Li, Xin Jin, Conghao Hu, Haoyu Li, Qirui Fu, Tai Zhong, Xinyu Wang, Xiangru Tang, Nan Tang, Chenglin Wu, Yuyu Luo
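Abstract summary: We introduce InteractComp, a benchmark for evaluating whether search agents can recognize query ambiguity and actively interact during search. The best model achieves only 13.73% accuracy, although it reaches 71.50% when given complete context. The stagnation of interaction capabilities, coupled with the immediate feedback inherent to search tasks, makes InteractComp a useful resource for both evaluating and training the interaction capabilities of search agents.
Reference score (attention score computed independently by this site): 36.05005463045869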
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Language agents have demonstrated remarkable potential in web search and information retrieval. However, these search agents assume user queries are complete and unambiguous, an assumption that diverges from reality where users begin with incomplete queries requiring clarification through interaction. Yet most agents lack interactive mechanisms during the search process, and existing benchmarks cannot assess this capability. To address this gap, we introduce InteractComp, a benchmark designed to evaluate whether search agents can recognize query ambiguity and actively interact to resolve it during search. Following the principle of easy to verify, interact to disambiguate, we construct 210 expert-curated questions across 9 domains through a target-distractor methodology that creates genuine ambiguity resolvable only through interaction. Evaluation of 17 models reveals striking failure: the best model achieves only 13.73% accuracy despite 71.50% with complete context, exposing systematic overconfidence rather than reasoning deficits. Forced interaction produces dramatic gains, demonstrating latent capability current strategies fail to engage. Longitudinal analysis shows interaction capabilities stagnated over 15 months while search performance improved seven-fold, revealing a critical blind spot. This stagnation, coupled with the immediate feedback inherent to search tasks, makes InteractComp a valuable resource for both evaluating and training interaction capabilities in search agents. The code is available at https://github.com/FoundationAgents/InteractComp.
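Abstract (reference translation): Language agents have shown remarkable potential in web search and information retrieval. However, these search agents assume that user queries are complete and unambiguous, an assumption that departs from reality, where users start from incomplete queries that need clarification through interaction. Most agents nevertheless lack interactive mechanisms during the search process, and existing benchmarks cannot assess this capability. To address this gap, we introduce InteractComp, a benchmark for evaluating whether search agents can recognize query ambiguity and actively interact to resolve it during search. Following the principle of "easy to verify, interact to disambiguate", we construct 210 expert-curated questions across 9 domains using a target-distractor methodology that creates genuine ambiguity resolvable only through interaction. An evaluation of 17 models reveals a striking failure: the best model achieves only 13.73% accuracy, although it reaches 71.50% with complete context, which exposes systematic overconfidence rather than reasoning deficits. Forced interaction yields dramatic gains, demonstrating latent capability that current strategies fail to engage. Longitudinal analysis shows that interaction capabilities stagnated over 15 months while search performance improved seven-fold, revealing a critical blind spot. This stagnation, coupled with the immediate feedback inherent to search tasks, makes InteractComp a valuable resource for both evaluating and training the interaction capabilities of search agents. The code is available at https://github.com/FoundationAgents/InteractComp.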


Related papers