Fugu-MT Paper Translation (Summary): SeekerGym: A Benchmark for Reliable Information Seeking

Paper summary: SeekerGym: A Benchmark for Reliable Information Seeking

arxiv url: http://arxiv.org/abs/2604.17143v1
Date: Sat, 18 Apr 2026 20:33:58 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.356682
Title: SeekerGym: A Benchmark for Reliable Information Seeking
Authors: Remy Kim, Minseung Lee, Shuo Li, Osbert Bastani,
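Abstract summary: Gaps in retrieved information can leave biases that mislead users even if the information they are given is correct and relevant. This paper introduces SeekerGym, a benchmark for evaluating the completeness of information retrieved by AI agents.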
Reference score (site-computed attention score): 27.627357867641518
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite their substantial successes, AI agents continue to face fundamental challenges in terms of trustworthiness. Consider deep research agents, tasked with searching for information relevant to a given topic: while AI agents can perform effective information retrieval, there is little guarantee regarding the completeness of this information. Gaps in retrieved information can leave biases that mislead users even if the information they are given is correct and relevant. We introduce SeekerGym, a benchmark designed to evaluate the completeness of information retrieved by AI agents. SeekerGym also measures how well agents quantify their uncertainty about the completeness of their information; if an agent fails to retrieve all relevant information, it is useful for it to at least quantify how much might be missing. At a high level, each task in SeekerGym is a document (e.g., a Wikipedia article), and the AI agent must issue queries to retrieve passages from that document. Intuitively, the document comprehensively covers a topic, so the ability to retrieve its sections directly measures the completeness of information retrieval. In addition to Wikipedia, we also consider machine learning survey papers, where the goal is to retrieve relevant sections of a survey paper. We benchmark several models and algorithms; the best approaches retrieve 42.5% of passages on Wikipedia and 29.2% on ML Surveys, leaving substantial room for improvement.
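The two quantities the abstract describes (the fraction of a document's passages an agent retrieves, and how well the agent estimates that fraction itself) can be illustrated with a small scoring sketch. This is a hypothetical illustration, not SeekerGym's actual scoring code: the `TaskResult` record, the passage-ID sets, the `predicted_coverage` field, the absolute-error measure of uncertainty quality, and the per-task (macro) averaging are all assumptions, since the abstract does not specify how passages are matched or how scores are aggregated.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one hypothetical SeekerGym-style task (one source document)."""
    document_passages: set[str]   # passage IDs making up the source document
    retrieved: set[str]           # passage IDs returned by the agent's queries
    predicted_coverage: float     # agent's own estimate of the fraction retrieved, in [0, 1]


def completeness(task: TaskResult) -> float:
    """Fraction of the document's passages the agent retrieved (passage recall)."""
    if not task.document_passages:
        return 0.0
    # Retrieved IDs that are not part of the document do not count toward recall.
    hits = task.retrieved & task.document_passages
    return len(hits) / len(task.document_passages)


def coverage_error(task: TaskResult) -> float:
    """Assumed uncertainty measure: gap between self-estimated and true completeness."""
    return abs(task.predicted_coverage - completeness(task))


def summarize(tasks: list[TaskResult]) -> tuple[float, float]:
    """Macro-averaged completeness and coverage-estimation error over all tasks."""
    if not tasks:
        raise ValueError("need at least one task")
    n = len(tasks)
    mean_completeness = sum(completeness(t) for t in tasks) / n
    mean_error = sum(coverage_error(t) for t in tasks) / n
    return mean_completeness, mean_error


# Usage with made-up passage IDs (illustrative data, not benchmark results).
tasks = [
    TaskResult({"p1", "p2", "p3", "p4"}, {"p1", "p3", "x9"}, predicted_coverage=0.75),
    TaskResult({"q1", "q2"}, {"q2"}, predicted_coverage=0.5),
]
print(summarize(tasks))  # (0.5, 0.125)
```

Under these assumptions, a headline figure such as "42.5% of passages on Wikipedia" would correspond to the mean completeness, while the mean coverage error captures whether an agent knows how much it is missing; pooling passages across documents (micro-averaging) instead would weight long documents more heavily.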

