Fugu-MT Paper Translation (Abstract): Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

Paper summary: Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval

arxiv url: http://arxiv.org/abs/2605.06647v1
Date: Thu, 07 May 2026 17:54:29 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:12.06866
Title: Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval
Title (reference translation): Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval
Authors: Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava
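Abstract summary: We introduce the SuperIntelligent Retrieval Agent (SIRA). SIRA can compress multi-round exploratory search into a single corpus-discriminative retrieval action. It is interpretable, training-free, and efficient, yet it can exceed more expensive multi-round search.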
Reference score (independently computed attention score): 25.731213365755234
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented agents are increasingly the interface to large organizational knowledge bases, yet most still treat retrieval as a black box: they issue exploratory queries, inspect returned snippets, and iteratively reformulate until useful evidence emerges. This approach resembles how a newcomer searches an unfamiliar database rather than how an expert navigates it with strong priors about terminology and likely evidence, and it results in unnecessary retrieval rounds, increased latency, and poor recall. We introduce the SuperIntelligent Retrieval Agent (SIRA), which defines superintelligence in retrieval as the ability to compress multi-round exploratory search into a single corpus-discriminative retrieval action. SIRA does not merely ask what terms are relevant to the query; it asks which terms are likely to separate the desired evidence from corpus-level confusers. On the corpus side, an LLM enriches each document offline with missing search vocabulary; on the query side, it predicts evidence vocabulary omitted by the query; and document-frequency statistics are used as a tool call to filter out proposed terms that are absent, overly common, or unlikely to create retrieval margin. The final retrieval step is a single weighted BM25 call combining the original query with the validated expansion. Across ten BEIR benchmarks and downstream question-answering tasks, SIRA significantly outperforms dense retrievers and state-of-the-art multi-round agentic baselines, demonstrating that one well-formed lexical query, guided by LLM cognition and lightweight corpus statistics, can exceed substantially more expensive multi-round search while remaining interpretable, training-free, and efficient.
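Abstract (reference translation): Retrieval-augmented agents are increasingly becoming the interface to large organizational knowledge bases, but most of them treat retrieval as a black box: they issue exploratory queries, inspect the returned snippets, and iteratively reformulate until useful evidence appears. This approach resembles how a newcomer searches an unfamiliar database rather than how an expert navigates it with strong prior knowledge of terminology and likely evidence, and it leads to unnecessary retrieval rounds, increased latency, and low recall. This paper introduces the SuperIntelligent Retrieval Agent (SIRA), which defines superintelligence in retrieval as the ability to compress multi-round exploratory search into a single corpus-discriminative retrieval action. SIRA does not simply ask which terms are relevant to the query; it asks which terms are likely to separate the desired evidence from corpus-level confusers. On the corpus side, an LLM enriches each document offline with missing search vocabulary; on the query side, it predicts the evidence vocabulary omitted by the query; and document-frequency statistics serve as a tool call that filters out proposed terms that are absent, overly common, or unlikely to create retrieval margin. The final retrieval step is a single weighted BM25 call that combines the original query with the validated expansion. Across ten BEIR benchmarks and downstream question-answering tasks, SIRA performs significantly better than dense retrievers and state-of-the-art multi-round agentic baselines, demonstrating that one well-formed lexical query, guided by LLM cognition and lightweight corpus statistics, can exceed far more expensive multi-round search while remaining interpretable, training-free, and efficient.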
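
The query-side pipeline described in the abstract (propose expansion terms, filter them with document-frequency statistics, then issue one weighted BM25 call) can be illustrated with a minimal sketch. This is not the authors' implementation. The whitespace tokenizer, the `max_df_ratio` threshold, the `expansion_weight` value, the BM25 parameters, and all function names are assumptions made for illustration. The LLM steps are stubbed out: the proposed terms are a hard-coded list, and the "unlikely to create retrieval margin" check is not modeled.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()

def document_frequency(docs):
    # Number of documents in which each term occurs at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def filter_terms(candidates, df, n_docs, max_df_ratio=0.5):
    # Drop proposed terms that are absent from the corpus or overly common.
    # max_df_ratio is a hypothetical threshold, not a value from the paper.
    kept = []
    for term in candidates:
        count = df.get(term, 0)
        if count == 0 or count / n_docs > max_df_ratio:
            continue
        kept.append(term)
    return kept

def bm25_scores(weighted_query, docs, df, k1=1.2, b=0.75):
    # Standard BM25 with a per-term query weight multiplied into each term score.
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term, weight in weighted_query.items():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            )
            score += weight * idf * norm
        scores.append(score)
    return scores

def retrieve(query, proposed_terms, raw_docs, expansion_weight=0.5):
    # One retrieval action: original query terms at weight 1.0 plus
    # validated expansion terms at a lower (assumed) weight.
    docs = [tokenize(d) for d in raw_docs]
    df = document_frequency(docs)
    validated = filter_terms(proposed_terms, df, len(docs))
    weighted_query = {t: 1.0 for t in tokenize(query)}
    for t in validated:
        weighted_query.setdefault(t, expansion_weight)
    scores = bm25_scores(weighted_query, docs, df)
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranking, validated

# Toy corpus and a stand-in for LLM-proposed evidence vocabulary.
corpus = [
    "aspirin reduces fever and pain",
    "ibuprofen is an nsaid used for pain and inflammation",
    "the heart pumps blood through the body",
    "acetylsalicylic acid inhibits platelet aggregation and lowers heart attack risk",
]
proposed = ["acetylsalicylic", "platelet", "and", "warfarin"]
ranking, validated = retrieve("heart attack prevention drug", proposed, corpus)
print(validated)   # terms that survived the document-frequency filter
print(ranking[0])  # index of the top-ranked document
```

In this toy run, "warfarin" is rejected because it never occurs in the corpus and "and" is rejected because it occurs in three of the four documents, so only the two discriminative terms are added to the query.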

Related papers