Fugu-MT paper translation (abstract): Test-Time Strategies for More Efficient and Accurate Agentic RAG

Paper overview: Test-Time Strategies for More Efficient and Accurate Agentic RAG

arxiv url: http://arxiv.org/abs/2603.12396v1
Date: Thu, 12 Mar 2026 19:18:59 GMT
Status: translation complete
Last updated in system: 2026-03-16 17:38:11.73791
Title: Test-Time Strategies for More Efficient and Accurate Agentic RAG
Title (reference translation): Test-time strategies for more efficient and accurate agentic RAG
Authors: Brian Zhang, Deepti Guntur, Zhiyang Zuo, Abhinav Sharma, Shreyas Chaudhari, Wenlong Zhao, Franck Dernoncourt, Puneet Mathur, Ryan Rossi, Nedim Lipka
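Abstract summary: Retrieval-Augmented Generation (RAG) systems struggle with complex multihop questions. Iterative agentic frameworks such as Search-R1 address this but can introduce inefficiencies, including repeated retrieval of previously processed information. This paper investigates test-time modifications to the Search-R1 pipeline to mitigate these problems.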
Reference score (independently computed attention score): 58.44913384057518
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Retrieval-Augmented Generation (RAG) systems face challenges with complex, multihop questions, and agentic frameworks such as Search-R1 (Jin et al., 2025), which operates iteratively, have been proposed to address these complexities. However, such approaches can introduce inefficiencies, including repetitive retrieval of previously processed information and challenges in contextualizing retrieved results effectively within the current generation prompt. Such issues can lead to unnecessary retrieval turns, suboptimal reasoning, inaccurate answers, and increased token consumption. In this paper, we investigate test-time modifications to the Search-R1 pipeline to mitigate these identified shortcomings. Specifically, we explore the integration of two components and their combination: a contextualization module to better integrate relevant information from retrieved documents into reasoning, and a de-duplication module that replaces previously retrieved documents with the next most relevant ones. We evaluate our approaches using the HotpotQA (Yang et al., 2018) and the Natural Questions (Kwiatkowski et al., 2019) datasets, reporting the exact match (EM) score, an LLM-as-a-Judge assessment of answer correctness, and the average number of turns. Our best-performing variant, utilizing GPT-4.1-mini for contextualization, achieves a 5.6% increase in EM score and reduces the number of turns by 10.5% compared to the Search-R1 baseline, demonstrating improved answer accuracy and retrieval efficiency.
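Abstract (reference translation): Retrieval-Augmented Generation (RAG) systems face difficulties with complex, multihop questions, and iteratively operating agentic frameworks such as Search-R1 (Jin et al., 2025) have been proposed to handle that complexity. Such approaches, however, can introduce inefficiencies, including repeated retrieval of previously processed information and difficulty contextualizing retrieved results effectively within the current generation prompt. These problems can lead to unnecessary retrieval turns, suboptimal reasoning, inaccurate answers, and increased token consumption. This paper investigates test-time modifications to the Search-R1 pipeline to mitigate these shortcomings. Specifically, it explores two components and their combination: a contextualization module that better integrates relevant information from retrieved documents into reasoning, and a de-duplication module that replaces previously retrieved documents with the next most relevant ones. The approaches are evaluated on the HotpotQA (Yang et al., 2018) and Natural Questions (Kwiatkowski et al., 2019) datasets, reporting the exact match (EM) score, an LLM-as-a-Judge assessment of answer correctness, and the average number of turns. The best-performing variant, which uses GPT-4.1-mini for contextualization, achieves a 5.6% increase in EM score and a 10.5% reduction in the number of turns compared to the Search-R1 baseline, demonstrating improved answer accuracy and retrieval efficiency.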
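The de-duplication idea in the abstract (swap documents already retrieved in earlier turns for the next most relevant ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `select_unseen`, the document ids, and the assumption that the retriever returns a ranked list longer than `k` are all hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Set


def select_unseen(ranked_doc_ids: Iterable[str], seen: Set[str], k: int) -> List[str]:
    """Return up to k ranked document ids that were not returned in earlier turns.

    Hypothetical helper: `ranked_doc_ids` is assumed to be ordered from most
    to least relevant, and `seen` is updated in place across turns.
    """
    if k <= 0:
        return []
    selected: List[str] = []
    for doc_id in ranked_doc_ids:
        if doc_id in seen:
            # Already shown in a previous turn: skip it so that the next
            # most relevant document takes its place.
            continue
        selected.append(doc_id)
        if len(selected) == k:
            break
    seen.update(selected)
    return selected


# Two retrieval turns sharing one `seen` set (illustrative ids only).
seen: Set[str] = set()
turn1 = select_unseen(["d1", "d2", "d3", "d4"], seen, k=2)
turn2 = select_unseen(["d2", "d1", "d5", "d3"], seen, k=2)
print(turn1)  # ['d1', 'd2']
print(turn2)  # ['d5', 'd3']
```

In the second turn the two top-ranked documents were already seen, so the lower-ranked `d5` and `d3` are returned instead. A real pipeline would need to over-fetch from the retriever so that enough unseen candidates remain.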

Related papers