Fugu-MT Paper Translation (Abstract): MSRS: Evaluating Multi-Source Retrieval-Augmented Generation

Paper overview: MSRS: Evaluating Multi-Source Retrieval-Augmented Generation

arxiv url: http://arxiv.org/abs/2508.20867v1
Date: Thu, 28 Aug 2025 14:59:55 GMT
Status: Translation complete
Last updated in system: 2025-08-29 18:12:02.46549
Title: MSRS: Evaluating Multi-Source Retrieval-Augmented Generation
Authors: Rohan Phanse, Yijie Zhou, Kejian Shi, Wencai Zhang, Yixin Liu, Yilun Zhao, Arman Cohan
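Abstract summary: Many real-world applications require the ability to integrate and summarize information spread across multiple sources. This paper proposes a scalable framework for building evaluation benchmarks that challenge RAG systems to integrate information across distinct sources.
Reference score (independently computed attention score): 51.717139132190574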
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented systems are typically evaluated in settings where information required to answer the query can be found within a single source or the answer is short-form or factoid-based. However, many real-world applications demand the ability to integrate and summarize information scattered across multiple sources, where no single source is sufficient to respond to the user's question. In such settings, the retrieval component of a RAG pipeline must recognize a variety of relevance signals, and the generation component must connect and synthesize information across multiple sources. We present a scalable framework for constructing evaluation benchmarks that challenge RAG systems to integrate information across distinct sources and generate long-form responses. Using our framework, we build two new benchmarks on Multi-Source Retrieval and Synthesis: MSRS-Story and MSRS-Meet, representing narrative synthesis and summarization tasks, respectively, that require retrieval from large collections. Our extensive experiments with various RAG pipelines -- including sparse and dense retrievers combined with frontier LLMs -- reveal that generation quality is highly dependent on retrieval effectiveness, which varies greatly by task. While multi-source synthesis proves challenging even in an oracle retrieval setting, we find that reasoning models significantly outperform standard LLMs at this distinct step.
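The abstract's point that no single source is sufficient can be illustrated with a small retrieval-recall check. The sketch below is not the paper's pipeline or its metrics: the toy corpus, the query, the gold source set, and the helpers `rank_sparse` and `recall_at_k` are all hypothetical, and the scorer is a plain TF-IDF overlap standing in for a sparse retriever.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for this toy example)."""
    return text.lower().split()


def rank_sparse(query, corpus):
    """Rank document ids by a simple TF-IDF overlap score (toy sparse retriever)."""
    n_docs = len(corpus)
    doc_tokens = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}
    # Document frequency: the number of documents containing each term.
    df = Counter()
    for counts in doc_tokens.values():
        df.update(counts.keys())
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, counts in doc_tokens.items():
        score = 0.0
        for term in query_terms:
            if term in counts:
                idf = math.log((n_docs + 1) / (df[term] + 0.5))
                score += counts[term] * idf
        scores[doc_id] = score
    # Sort by descending score; break ties by document id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of the gold (required) sources found in the top-k results."""
    retrieved = set(ranked_ids[:k])
    return len(retrieved & set(gold_ids)) / len(gold_ids)


# Hypothetical meeting-style corpus; answering the query needs three sources.
corpus = {
    "m1": "budget meeting discussed hiring plan",
    "m2": "budget meeting approved marketing spend",
    "m3": "team lunch menu and parking",
    "m4": "hiring plan timeline for engineering",
}
query = "budget hiring plan"
gold = ["m1", "m2", "m4"]

ranked = rank_sparse(query, corpus)
recall_top1 = recall_at_k(ranked, gold, 1)
recall_top3 = recall_at_k(ranked, gold, 3)
# Oracle setting: the generator is handed exactly the gold sources.
recall_oracle = recall_at_k(gold, gold, len(gold))
```

In this toy setup the top-ranked document covers only one of the three required sources, so a generator given a single retrieved document could not produce a complete answer. The oracle variant removes retrieval error entirely, which is the kind of setting the abstract uses to isolate the synthesis step.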

Related papers