MSRS: Evaluating Multi-Source Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2508.20867v1
- Date: Thu, 28 Aug 2025 14:59:55 GMT
- Title: MSRS: Evaluating Multi-Source Retrieval-Augmented Generation
- Authors: Rohan Phanse, Yijie Zhou, Kejian Shi, Wencai Zhang, Yixin Liu, Yilun Zhao, Arman Cohan,
- Abstract summary: Many real-world applications demand the ability to integrate and summarize information scattered across multiple sources.<n>We present a scalable framework for constructing evaluation benchmarks that challenge RAG systems to integrate information across distinct sources.
- Score: 51.717139132190574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented systems are typically evaluated in settings where information required to answer the query can be found within a single source or the answer is short-form or factoid-based. However, many real-world applications demand the ability to integrate and summarize information scattered across multiple sources, where no single source is sufficient to respond to the user's question. In such settings, the retrieval component of a RAG pipeline must recognize a variety of relevance signals, and the generation component must connect and synthesize information across multiple sources. We present a scalable framework for constructing evaluation benchmarks that challenge RAG systems to integrate information across distinct sources and generate long-form responses. Using our framework, we build two new benchmarks on Multi-Source Retrieval and Synthesis: MSRS-Story and MSRS-Meet, representing narrative synthesis and summarization tasks, respectively, that require retrieval from large collections. Our extensive experiments with various RAG pipelines -- including sparse and dense retrievers combined with frontier LLMs -- reveal that generation quality is highly dependent on retrieval effectiveness, which varies greatly by task. While multi-source synthesis proves challenging even in an oracle retrieval setting, we find that reasoning models significantly outperform standard LLMs at this distinct step.
Related papers
- Recurrence Meets Transformers for Universal Multimodal Retrieval [59.92546492752452]
ReT-2 is a unified retrieval model that supports multimodal queries composed of both images and text.<n>We evaluate ReT-2 on the challenging M2KR and M-BEIR benchmarks across different retrieval configurations.<n>When integrated into retrieval-augmented generation pipelines, ReT-2 also improves downstream performance on Encyclopedic-VQA and InfoSeek datasets.
arXiv Detail & Related papers (2025-09-10T18:00:29Z) - DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router.<n>Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z) - Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG)<n>RAG requires source-aware, multi-hop reasoning over diverse, sparsed, but related sources.<n>We build it using a synthetic data pipeline that simulates business across product planning, development, and support stages.
arXiv Detail & Related papers (2025-06-29T08:34:59Z) - HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation [11.53083922927901]
HM-RAG is a novel Hierarchical Multi-agent Multimodal RAG framework.<n>It pioneers collaborative intelligence for dynamic knowledge synthesis across structured, unstructured, and graph-based data.
arXiv Detail & Related papers (2025-04-13T06:55:33Z) - Multi-Head RAG: Solving Multi-Aspect Problems with LLMs [18.581518952488093]
Multi-Head RAG (MRAG) is a novel scheme for fetching multi-aspect documents.<n>We show MRAG's design advantages over 18 RAG baselines, empirical improvements of up to 20% in retrieval success ratios.
arXiv Detail & Related papers (2024-06-07T16:59:38Z) - STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Knowledge Bases.
Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z) - FeB4RAG: Evaluating Federated Search in the Context of Retrieval
Augmented Generation [31.371489527686578]
Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent.
FEB4RAG is a novel dataset specifically designed for federated search within RAG frameworks.
arXiv Detail & Related papers (2024-02-19T07:06:52Z) - CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z) - Synergistic Interplay between Search and Large Language Models for
Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z) - Enhancing Multi-modal and Multi-hop Question Answering via Structured
Knowledge and Unified Retrieval-Generation [33.56304858796142]
Multi-modal multi-hop question answering involves answering a question by reasoning over multiple input sources from different modalities.
Existing methods often retrieve evidences separately and then use a language model to generate an answer based on the retrieved evidences.
We propose a Structured Knowledge and Unified Retrieval-Generation (RG) approach to address these issues.
arXiv Detail & Related papers (2022-12-16T18:12:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.