RAGGED: Towards Informed Design of Scalable and Stable RAG Systems
- URL: http://arxiv.org/abs/2403.09040v3
- Date: Wed, 16 Jul 2025 09:39:02 GMT
- Title: RAGGED: Towards Informed Design of Scalable and Stable RAG Systems
- Authors: Jennifer Hsia, Afreen Shaikh, Zhiruo Wang, Graham Neubig,
- Abstract summary: Retrieval-augmented generation (RAG) enhances language models by integrating external knowledge.<n>RAGGED is a framework for systematically evaluating RAG systems.
- Score: 51.171355532527365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented generation (RAG) enhances language models by integrating external knowledge, but its effectiveness is highly dependent on system configuration. Improper retrieval settings can degrade performance, making RAG less reliable than closed-book generation. In this work, we introduce RAGGED, a framework for systematically evaluating RAG systems across diverse retriever-reader configurations, retrieval depths, and datasets. Our analysis reveals that reader robustness to noise is the key determinant of RAG stability and scalability. Some readers benefit from increased retrieval depth, while others degrade due to their sensitivity to distracting content. Through large-scale experiments on open-domain, multi-hop, and specialized-domain datasets, we show that retrievers, rerankers, and prompts influence performance but do not fundamentally alter these reader-driven trends. By providing a principled framework and new metrics to assess RAG stability and scalability, RAGGED enables systematic evaluation of retrieval-augmented generation systems, guiding future research on optimizing retrieval depth and model robustness.
Related papers
- A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions [1.4931265249949528]
Retrieval-Augmented Generation (RAG) is a major advancement in natural language processing (NLP)<n>RAG combines large language models (LLMs) with information retrieval systems to enhance factual grounding, accuracy, and contextual relevance.<n>This paper presents a systematic review of RAG, tracing its evolution from early developments in open domain question answering to recent state-of-the-art implementations.
arXiv Detail & Related papers (2025-07-25T03:05:46Z) - Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing [4.874077691069634]
Retrieval Augmented Generation (RAG) has shown strong capability in enhancing language models' knowledge and reducing AI generative hallucinations.<n>Current multi-round RAG systems may continue searching even when enough information has already been retrieved.<n>This paper introduces a new framework, SIM-RAG, to explicitly enhance RAG systems' self-awareness and multi-round retrieval capabilities.
arXiv Detail & Related papers (2025-05-05T17:39:35Z) - MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation [8.950307082012763]
Retrieval-Augmented Generation (RAG) has gained prominence as an effective method for enhancing the generative capabilities of Large Language Models (LLMs)
We present MIRAGE, a Question Answering dataset specifically designed for RAG evaluation.
MIRAGE consists of 7,560 curated instances mapped to a retrieval pool of 37,800 entries, enabling an efficient and precise evaluation of both retrieval and generation tasks.
arXiv Detail & Related papers (2025-04-23T23:05:46Z) - Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations [65.11348389219887]
We introduce Dialectic-RAG (DRAG), a modular approach that evaluates retrieved information by comparing, contrasting, and resolving conflicting perspectives.
We show the impact of our framework both as an in-context learning strategy and for constructing demonstrations to instruct smaller models.
arXiv Detail & Related papers (2025-04-07T06:55:15Z) - Enhancing Retrieval-Augmented Generation: A Study of Best Practices [16.246719783032436]
We develop advanced RAG system designs that incorporate query expansion, various novel retrieval strategies, and a novel Contrastive In-Context Learning RAG.
Our study systematically investigates key factors, including language model size, prompt design, document chunk size, knowledge base size, retrieval stride, query expansion techniques, and Focus Mode retrieving relevant context at sentence-level.
Our findings offer actionable insights for developing RAG systems, striking a balance between contextual richness and retrieval-generation efficiency.
arXiv Detail & Related papers (2025-01-13T15:07:55Z) - Re-ranking the Context for Multimodal Retrieval Augmented Generation [28.63893944806149]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate a response within a context.
RAG systems face unique challenges: (i) the retrieval process may select irrelevant entries to user query (e.g., images, documents), and (ii) vision-language models or multi-modal language models like GPT-4o may hallucinate when processing these entries to generate RAG output.
We show that by using a more advanced relevancy measure, one can enhance the retrieval process by selecting more relevant pieces from the knowledge-base and eliminate
arXiv Detail & Related papers (2025-01-08T18:58:22Z) - Unanswerability Evaluation for Retrieval Augmented Generation [74.3022365715597]
UAEval4RAG is a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively.
We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries.
arXiv Detail & Related papers (2024-12-16T19:11:55Z) - Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs [23.357843519762483]
Recent studies have demonstrated that leveraging the Retrieval-Augmented Generation framework, combined with Knowledge Graphs, robustly enhances the reasoning capabilities of Large language models.
We introduce a Multi-objective Multi-Armed Bandit enhanced RAG framework, supported by multiple retrieval methods with diverse capabilities.
Our method significantly outperforms baseline methods in non-stationary settings while achieving state-of-the-art performance in stationary environments.
arXiv Detail & Related papers (2024-12-10T15:56:03Z) - Semantic Tokens in Retrieval Augmented Generation [0.0]
I propose a novel Comparative RAG system that introduces an evaluator module to bridge the gap between probabilistic RAG systems and deterministically verifiable responses.<n>This framework paves the way for more reliable and scalable question-answering applications in domains requiring high precision and verifiability.
arXiv Detail & Related papers (2024-12-03T16:52:06Z) - RARe: Retrieval Augmented Retrieval with In-Context Examples [40.963703726988946]
We introduce a simple approach to enable retrievers to use in-context examples.
RARE finetunes a pre-trained model with in-context examples whose query is semantically similar to the target query.
We find RARe exhibits stronger out-of-domain generalization compared to models using queries without in-context examples.
arXiv Detail & Related papers (2024-10-26T05:46:20Z) - SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval [40.17823569905232]
Retrieval-Augmented Generation (RAG) has greatly improved large language models (LLMs) by enabling them to generate accurate, contextually grounded responses.
RAG approaches, which prioritize top-ranked documents based solely on query-context relevance, often introduce redundancy and conflicting information.
We propose Selection using Matrices for Augmented Retrieval (RAG) in question answering tasks, a fully unsupervised and training-free framework designed to optimize context selection in RAG.
arXiv Detail & Related papers (2024-09-21T03:03:09Z) - Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs)
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z) - SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance.
We introduce SFR-RAG, a small LLM that is instruction-textual with an emphasis on context-grounded generation and hallucination.
We also present ConBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z) - WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain [2.8514947506989707]
We present a comprehensive framework to generate a domain relevant RAG benchmark.
Our framework is based on automatic question-answer generation with Human (domain experts)-AI Large Language Model (LLM) teaming.
We demonstrate the framework by introducing WeQA, a first-of-its-kind benchmark on the wind energy domain.
arXiv Detail & Related papers (2024-08-21T17:43:11Z) - RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation [61.14660526363607]
We propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the retrieval and generation modules.
RAGChecker has significantly better correlations with human judgments than other evaluation metrics.
The metrics of RAGChecker can guide researchers and practitioners in developing more effective RAG systems.
arXiv Detail & Related papers (2024-08-15T10:20:54Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [69.4501863547618]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.
With a focus on factual accuracy, we propose three novel metrics Completeness, Hallucination, and Irrelevance.
Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - Introducing a new hyper-parameter for RAG: Context Window Utilization [0.0]
RAG systems enhance generative models by incorporating relevant information retrieved from external knowledge bases.
The size of the text chunks retrieved and processed is a critical factor influencing RAG performance.
This study aims to identify the optimal chunk size that maximizes answer generation quality.
arXiv Detail & Related papers (2024-07-29T08:38:14Z) - FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research [70.6584488911715]
retrieval-augmented generation (RAG) has attracted considerable research attention.
Existing RAG toolkits are often heavy and inflexibly, failing to meet the customization needs of researchers.
Our toolkit has implemented 16 advanced RAG methods and gathered and organized 38 benchmark datasets.
arXiv Detail & Related papers (2024-05-22T12:12:40Z) - CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z) - Corrective Retrieval Augmented Generation [36.04062963574603]
Retrieval-augmented generation (RAG) relies heavily on relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.
We propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation.
CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches.
arXiv Detail & Related papers (2024-01-29T04:36:39Z) - Learning to Retrieve In-Context Examples for Large Language Models [69.9707552694766]
Large language models (LLMs) have demonstrated their ability to learn in-context.
The effectiveness of in-context learning is heavily reliant on the quality of the selected examples.
We propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context examples.
arXiv Detail & Related papers (2023-07-14T05:23:08Z) - Generation-Augmented Retrieval for Open-domain Question Answering [134.27768711201202]
Generation-Augmented Retrieval (GAR) for answering open-domain questions.
We show that generating diverse contexts for a query is beneficial as fusing their results consistently yields better retrieval accuracy.
GAR achieves state-of-the-art performance on Natural Questions and TriviaQA datasets under the extractive QA setup when equipped with an extractive reader.
arXiv Detail & Related papers (2020-09-17T23:08:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.