Evaluation of Retrieval-Augmented Generation: A Survey
- URL: http://arxiv.org/abs/2405.07437v2
- Date: Wed, 3 Jul 2024 04:59:32 GMT
- Title: Evaluation of Retrieval-Augmented Generation: A Survey
- Authors: Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu,
- Abstract summary: We provide a comprehensive overview of the evaluation and benchmarks of Retrieval-Augmented Generation (RAG) systems.
Specifically, we examine and compare several quantifiable metrics of the Retrieval and Generation components, such as relevance, accuracy, and faithfulness.
We then analyze the various datasets and metrics, discuss the limitations of current benchmarks, and suggest potential directions to advance the field of RAG benchmarks.
- Score: 13.633909177683462
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) has recently gained traction in natural language processing. Numerous studies and real-world applications are leveraging its ability to enhance generative models through external information retrieval. Evaluating these RAG systems, however, poses unique challenges due to their hybrid structure and reliance on dynamic knowledge sources. To better understand these challenges, we conduct A Unified Evaluation Process of RAG (Auepora) and aim to provide a comprehensive overview of the evaluation and benchmarks of RAG systems. Specifically, we examine and compare several quantifiable metrics of the Retrieval and Generation components, such as relevance, accuracy, and faithfulness, within the current RAG benchmarks, encompassing the possible output and ground truth pairs. We then analyze the various datasets and metrics, discuss the limitations of current benchmarks, and suggest potential directions to advance the field of RAG benchmarks.
Related papers
- Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer.
Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z) - OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain [62.89809156574998]
We introduce an omnidirectional and automatic RAG benchmark, OmniEval, in the financial domain.
Our benchmark is characterized by its multi-dimensional evaluation framework.
Our experiments demonstrate the comprehensiveness of OmniEval, which includes extensive test datasets.
arXiv Detail & Related papers (2024-12-17T15:38:42Z) - Unanswerability Evaluation for Retrieval Augmented Generation [74.3022365715597]
UAEval4RAG is a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively.
We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries.
arXiv Detail & Related papers (2024-12-16T19:11:55Z) - CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity [23.48167670445722]
Retrieval-Augmented Generation (RAG) aims to generate more accurate and reliable answers with the help of the retrieved context from external knowledge sources.
evaluating these systems remains a crucial research area due to the following issues.
We propose a Comprehensive Full-chain Evaluation (CoFE-RAG) framework to facilitate thorough evaluation across the entire RAG pipeline.
arXiv Detail & Related papers (2024-10-16T05:20:32Z) - WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain [2.8514947506989707]
We present a comprehensive framework to generate a domain relevant RAG benchmark.
Our framework is based on automatic question-answer generation with Human (domain experts)-AI Large Language Model (LLM) teaming.
We demonstrate the framework by introducing WeQA, a first-of-its-kind benchmark on the wind energy domain.
arXiv Detail & Related papers (2024-08-21T17:43:11Z) - RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation [61.14660526363607]
We propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for both the retrieval and generation modules.
RAGChecker has significantly better correlations with human judgments than other evaluation metrics.
The metrics of RAGChecker can guide researchers and practitioners in developing more effective RAG systems.
arXiv Detail & Related papers (2024-08-15T10:20:54Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [69.4501863547618]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.
With a focus on factual accuracy, we propose three novel metrics Completeness, Hallucination, and Irrelevance.
Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - BERGEN: A Benchmarking Library for Retrieval-Augmented Generation [26.158785168036662]
Retrieval-Augmented Generation allows to enhance Large Language Models with external knowledge.
Inconsistent benchmarking poses a major challenge in comparing approaches and understanding the impact of each component in the pipeline.
In this work, we study best practices that lay the groundwork for a systematic evaluation of RAG and present BERGEN, an end-to-end library for reproducible research standardizing RAG experiments.
arXiv Detail & Related papers (2024-07-01T09:09:27Z) - CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.