Related papers: RAGProbe: An Automated Approach for Evaluating RAG Applications

RAGProbe: An Automated Approach for Evaluating RAG Applications

URL: http://arxiv.org/abs/2409.19019v1
Date: Tue, 24 Sep 2024 23:33:07 GMT
Title: RAGProbe: An Automated Approach for Evaluating RAG Applications
Authors: Shangeetha Sivasothy, Scott Barnett, Stefanus Kurniawan, Zafaryab Rasool, Rajesh Vasa,
Abstract summary: Retrieval Augmented Generation (RAG) is increasingly being used when building Generative AI applications. We present a technique for generating variations in question-answer pairs to trigger failures in RAG pipelines.
Score: 1.38012307221604
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval Augmented Generation (RAG) is increasingly being used when building Generative AI applications. Evaluating these applications and RAG pipelines is mostly done manually, via a trial and error process. Automating evaluation of RAG pipelines requires overcoming challenges such as context misunderstanding, wrong format, incorrect specificity, and missing content. Prior works therefore focused on improving evaluation metrics as well as enhancing components within the pipeline using available question and answer datasets. However, they have not focused on 1) providing a schema for capturing different types of question-answer pairs or 2) creating a set of templates for generating question-answer pairs that can support automation of RAG pipeline evaluation. In this paper, we present a technique for generating variations in question-answer pairs to trigger failures in RAG pipelines. We validate 5 open-source RAG pipelines using 3 datasets. Our approach revealed the highest failure rates when prompts combine multiple questions: 91% for questions when spanning multiple documents and 78% for questions from a single document; indicating a need for developers to prioritise handling these combined questions. 60% failure rate was observed in academic domain dataset and 53% and 62% failure rates were observed in open-domain datasets. Our automated approach outperforms the existing state-of-the-art methods, by increasing the failure rate by 51% on average per dataset. Our work presents an automated approach for continuously monitoring the health of RAG pipelines, which can be integrated into existing CI/CD pipelines, allowing for improved quality.

Related papers

RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines [1.5741300187949614]
Retrieval-augmented generation (RAG) pipelines have become the de-facto approach for building AI assistants with access to external, domain-specific knowledge. RAGGY is a tool that combines a Python library of composable RAG primitives with an interactive interface for real-time debug.
arXiv Detail & Related papers (2025-04-18T09:38:49Z)
Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation [34.66546005629471]
Large Language Models (LLMs) are essential tools for various natural language processing tasks but often suffer from generating outdated or incorrect information. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating external, real-time information retrieval to ground LLM responses. To tackle this problem, we propose Multi-Agent Filtering Retrieval-Augmented Generation (MAIN-RAG) MAIN-RAG is a training-free RAG framework that leverages multiple LLM agents to collaboratively filter and score retrieved documents.
arXiv Detail & Related papers (2024-12-31T08:07:26Z)
Unanswerability Evaluation for Retrieval Augmented Generation [74.3022365715597]
UAEval4RAG is a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively. We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries.
arXiv Detail & Related papers (2024-12-16T19:11:55Z)
Toward Optimal Search and Retrieval for RAG [39.69494982983534]
Retrieval-augmented generation (RAG) is a promising method for addressing some of the memory-related challenges associated with Large Language Models (LLMs) Here, we work towards the goal of understanding how retrievers can be optimized for RAG pipelines for common tasks such as Question Answering (QA)
arXiv Detail & Related papers (2024-11-11T22:06:51Z)
RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions [52.33835101586687]
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries. This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z)
REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models [14.023953508288628]
Retrieval augmented generation (RAG) pipelines are commonly used in tasks such as question-answering (QA) We propose REFINE, a novel technique that generates synthetic data from available documents and then uses a model fusion approach to fine-tune embeddings.
arXiv Detail & Related papers (2024-10-16T08:43:39Z)
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation [63.611024451010316]
Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. We propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems.
arXiv Detail & Related papers (2024-10-12T16:30:51Z)
Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation [72.70046559930555]
We propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks. Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes. In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration.
arXiv Detail & Related papers (2024-10-11T14:03:29Z)
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios. With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance. Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation [42.82192656794179]
Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses. This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in unseen scenarios. Retrieval-Augmented Generation (RAG) addresses this by incorporating external, relevant documents into the response generation process.
arXiv Detail & Related papers (2024-03-31T08:58:54Z)
Retrieval-Augmented Generation for AI-Generated Content: A Survey [38.50754568320154]
Retrieval-Augmented Generation (RAG) has emerged as a paradigm to address such challenges. RAG introduces the information retrieval process, which enhances the generation process by retrieving relevant objects from available data stores. In this paper, we comprehensively review existing efforts that integrate RAG technique into AIGC scenarios.
arXiv Detail & Related papers (2024-02-29T18:59:01Z)
RAG-Fusion: a New Take on Retrieval-Augmented Generation [0.0]
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product information. This research marks significant progress in artificial intelligence (AI) and natural language processing (NLP) applications.
arXiv Detail & Related papers (2024-01-31T22:06:07Z)
Q-DETR: An Efficient Low-Bit Quantized Detection Transformer [50.00784028552792]
We find that the bottlenecks of Q-DETR come from the query information distortion through our empirical analyses. We formulate our DRD as a bi-level optimization problem, which can be derived by generalizing the information bottleneck (IB) principle to the learning of Q-DETR. We introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.
arXiv Detail & Related papers (2023-04-01T08:05:14Z)
Relation-Guided Pre-Training for Open-Domain Question Answering [67.86958978322188]
We propose a Relation-Guided Pre-Training (RGPT-QA) framework to solve complex open-domain questions. We show that RGPT-QA achieves 2.2%, 2.4%, and 6.3% absolute improvement in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions.
arXiv Detail & Related papers (2021-09-21T17:59:31Z)
Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts. Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.