QE-RAG: A Robust Retrieval-Augmented Generation Benchmark for Query Entry Errors
- URL: http://arxiv.org/abs/2504.04062v1
- Date: Sat, 05 Apr 2025 05:24:08 GMT
- Title: QE-RAG: A Robust Retrieval-Augmented Generation Benchmark for Query Entry Errors
- Authors: Kepu Zhang, Zhongxiang Sun, Weijie Yu, Xiaoxue Zang, Kai Zheng, Yang Song, Han Li, Jun Xu
- Abstract summary: Retriever-augmented generation (RAG) has become a widely adopted approach for enhancing the factual accuracy of large language models (LLMs). QE-RAG is the first robust RAG benchmark designed specifically to evaluate performance against query entry errors. We propose a contrastive learning-based robust retriever training method and a retrieval-augmented query correction method.
- Score: 23.225358970952197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retriever-augmented generation (RAG) has become a widely adopted approach for enhancing the factual accuracy of large language models (LLMs). While current benchmarks evaluate the performance of RAG methods from various perspectives, they share a common assumption that the user queries used for retrieval are error-free. However, in real-world interactions between users and LLMs, query entry errors such as keyboard proximity errors, visual similarity errors, and spelling errors are frequent. The impact of such errors on current RAG methods remains largely unexplored. To bridge this gap, we propose QE-RAG, the first robust RAG benchmark designed specifically to evaluate performance against query entry errors. We augment six widely used datasets by injecting three common types of query entry errors into randomly selected user queries at rates of 20% and 40%, simulating typical user behavior in real-world scenarios. We analyze the impact of these errors on LLM outputs and find that corrupted queries degrade model performance, which can be mitigated through query correction and by training a robust retriever to retrieve relevant documents. Based on these insights, we propose a contrastive learning-based robust retriever training method and a retrieval-augmented query correction method. Extensive in-domain and cross-domain experiments reveal that: (1) state-of-the-art RAG methods, including sequential, branching, and iterative methods, exhibit poor robustness to query entry errors; (2) our method significantly enhances the robustness of RAG when handling query entry errors and is compatible with existing RAG methods, further improving their robustness.
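The benchmark construction step described in the abstract (corrupting a fraction of user queries with keyboard-proximity, visual-similarity, and spelling errors at 20% or 40% rates) can be pictured with a minimal sketch. The confusion tables and helper names below are illustrative assumptions, not the authors' released code or error distributions.

```python
import random

# Illustrative (not the paper's) character-confusion tables.
KEYBOARD_NEIGHBORS = {"a": "qwsz", "s": "awedxz", "e": "wsdr", "o": "iklp", "n": "bhjm"}
VISUAL_LOOKALIKES = {"o": "0", "l": "1", "i": "l", "m": "rn", "e": "c"}

def keyboard_error(q: str) -> str:
    """Replace one character with a key adjacent to it on a QWERTY layout."""
    idxs = [i for i, c in enumerate(q) if c.lower() in KEYBOARD_NEIGHBORS]
    if not idxs:
        return q
    i = random.choice(idxs)
    return q[:i] + random.choice(KEYBOARD_NEIGHBORS[q[i].lower()]) + q[i + 1:]

def visual_error(q: str) -> str:
    """Replace one character with a visually similar glyph."""
    idxs = [i for i, c in enumerate(q) if c.lower() in VISUAL_LOOKALIKES]
    if not idxs:
        return q
    i = random.choice(idxs)
    return q[:i] + VISUAL_LOOKALIKES[q[i].lower()] + q[i + 1:]

def spelling_error(q: str) -> str:
    """Drop one random character, a simple stand-in for a misspelling."""
    if len(q) < 2:
        return q
    i = random.randrange(len(q))
    return q[:i] + q[i + 1:]

def corrupt_queries(queries, rate=0.2, seed=0):
    """Corrupt roughly `rate` of the queries with one randomly chosen error type."""
    random.seed(seed)
    errors = [keyboard_error, visual_error, spelling_error]
    return [random.choice(errors)(q) if random.random() < rate else q for q in queries]

# Example: corrupt queries at a 40% rate, mirroring one of the benchmark settings.
print(corrupt_queries(["who wrote the iliad", "capital of mongolia"], rate=0.4))
```

In the benchmark itself the corruption is applied to randomly selected queries at 20% and 40% rates across six datasets; the tables above only convey the shape of the augmentation, not its exact error models.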
Related papers
- Out of Style: RAG's Fragility to Linguistic Variation [29.59506089890902]
User queries exhibit greater linguistic variations and can trigger cascading errors across interdependent RAG components.
We analyze how varying four linguistic dimensions (formality, readability, politeness, and grammatical correctness) impact RAG performance.
arXiv Detail & Related papers (2025-04-11T03:30:26Z) - Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence [56.09494651178128]
Retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). We show that retrievers often rely on superficial patterns like over-prioritizing document beginnings, shorter documents, repeated entities, and literal matches. We show that these biases have direct consequences for downstream applications like RAG, where retrieval-preferred documents can mislead LLMs.
arXiv Detail & Related papers (2025-03-06T23:23:13Z) - Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals [3.9139847342664864]
We introduce RAGuard, a fact-checking dataset designed to evaluate the robustness of RAG systems against misleading retrievals. RAGuard categorizes retrieved evidence into three types: supporting, misleading, and irrelevant. Our benchmark experiments reveal that when exposed to misleading retrievals, all tested LLM-powered RAG systems perform worse than their zero-shot baselines.
arXiv Detail & Related papers (2025-02-22T05:50:15Z) - Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control [52.405085773954596]
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to mitigate large language model hallucinations. Existing RAG frameworks often apply retrieval indiscriminately, leading to inefficiencies such as over-retrieving. We introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off.
arXiv Detail & Related papers (2025-02-17T18:56:20Z) - Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation [18.098228823748617]
We present Interrogation Attack (IA), a membership inference technique targeting documents in the RAG datastore. We demonstrate successful inference with just 30 queries while remaining stealthy. We observe a 2x improvement in TPR@1%FPR over prior inference attacks across diverse RAG configurations.
arXiv Detail & Related papers (2025-02-01T04:01:18Z) - Toward Optimal Search and Retrieval for RAG [39.69494982983534]
Retrieval-augmented generation (RAG) is a promising method for addressing some of the memory-related challenges associated with Large Language Models (LLMs).
Here, we work towards the goal of understanding how retrievers can be optimized for RAG pipelines for common tasks such as Question Answering (QA).
arXiv Detail & Related papers (2024-11-11T22:06:51Z) - JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z) - Optimizing Query Generation for Enhanced Document Retrieval in RAG [53.10369742545479]
Large Language Models (LLMs) excel in various language tasks but they often generate incorrect information.
Retrieval-Augmented Generation (RAG) aims to mitigate this by using document retrieval for accurate responses.
arXiv Detail & Related papers (2024-07-17T05:50:32Z) - BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding. We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
arXiv Detail & Related papers (2024-07-16T17:58:27Z) - Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese [3.724862061593193]
The prevailing issue of factual inconsistency errors in conventional Retrieval Augmented Generation (RAG) motivates the study of Factual Consistency Evaluation (FCE).
We propose the first comprehensive FCE benchmark, Face4RAG, for RAG, independent of the underlying Large Language Models (LLMs).
On the proposed benchmark, we discover the failure of existing FCE methods to detect the logical fallacy, which refers to a mismatch of logic structures between the answer and the retrieved reference.
arXiv Detail & Related papers (2024-07-01T08:35:04Z) - LoL: A Comparative Regularization Loss over Query Reformulation Losses for Pseudo-Relevance Feedback [70.44530794897861]
Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique to improve retrieval accuracy.
Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents.
We propose the Loss-over-Loss (LoL) framework to compare the reformulation losses between different revisions of the same query during training.
arXiv Detail & Related papers (2022-04-25T10:42:50Z) - Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
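As a rough illustration of the post-editing pipeline the RFEC summary above describes (retrieve the most similar evidence sentence for each summary sentence, then swap entities that the evidence does not support), here is a self-contained toy sketch. The token-overlap retrieval and the capitalized-word entity heuristic are simplifications assumed for illustration, not the authors' actual method.

```python
import re

def sentences(text: str):
    """Naive sentence splitter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercased tokens, used as a cheap similarity score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def entities(sent: str):
    """Toy entity detector: capitalized tokens (a real system would use NER)."""
    return [t.strip(".,") for t in sent.split() if t[:1].isupper()]

def correct_summary(summary: str, document: str) -> str:
    """For each summary sentence, retrieve the most similar evidence sentence and
    replace entities that do not appear in that evidence with the evidence's entities."""
    doc_sents = sentences(document)
    if not doc_sents:
        return summary
    fixed = []
    for s in sentences(summary):
        evidence = max(doc_sents, key=lambda d: token_overlap(s, d))
        ev_ents = entities(evidence)
        for wrong in entities(s):
            if wrong not in evidence and ev_ents:
                s = s.replace(wrong, ev_ents.pop(0))  # crude positional substitution
        fixed.append(s)
    return " ".join(fixed)

doc = "The prize was announced in Oslo. Jules Verne published the novel in 1870."
print(correct_summary("Victor Hugo published the novel in 1870.", doc))
# -> "Jules Verne published the novel in 1870."
```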