The Distracting Effect: Understanding Irrelevant Passages in RAG
- URL: http://arxiv.org/abs/2505.06914v1
- Date: Sun, 11 May 2025 09:25:05 GMT
- Title: The Distracting Effect: Understanding Irrelevant Passages in RAG
- Authors: Chen Amiraz, Florin Cuconasu, Simone Filice, Zohar Karnin,
- Abstract summary: We identify and use hard distracting passages to improve RAG systems. We achieve up to a 7.5% increase in answering accuracy compared to counterparts fine-tuned on conventional RAG datasets. Our contribution is two-fold: first, we move beyond the simple binary classification of irrelevant passages as either completely unrelated vs. distracting, and second, we develop and analyze multiple methods for finding hard distracting passages.
- Score: 8.882885336338205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A well-known issue with Retrieval Augmented Generation (RAG) is that retrieved passages that are irrelevant to the query sometimes distract the answer-generating LLM, causing it to provide an incorrect response. In this paper, we shed light on this core issue and formulate the distracting effect of a passage w.r.t. a query (and an LLM). We provide a quantifiable measure of the distracting effect of a passage and demonstrate its robustness across LLMs. Our research introduces novel methods for identifying and using hard distracting passages to improve RAG systems. By fine-tuning LLMs with these carefully selected distracting passages, we achieve up to a 7.5% increase in answering accuracy compared to counterparts fine-tuned on conventional RAG datasets. Our contribution is two-fold: first, we move beyond the simple binary classification of irrelevant passages as either completely unrelated vs. distracting, and second, we develop and analyze multiple methods for finding hard distracting passages. To our knowledge, no other research has provided such a comprehensive framework for identifying and utilizing hard distracting passages.
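The abstract mentions a quantifiable measure of a passage's distracting effect but does not state the formula here. The following is only an illustrative sketch, assuming the effect is scored as the chance that the model answers instead of abstaining when shown a single irrelevant passage. The names `distracting_effect`, `hardest_distractors`, and `toy_abstain_prob` are hypothetical, and the toy scorer is a word-overlap stand-in for a real LLM call, not the authors' method.

```python
from typing import Callable, List, Tuple


def distracting_effect(abstain_prob: float) -> float:
    """Assumed score in [0, 1]: probability the model answers rather than abstains."""
    if not 0.0 <= abstain_prob <= 1.0:
        raise ValueError("abstain_prob must be a probability in [0, 1]")
    return 1.0 - abstain_prob


def hardest_distractors(
    query: str,
    irrelevant_passages: List[str],
    abstain_prob_fn: Callable[[str, str], float],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank irrelevant passages by the assumed distracting effect, highest first."""
    scored = [
        (passage, distracting_effect(abstain_prob_fn(query, passage)))
        for passage in irrelevant_passages
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_abstain_prob(query: str, passage: str) -> float:
    """Hypothetical stand-in for an LLM: more word overlap means less abstention."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    overlap = len(query_words & passage_words) / max(len(query_words), 1)
    return 1.0 - overlap


if __name__ == "__main__":
    passages = [
        "the weather is nice today",
        "hamlet is a small village",
        "who wrote macbeth",
    ]
    for passage, score in hardest_distractors(
        "who wrote hamlet", passages, toy_abstain_prob
    ):
        print(f"{score:.2f}  {passage}")
```

In a real pipeline, `abstain_prob_fn` would be replaced by a call that reads the model's probability of an abstention response; the top-ranked passages would then be candidates for the hard distractors used during fine-tuning.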
Related papers
- Injecting External Knowledge into the Reasoning Process Enhances Retrieval-Augmented Generation [26.1953598254707]
Retrieval-augmented generation (RAG) has been widely adopted to augment large language models (LLMs) with external knowledge for knowledge-intensive tasks.
RAG's effectiveness is often undermined by the presence of noisy (i.e., low-quality) retrieved passages.
We propose Passage Injection to enhance RAG's ability to recognize and resist noisy passages.
arXiv Detail & Related papers (2025-07-25T14:43:31Z)
- PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning [57.89188317734747]
PrismRAG trains the model with distractor-aware QA pairs mixing gold evidence with subtle distractor passages.
It instills reasoning-centric habits that make the LLM plan, rationalize, and synthesize without relying on extensive human engineered instructions.
arXiv Detail & Related papers (2025-07-25T00:15:31Z)
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge.
It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts.
This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z)
- Do RAG Systems Suffer From Positional Bias? [13.06567550060387]
We show how state-of-the-art retrieval pipelines, while attempting to retrieve relevant passages, systematically bring highly distracting ones to the top ranks.
Our findings reveal that sophisticated strategies that attempt to rearrange the passages based on LLM positional preferences do not perform better than random shuffling.
arXiv Detail & Related papers (2025-05-21T14:18:01Z)
- Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization [97.72503890388866]
We propose Self-Routing RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge verbalization.
SR-RAG enables an LLM to dynamically decide between external retrieval and verbalizing its own parametric knowledge.
We introduce dynamic knowledge source inference via nearest neighbor search to improve the accuracy of knowledge source decision.
arXiv Detail & Related papers (2025-04-01T17:59:30Z)
- U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack [9.760456105567078]
This paper introduces U-NIAH, a unified framework that systematically compares Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG).
Our framework incorporates multi-needle, long-needle, and needle-in-needle configurations, along with different retrieval settings.
Our findings show that RAG significantly enhances smaller LLMs by mitigating the "lost-in-the-middle" effect and improving robustness.
arXiv Detail & Related papers (2025-03-01T05:05:24Z)
- Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
- Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models [11.716595438057997]
We propose passage-specific prompt tuning for reranking in open-domain question answering (PSPT).
PSPT is a parameter-efficient method that fine-tunes learnable passage-specific soft prompts.
We conducted extensive experiments utilizing the Llama-2-chat-7B model across three publicly available open-domain question answering datasets.
arXiv Detail & Related papers (2024-05-31T07:43:42Z)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG).
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z)
- Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs [80.74263278847063]
The integration of retrieved passages and large language models (LLMs) has significantly contributed to improving open-domain question answering.
This paper investigates different methods of combining retrieved passages with LLMs to enhance answer generation.
arXiv Detail & Related papers (2023-08-24T05:26:54Z)
- Evidentiality-aware Retrieval for Overcoming Abstractiveness in Open-Domain Question Answering [29.00167886463793]
We propose Evidentiality-Aware Dense Passage Retrieval (EADPR) to learn to discriminate evidence passages from distractors.
We conduct extensive experiments to validate the effectiveness of our proposed method on multiple abstractive ODQA tasks.
arXiv Detail & Related papers (2023-04-06T12:42:37Z)