Retrieval-Augmented Generation with Conflicting Evidence
- URL: http://arxiv.org/abs/2504.13079v1
- Date: Thu, 17 Apr 2025 16:46:11 GMT
- Title: Retrieval-Augmented Generation with Conflicting Evidence
- Authors: Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
- Abstract summary: Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses. In practice, these systems often need to handle ambiguous user queries and potentially conflicting information from multiple sources. We propose RAMDocs (Retrieval with Ambiguity and Misinformation in Documents), a new dataset that simulates complex and realistic scenarios for conflicting evidence for a user query.
- Score: 57.66282463340297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses. However, in practice, these systems often need to handle ambiguous user queries and potentially conflicting information from multiple sources while also suppressing inaccurate information from noisy or irrelevant documents. Prior work has generally studied and addressed these challenges in isolation, considering only one aspect at a time, such as handling ambiguity or robustness to noise and misinformation. We instead consider multiple factors simultaneously, proposing (i) RAMDocs (Retrieval with Ambiguity and Misinformation in Documents), a new dataset that simulates complex and realistic scenarios for conflicting evidence for a user query, including ambiguity, misinformation, and noise; and (ii) MADAM-RAG, a multi-agent approach in which LLM agents debate over the merits of an answer over multiple rounds, allowing an aggregator to collate responses corresponding to disambiguated entities while discarding misinformation and noise, thereby handling diverse sources of conflict jointly. We demonstrate the effectiveness of MADAM-RAG using both closed and open-source models on AmbigDocs -- which requires presenting all valid answers for ambiguous queries -- improving over strong RAG baselines by up to 11.40% and on FaithEval -- which requires suppressing misinformation -- where we improve by up to 15.80% (absolute) with Llama3.3-70B-Instruct. Furthermore, we find that RAMDocs poses a challenge for existing RAG baselines (Llama3.3-70B-Instruct only obtains 32.60 exact match score). While MADAM-RAG begins to address these conflicting factors, our analysis indicates that a substantial gap remains especially when increasing the level of imbalance in supporting evidence and misinformation.
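The debate-and-aggregate loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the one-agent-per-document assignment, the early-stopping rule, and the toy `toy_agent` and `toy_aggregate` stand-ins that replace real LLM calls. The toy version only drops documents that offer no answer; it does not judge misinformation.

```python
from typing import Callable, List

# Hypothetical signatures: an agent maps (query, document, prior summary) to an
# answer; an aggregator maps all agents' answers to the final answer list.
Agent = Callable[[str, str, str], str]
Aggregator = Callable[[List[str]], List[str]]


def debate(query: str, documents: List[str], agent: Agent,
           aggregate: Aggregator, max_rounds: int = 3) -> List[str]:
    """Run one agent per document for up to max_rounds debate rounds."""
    summary = ""                 # no aggregated view exists before round 1
    previous: List[str] = []
    final: List[str] = []
    for _ in range(max_rounds):
        # Each agent answers from its own document plus the last summary.
        answers = [agent(query, doc, summary) for doc in documents]
        final = aggregate(answers)
        if answers == previous:  # nobody revised an answer: stop early
            break
        previous = answers
        summary = "; ".join(final)
    return final


def toy_agent(query: str, doc: str, summary: str) -> str:
    # Stand-in for an LLM call: text after "ANSWER:" is the document's claim.
    return doc.split("ANSWER:")[1].strip() if "ANSWER:" in doc else "unknown"


def toy_aggregate(answers: List[str]) -> List[str]:
    # Stand-in aggregator: drop abstentions, deduplicate, keep first-seen order.
    return list(dict.fromkeys(a for a in answers if a != "unknown"))


docs = [
    "Entity A description. ANSWER: alpha",   # one reading of the query
    "Entity B description. ANSWER: beta",    # a second valid reading
    "Unrelated text with no answer",         # noise
]
result = debate("ambiguous query", docs, toy_agent, toy_aggregate)
```

With these toy inputs, both answers from the ambiguous readings survive and the noise document contributes nothing. A real system would replace both stand-ins with model calls and would need the aggregator to weigh the agents' arguments.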
Related papers
- Contradiction Detection in RAG Systems: Evaluating LLMs as Context Validators for Improved Information Consistency [0.6827423171182154]
Retrieval Augmented Generation (RAG) systems have emerged as a powerful method for enhancing large language models (LLMs) with up-to-date information. RAG can sometimes surface documents containing contradictory information, particularly in rapidly evolving domains such as news. This study presents a novel data generation framework to simulate different types of contradictions that may occur in the retrieval stage of a RAG system.
arXiv Detail & Related papers (2025-03-31T19:41:15Z) - Agentic Verification for Ambiguous Query Disambiguation [42.238086712267396]
We tackle the challenge of disambiguating queries in retrieval-augmented generation (RAG) to diverse yet answerable interpretations. We propose a joint approach to unify diversification with verification by incorporating feedback from retriever and generator early on. We validate the efficiency and effectiveness of our method on the widely adopted ASQA benchmark to achieve diverse yet verifiable interpretations.
arXiv Detail & Related papers (2025-02-14T18:31:39Z) - Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies [66.30619782227173]
Large language models (LLMs) can produce erroneous responses that sound fluent and convincing. We identify several features of LLM responses that shape users' reliance. We find that explanations increase reliance on both correct and incorrect responses. We observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies.
arXiv Detail & Related papers (2025-02-12T16:35:41Z) - Parallel Key-Value Cache Fusion for Position Invariant RAG [55.9809686190244]
Large Language Models (LLMs) are sensitive to the position of relevant information within contexts. We introduce a framework that generates consistent outputs for decoder-only models, irrespective of the input context order.
arXiv Detail & Related papers (2025-01-13T17:50:30Z) - MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation [34.66546005629471]
Large Language Models (LLMs) are essential tools for various natural language processing tasks but often suffer from generating outdated or incorrect information.
Retrieval-Augmented Generation (RAG) addresses this issue by incorporating external, real-time information retrieval to ground LLM responses.
To tackle this problem, we propose Multi-Agent Filtering Retrieval-Augmented Generation (MAIN-RAG).
MAIN-RAG is a training-free RAG framework that leverages multiple LLM agents to collaboratively filter and score retrieved documents.
arXiv Detail & Related papers (2024-12-31T08:07:26Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations [4.697267141773321]
Retrieval-augmented generation (RAG) has emerged as a critical mechanism in contemporary NLP to support Large Language Models (LLMs) in systematically accessing richer factual context.
Recent studies have shown that LLMs still struggle to critically analyse RAG-based in-context information, a limitation that may lead to incorrect inferences and hallucinations.
In this paper, we investigate how to elicit critical reasoning in RAG via contrastive explanations.
arXiv Detail & Related papers (2024-10-30T10:11:53Z) - ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
Large Language Models (LLMs) are widely used in Conversational AI systems to generate responses to user inquiries. We propose a guided hallucination-based method to efficiently generate a diverse set of out-of-scope questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance.
We introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization.
We also present ContextualBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z) - Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG).
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z) - Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise [14.38859858538404]
Our work investigates a more challenging scenario in which even the "relevant" documents in a retrieved document set may contain misleading or incorrect information.
We propose approaches for handling knowledge conflicts among retrieved documents by explicitly fine-tuning a discriminator or prompting GPT-3.5 to elicit its discriminative capability.
arXiv Detail & Related papers (2023-05-02T16:28:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.