MAGIC: A Multi-Hop and Graph-Based Benchmark for Inter-Context Conflicts in Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2507.21544v1
- Date: Tue, 29 Jul 2025 07:19:49 GMT
- Title: MAGIC: A Multi-Hop and Graph-Based Benchmark for Inter-Context Conflicts in Retrieval-Augmented Generation
- Authors: Jungyeon Lee, Kangmin Lee, Taeuk Kim
- Abstract summary: Knowledge conflict often arises in RAG systems, where retrieved documents may be inconsistent with one another or contradict the model's parametric knowledge. We propose a knowledge graph (KG)-based framework that generates varied and subtle conflicts between two similar yet distinct contexts. Experimental results on our benchmark, MAGIC, provide intriguing insights into the inner workings of LLMs regarding knowledge conflict.
- Score: 4.177310099979434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge conflict often arises in retrieval-augmented generation (RAG) systems, where retrieved documents may be inconsistent with one another or contradict the model's parametric knowledge. Existing benchmarks for investigating the phenomenon have notable limitations, including a narrow focus on the question answering setup, heavy reliance on entity substitution techniques, and a restricted range of conflict types. To address these issues, we propose a knowledge graph (KG)-based framework that generates varied and subtle conflicts between two similar yet distinct contexts, while ensuring interpretability through the explicit relational structure of KGs. Experimental results on our benchmark, MAGIC, provide intriguing insights into the inner workings of LLMs regarding knowledge conflict: both open-source and proprietary models struggle with conflict detection -- especially when multi-hop reasoning is required -- and often fail to pinpoint the exact source of contradictions. Finally, we present in-depth analyses that serve as a foundation for improving LLMs in integrating diverse, sometimes even conflicting, information.
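To make the construction concrete, below is a minimal, hypothetical sketch of the general recipe the abstract describes: take a multi-hop path from a KG, swap one hop's tail entity for a same-type distractor drawn from the graph, and verbalize both versions so the two contexts are similar yet contradictory. Every entity, template, and function name here is an illustrative assumption, not the authors' actual MAGIC pipeline.

```python
import random

# Toy knowledge graph of (head, relation, tail) triples.
# Entities, relations, and templates are hypothetical stand-ins;
# MAGIC's actual KG and conflict rules are defined in the paper.
KG = [
    ("Alice", "born_in", "Lyon"),
    ("Bob", "born_in", "Marseille"),
    ("Lyon", "located_in", "France"),
    ("Marseille", "located_in", "France"),
]

TEMPLATES = {
    "born_in": "{h} was born in {t}.",
    "located_in": "{h} is located in {t}.",
}

def verbalize(path):
    """Render a list of triples as a short natural-language context."""
    return " ".join(TEMPLATES[r].format(h=h, t=t) for h, r, t in path)

def same_type_distractor(triple, kg):
    """Swap the tail for another entity seen with the same relation,
    so the edit is subtle rather than an arbitrary substitution."""
    h, r, t = triple
    candidates = [t2 for _, r2, t2 in kg if r2 == r and t2 != t]
    return (h, r, random.choice(candidates)) if candidates else triple

def make_conflict_pair(path, kg, hop):
    """Return (original, conflicting) contexts that differ in one hop."""
    corrupted = list(path)
    corrupted[hop] = same_type_distractor(path[hop], kg)
    return verbalize(path), verbalize(corrupted)

path = [("Alice", "born_in", "Lyon"), ("Lyon", "located_in", "France")]
ctx_a, ctx_b = make_conflict_pair(path, KG, hop=0)
print(ctx_a)  # Alice was born in Lyon. Lyon is located in France.
print(ctx_b)  # Alice was born in Marseille. Lyon is located in France.
```

Because only one hop differs, spotting the contradiction requires chaining facts across sentences rather than matching a single substituted entity, which mirrors the multi-hop difficulty the experiments highlight.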
Related papers
- DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs [36.47787866482107]
Retrieval Augmented Generation (RAG) is a commonly used approach for enhancing large language models. We propose a novel taxonomy of knowledge conflict types in RAG, along with the desired model behavior for each type. We then introduce CONFLICTS, a high-quality benchmark with expert annotations of conflict types in a realistic RAG setting.
arXiv Detail & Related papers (2025-06-10T06:52:57Z)
- What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models [16.41477610681199]
Large language models frequently rely on both contextual input and parametric knowledge to perform tasks. These sources can come into conflict, especially when retrieved documents contradict the model's parametric beliefs. We propose a diagnostic framework to systematically evaluate LLM behavior under context-memory conflict.
arXiv Detail & Related papers (2025-06-06T19:20:23Z)
- Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models [23.37800506729006]
We propose MMKC-Bench, a benchmark to evaluate factual knowledge conflicts in both context-memory and inter-context scenarios. MMKC-Bench includes 1,573 knowledge instances and 3,381 images across 23 broad types, collected through automated pipelines with human verification. Our findings show that while current LMMs are capable of recognizing knowledge conflicts, they tend to favor internal parametric knowledge over external evidence.
arXiv Detail & Related papers (2025-05-26T04:39:30Z)
- Conflicts in Texts: Data, Implications and Challenges [58.03478157713084]
Conflicts can reflect the complexity of situations, changes that need to be explained and dealt with, difficulties in data annotation, and mistakes in generated outputs. This survey categorizes these conflicts into three key areas: (1) natural texts on the web, where factual inconsistencies, subjective biases, and multiple perspectives introduce contradictions; (2) human-annotated data, where annotator disagreements, mistakes, and societal biases impact model training; and (3) model interactions, where hallucinations and knowledge conflicts emerge during deployment. We highlight key challenges and future directions for developing conflict-aware NLP systems that can reason over and reconcile conflicting information more effectively.
arXiv Detail & Related papers (2025-04-28T04:24:01Z)
- Retrieval-Augmented Generation with Conflicting Evidence [57.66282463340297]
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses. In practice, these systems often need to handle ambiguous user queries and potentially conflicting information from multiple sources. We propose RAMDocs (Retrieval with Ambiguity and Misinformation in Documents), a new dataset that simulates complex and realistic scenarios for conflicting evidence for a user query.
arXiv Detail & Related papers (2025-04-17T16:46:11Z)
- Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations [65.11348389219887]
We introduce Dialectic-RAG (DRAG), a modular approach that evaluates retrieved information by comparing, contrasting, and resolving conflicting perspectives. We show the impact of our framework both as an in-context learning strategy and for constructing demonstrations to instruct smaller models. (An illustrative prompt sketch follows this entry.)
arXiv Detail & Related papers (2025-04-07T06:55:15Z)
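As a purely illustrative sketch of what a compare/contrast/resolve instruction for the Dialectic-RAG entry above could look like, one might assemble a prompt along the following lines. The wording and function name are assumptions, not the DRAG paper's actual prompts.

```python
def dialectic_prompt(question: str, passages: list[str]) -> str:
    """Build a prompt asking the model to compare, contrast, and
    resolve conflicting retrieved passages before answering.
    Illustrative only; not the prompt used in the DRAG paper."""
    docs = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Question: {question}\n\n"
        f"Retrieved passages:\n{docs}\n\n"
        "Step 1: List the claims on which the passages agree.\n"
        "Step 2: List the claims on which they disagree, citing passage numbers.\n"
        "Step 3: Weigh the disagreeing claims and state which is better supported.\n"
        "Step 4: Answer the question, noting any unresolved conflict."
    )

print(dialectic_prompt(
    "Where was Alice born?",
    ["Alice was born in Lyon.", "Alice was born in Marseille."],
))
```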
- SegSub: Evaluating Robustness to Knowledge Conflicts and Hallucinations in Vision-Language Models [6.52323086990482]
Vision-language models (VLMs) demonstrate sophisticated multimodal reasoning yet are prone to hallucination when confronted with knowledge conflicts. This research introduces SegSub, a framework for applying targeted image perturbations to investigate VLM resilience against knowledge conflicts.
arXiv Detail & Related papers (2025-02-19T00:26:38Z)
- Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs [55.74117540987519]
This paper explores the problem of commonsense-level vision-knowledge conflict in Multimodal Large Language Models (MLLMs). We introduce an automated framework, augmented with human-in-the-loop quality control, to generate inputs designed to simulate and evaluate these conflicts in MLLMs. Using this framework, we have crafted a diagnostic benchmark consisting of 374 original images and 1,122 high-quality question-answer pairs.
arXiv Detail & Related papers (2024-10-10T17:31:17Z)
- ECon: On the Detection and Resolution of Evidence Conflicts [56.89209046429291]
The rise of large language models (LLMs) has significantly influenced the quality of information in decision-making systems.
This study introduces a method for generating diverse, validated evidence conflicts to simulate real-world misinformation scenarios.
arXiv Detail & Related papers (2024-10-05T07:41:17Z)
- AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge [57.66282463340297]
Knowledge conflict arises from discrepancies between information in the context of a large language model and the knowledge stored in its parameters. We propose a fine-grained, instance-level approach called AdaCAD, which dynamically infers the weight of adjustment based on the degree of conflict. We show that AdaCAD consistently outperforms other decoding baselines, with average QA accuracy gains of 14.21% (absolute) over a static contrastive baseline, and improves the factuality of summaries by 6.19 (AlignScore). (A sketch of this adaptive scheme follows this entry.)
arXiv Detail & Related papers (2024-09-11T16:35:18Z)
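As a hedged illustration of the adaptive idea in the AdaCAD entry above, the sketch below scales a contrastive adjustment between context-conditioned and context-free logits by a per-step conflict measure. Jensen-Shannon divergence is used here as one plausible such measure; the paper's exact formulation may differ, so treat this as an assumption-laden sketch of adaptive contrastive decoding, not a reference implementation.

```python
import numpy as np

def _softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence; bounded in [0, 1] with log base 2."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * (np.log2(a + eps) - np.log2(b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def adaptive_contrastive_step(logits_with_ctx, logits_no_ctx):
    """One decoding step: the more the context shifts the next-token
    distribution (i.e., the stronger the context-memory conflict),
    the harder we push toward the context-conditioned logits."""
    p = _softmax(logits_with_ctx)
    q = _softmax(logits_no_ctx)
    alpha = js_divergence(p, q)  # degree of conflict, in [0, 1]
    adjusted = (1 + alpha) * logits_with_ctx - alpha * logits_no_ctx
    return _softmax(adjusted)

# Tiny example over a 4-token vocabulary.
with_ctx = np.array([2.0, 0.5, 0.1, -1.0])  # context favors token 0
no_ctx = np.array([0.1, 2.0, 0.5, -1.0])    # memory favors token 1
print(adaptive_contrastive_step(with_ctx, no_ctx))
```

When the two distributions agree, alpha is near zero and decoding reduces to ordinary context-conditioned sampling; the adjustment only kicks in where a conflict is actually measured.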
- ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM [36.332500824079844]
Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts has rarely been studied.
We present ConflictBank, the first comprehensive benchmark developed to evaluate knowledge conflicts from three aspects.
Our investigation delves into four model families and twelve LLM instances, meticulously analyzing conflicts stemming from misinformation, temporal discrepancies, and semantic divergences.
arXiv Detail & Related papers (2024-08-22T02:33:13Z)
- Resolving Knowledge Conflicts in Large Language Models [46.903549751371415]
Large language models (LLMs) often encounter knowledge conflicts.
We ask what the desiderata are for LLMs when a knowledge conflict arises, and whether existing LLMs fulfill them.
We introduce an evaluation framework for simulating contextual knowledge conflicts.
arXiv Detail & Related papers (2023-10-02T06:57:45Z)