Related papers: Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning

URL: http://arxiv.org/abs/2506.05278v1
Date: Thu, 05 Jun 2025 17:33:02 GMT
Title: Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning
Authors: Nan Huo, Jinyang Li, Bowen Qin, Ge Qu, Xiaolong Li, Xiaodong Li, Chenhao Ma, Reynold Cheng,
Abstract summary: Micro-Act is a framework with a hierarchical action space that automatically perceives context complexity and adaptively decomposes each knowledge source into a sequence of fine-grained comparisons. It consistently achieves a significant increase in QA accuracy over state-of-the-art baselines across all 5 datasets and 3 conflict types. Micro-Act simultaneously exhibits robust performance on non-conflict questions, highlighting its practical value in real-world RAG applications.
Score: 20.29920864389664
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-Augmented Generation (RAG) systems commonly suffer from Knowledge Conflicts, where retrieved external knowledge contradicts the inherent, parametric knowledge of large language models (LLMs). This adversely affects performance on downstream tasks such as question answering (QA). Existing approaches often attempt to mitigate conflicts by directly comparing two knowledge sources in a side-by-side manner, but this can overwhelm LLMs with extraneous or lengthy contexts, ultimately hindering their ability to identify and mitigate inconsistencies. To address this issue, we propose Micro-Act, a framework with a hierarchical action space that automatically perceives context complexity and adaptively decomposes each knowledge source into a sequence of fine-grained comparisons. These comparisons are represented as actionable steps, enabling reasoning beyond the superficial context. Through extensive experiments on five benchmark datasets, Micro-Act consistently achieves a significant increase in QA accuracy over state-of-the-art baselines across all 5 datasets and 3 conflict types, especially in temporal and semantic types where all baselines fail significantly. More importantly, Micro-Act simultaneously exhibits robust performance on non-conflict questions, highlighting its practical value in real-world RAG applications.

Related papers

MAGIC: A Multi-Hop and Graph-Based Benchmark for Inter-Context Conflicts in Retrieval-Augmented Generation [4.177310099979434]
Knowledge conflict often arises in RAG systems, where retrieved documents may be inconsistent with one another or contradict the model's parametric knowledge. We propose a knowledge graph (KG)-based framework that generates varied and subtle conflicts between two similar yet distinct contexts. Experimental results on our benchmark, MAGIC, provide intriguing insights into the inner workings of LLMs regarding knowledge conflict.
arXiv Detail & Related papers (2025-07-29T07:19:49Z)
FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation [37.28571879699906]
Large language models (LLMs) augmented with retrieval systems have demonstrated significant potential in handling knowledge-intensive tasks. This paper proposes FaithfulRAG, a novel framework that resolves knowledge conflicts by explicitly modeling discrepancies between the model's parametric knowledge and retrieved context.
arXiv Detail & Related papers (2025-06-10T16:02:54Z)
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models [16.41477610681199]
Large language models frequently rely on both contextual input and parametric knowledge to perform tasks. These sources can come into conflict, especially when retrieved documents contradict the model's parametric beliefs. We propose a diagnostic framework to systematically evaluate LLM behavior under context-memory conflict.
arXiv Detail & Related papers (2025-06-06T19:20:23Z)
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models [23.37800506729006]
We propose MMKC-Bench, a benchmark to evaluate factual knowledge conflicts in both context-memory and inter-context scenarios. MMKC-Bench includes 1,573 knowledge instances and 3,381 images across 23 broad types, collected through automated pipelines with human verification. Our findings show that while current LMMs are capable of recognizing knowledge conflicts, they tend to favor internal parametric knowledge over external evidence.
arXiv Detail & Related papers (2025-05-26T04:39:30Z)
Retrieval-Augmented Generation with Conflicting Evidence [57.66282463340297]
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses. In practice, these systems often need to handle ambiguous user queries and potentially conflicting information from multiple sources. We propose RAMDocs (Retrieval with Ambiguity and Misinformation in Documents), a new dataset that simulates complex and realistic scenarios for conflicting evidence for a user query.
arXiv Detail & Related papers (2025-04-17T16:46:11Z)
Exploring LLM Reasoning Through Controlled Prompt Variations [0.9217021281095907]
We evaluate how well state-of-the-art models maintain logical consistency and correctness when confronted with four categories of prompt perturbations. Our experiments, conducted on thirteen open-source and closed-source LLMs, reveal that introducing irrelevant context within the model's context window significantly degrades performance. Certain perturbations inadvertently trigger chain-of-thought-like reasoning behaviors, even without explicit prompting.
arXiv Detail & Related papers (2025-04-02T20:18:50Z)
Contradiction Detection in RAG Systems: Evaluating LLMs as Context Validators for Improved Information Consistency [0.6827423171182154]
Retrieval Augmented Generation (RAG) systems have emerged as a powerful method for enhancing large language models (LLMs) with up-to-date information. RAG can sometimes surface documents containing contradictory information, particularly in rapidly evolving domains such as news. This study presents a novel data generation framework to simulate different types of contradictions that may occur in the retrieval stage of a RAG system.
arXiv Detail & Related papers (2025-03-31T19:41:15Z)
Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes. We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z)
mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA [78.45521005703958]
Multimodal Retrieval-Augmented Generation (mRAG) is naturally introduced to provide MLLMs with comprehensive and up-to-date knowledge. We propose a novel framework called Retrieval-Reflection-Augmented Generation (mR$^2$AG), which achieves adaptive retrieval and useful information localization. mR$^2$AG significantly outperforms state-of-the-art MLLMs on INFOSEEK and Encyclopedic-VQA.
arXiv Detail & Related papers (2024-11-22T16:15:50Z)
Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications. Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs. By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
arXiv Detail & Related papers (2024-11-09T15:12:28Z)
AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge [57.66282463340297]
Knowledge conflict arises from discrepancies between information in the context of a large language model and the knowledge stored in its parameters. We propose a fine-grained, instance-level approach called AdaCAD, which dynamically infers the weight of adjustment based on the degree of conflict. We show that AdaCAD consistently outperforms other decoding baselines with average QA accuracy gains of 14.21% (absolute) over a static contrastive baseline, and improves the factuality of summaries by 6.19 (AlignScore).
arXiv Detail & Related papers (2024-09-11T16:35:18Z)
QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory [66.01597794579568]
We introduce information bottleneck theory (IB) to model the problem. We propose a cross-attention-based approach to approximate mutual information in IB. Our method achieves a 25% increase in compression rate compared to the state-of-the-art.
arXiv Detail & Related papers (2024-08-20T02:44:45Z)
Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning [5.053086684547045]
This study introduces an in-context learning-based approach to enhance the reasoning capabilities of RALMs. Our approach increases accuracy in identifying unanswerable and conflicting scenarios without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-08-08T12:42:43Z)
DYNAMICQA: Tracing Internal Knowledge Conflicts in Language Models [42.776896363518844]
We study the effect of intra-memory conflict on an LM's ability to accept relevant context. We utilize two knowledge conflict measures and a novel dataset containing inherently conflicting data, DynamicQA. We verify that LMs exhibit a greater degree of intra-memory conflict with dynamic facts compared to facts that have a single truth value.
arXiv Detail & Related papers (2024-07-24T06:06:07Z)
LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities. If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information. To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.