Related papers: CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering

CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering

URL: http://arxiv.org/abs/2602.23952v1
Date: Fri, 27 Feb 2026 11:56:26 GMT
Title: CC-VQA: Conflict- and Correlation-Aware Method for Mitigating Knowledge Conflict in Knowledge-Based Visual Question Answering
Authors: Yuyang Hong, Jiaqi Gu, Yujin Lou, Lubin Fan, Qi Yang, Ying Wang, Kun Ding, Yue Wu, Shiming Xiang, Jieping Ye,
Abstract summary: Knowledge-based visual question answering (KB-VQA) demonstrates significant potential for handling knowledge-intensive tasks.<n>Conflicts arise between static parametric knowledge in vision language models and dynamically retrieved information.<n>We propose textbfCC-VQA as a training-free, conflict- and correlation-aware method for KB-VQA.
Score: 53.7094431951084
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge-based visual question answering (KB-VQA) demonstrates significant potential for handling knowledge-intensive tasks. However, conflicts arise between static parametric knowledge in vision language models (VLMs) and dynamically retrieved information due to the static model knowledge from pre-training. The outputs either ignore retrieved contexts or exhibit inconsistent integration with parametric knowledge, posing substantial challenges for KB-VQA. Current knowledge conflict mitigation methods primarily adapted from language-based approaches, focusing on context-level conflicts through engineered prompting strategies or context-aware decoding mechanisms. However, these methods neglect the critical role of visual information in conflicts and suffer from redundant retrieved contexts, which impair accurate conflict identification and effective mitigation. To address these limitations, we propose \textbf{CC-VQA}: a novel training-free, conflict- and correlation-aware method for KB-VQA. Our method comprises two core components: (1) Vision-Centric Contextual Conflict Reasoning, which performs visual-semantic conflict analysis across internal and external knowledge contexts; and (2) Correlation-Guided Encoding and Decoding, featuring positional encoding compression for low-correlation statements and adaptive decoding using correlation-weighted conflict scoring. Extensive evaluations on E-VQA, InfoSeek, and OK-VQA benchmarks demonstrate that CC-VQA achieves state-of-the-art performance, yielding absolute accuracy improvements of 3.3\% to 6.4\% compared to existing methods. Code is available at https://github.com/cqu-student/CC-VQA.

Related papers

Diagnosing Knowledge Conflict in Multimodal Long-Chain Reasoning [78.86309644343295]
Multimodal large language models (MLLMs) in long chain-of-thought reasoning often fail when different knowledge sources provide conflicting signals.<n>We formalize these failures under a unified notion of knowledge conflict, distinguishing input-level objective conflict from process-level effective conflict.<n>Our findings provide a mechanism-level view of multimodal reasoning under knowledge conflict and enable principled diagnosis and control of long-CoT failures.
arXiv Detail & Related papers (2026-02-16T07:10:44Z)
That's Deprecated! Understanding, Detecting, and Steering Knowledge Conflicts in Language Models for Code Generation [55.78914774437411]
Large language models (LLMs) behave when faced with discrepancies between their parametric knowledge and conflicting information contained in a prompt.<n>We propose a domain-agnostic framework for constructing and interpreting such conflicts.<n>We show that activation-level steering can achieve up to a textbf12.6% improvement in steering success over a random baseline.
arXiv Detail & Related papers (2025-10-21T22:27:56Z)
CoCoA: Confidence and Context-Aware Adaptive Decoding for Resolving Knowledge Conflicts in Large Language Models [24.693047847053023]
CoCoA (Confidence- and Context-Aware Adaptive Decoding) is a novel token-level algorithm for principled conflict resolution and enhanced faithfulness.<n>CoCoA resolves conflict by utilizing confidence-aware measures (entropy gap and contextual peakedness) and the generalized divergence between the parametric and contextual distributions.
arXiv Detail & Related papers (2025-08-25T05:06:04Z)
Conflict-Aware Soft Prompting for Retrieval-Augmented Generation [13.671410389511498]
Retrieval-augmented generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge into their input prompts.<n>RAG often fails to resolve the conflict between incorrect external context and correct parametric knowledge, known as context-memory conflict.<n>We introduce Conflict-Aware REtrieval-Augmented Generation (CARE), consisting of a context assessor and a base LLM.<n>CARE effectively mitigates context-memory conflicts, leading to an average performance gain of 5.0% on QA and fact-checking benchmarks.
arXiv Detail & Related papers (2025-08-21T05:36:29Z)
Insight Over Sight: Exploring the Vision-Knowledge Conflicts in Multimodal LLMs [55.74117540987519]
This paper explores the problem of commonsense level vision-knowledge conflict in Multimodal Large Language Models (MLLMs)<n>We introduce an automated framework, augmented with human-in-the-loop quality control, to generate inputs designed to simulate and evaluate these conflicts in MLLMs.<n>Using this framework, we have crafted a diagnostic benchmark consisting of 374 original images and 1,122 high-quality question-answer pairs.
arXiv Detail & Related papers (2024-10-10T17:31:17Z)
Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models [33.76903352835436]
Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities for capturing and reasoning over multimodal inputs. These models are prone to parametric knowledge conflicts, which arise from inconsistencies of represented knowledge between their vision and language components. We present a systematic approach to detect, interpret, and mitigate them.
arXiv Detail & Related papers (2024-10-04T17:59:28Z)
AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge [57.66282463340297]
Knowledge conflict arises from discrepancies between information in the context of a large language model and the knowledge stored in its parameters.<n>We propose a fine-grained, instance-level approach called AdaCAD, which dynamically infers the weight of adjustment based on the degree of conflict.<n>We show that ADACAD consistently outperforms other decoding baselines with average QA accuracy gains of 14.21% (absolute) over a static contrastive baseline, and improves the factuality of summaries by 6.19 (AlignScore)
arXiv Detail & Related papers (2024-09-11T16:35:18Z)
Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy Constraint [20.543282448771336]
We propose an adaptive decoding method to discern whether the knowledge conflicts occur and resolve them. Experiments show that COIECD exhibits strong performance and robustness over knowledge conflicts in realistic datasets.
arXiv Detail & Related papers (2024-02-19T07:10:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.