A Knowledge Noise Mitigation Framework for Knowledge-based Visual Question Answering
- URL: http://arxiv.org/abs/2509.09159v1
- Date: Thu, 11 Sep 2025 05:40:26 GMT
- Title: A Knowledge Noise Mitigation Framework for Knowledge-based Visual Question Answering
- Authors: Zhiyue Liu, Sihang Liu, Jinyuan Liu, Xinru Zhang,
- Abstract summary: Knowledge-based visual question answering (KB-VQA) requires a model to understand images and utilize external knowledge to provide accurate answers.<n>Existing approaches often directly augment models with retrieved information from knowledge sources.<n>We propose a training-free framework with knowledge focusing for KB-VQA, that mitigates the impact of noise by enhancing knowledge relevance and reducing redundancy.
- Score: 14.08940185497287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge-based visual question answering (KB-VQA) requires a model to understand images and utilize external knowledge to provide accurate answers. Existing approaches often directly augment models with retrieved information from knowledge sources while ignoring substantial knowledge redundancy, which introduces noise into the answering process. To address this, we propose a training-free framework with knowledge focusing for KB-VQA, that mitigates the impact of noise by enhancing knowledge relevance and reducing redundancy. First, for knowledge retrieval, our framework concludes essential parts from the image-question pairs, creating low-noise queries that enhance the retrieval of highly relevant knowledge. Considering that redundancy still persists in the retrieved knowledge, we then prompt large models to identify and extract answer-beneficial segments from knowledge. In addition, we introduce a selective knowledge integration strategy, allowing the model to incorporate knowledge only when it lacks confidence in answering the question, thereby mitigating the influence of redundant information. Our framework enables the acquisition of accurate and critical knowledge, and extensive experiments demonstrate that it outperforms state-of-the-art methods.
Related papers
- MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering [23.14938510901464]
Knowledge-based Visual Question Answering (KB-VQA) requires models to answer questions by integrating visual information with external knowledge.<n>We propose MaS-VQA, a selection-driven framework that tightly couples explicit knowledge filtering with implicit knowledge reasoning.<n> Experiments on Encyclopedic-VQA and InfoSeek demonstrate consistent performance gains across multiple MLLM backbones.
arXiv Detail & Related papers (2026-02-17T04:45:05Z) - Knowledge Condensation and Reasoning for Knowledge-based VQA [20.808840633377343]
Recent studies retrieve the knowledge passages from external knowledge bases and then use them to answer questions.
We propose two synergistic models: Knowledge Condensation model and Knowledge Reasoning model.
Our method achieves state-of-the-art performance on knowledge-based VQA datasets.
arXiv Detail & Related papers (2024-03-15T06:06:06Z) - InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration [58.61492157691623]
Methods for integrating knowledge have been developed, which augment LLMs with domain-specific knowledge graphs through external modules.<n>Our research focuses on a novel problem: efficiently integrating unknown knowledge into LLMs without unnecessary overlap of known knowledge.<n>A risk of introducing new knowledge is the potential forgetting of existing knowledge.
arXiv Detail & Related papers (2024-02-18T03:36:26Z) - Fine-grained Stateful Knowledge Exploration: Effective and Efficient Graph Retrieval with Large Language Models [19.049828741139425]
Large Language Models (LLMs) have shown impressive capabilities, yet updating their knowledge remains a significant challenge.<n>Most existing methods use a paradigm that treats the whole question as the objective, with relevant knowledge being incrementally retrieved from the knowledge graph.<n>We propose FiSKE, a novel paradigm for Fine-grained Stateful Knowledge Exploration.
arXiv Detail & Related papers (2024-01-24T13:36:50Z) - Beyond Factuality: A Comprehensive Evaluation of Large Language Models
as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks.
However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge.
We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z) - A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge from vision-language pre-training models to mine its potential in knowledge reasoning.
Our scheme is able to not only provide guidance for knowledge retrieval, but also drop these instances potentially error-prone towards question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z) - Coarse-to-Careful: Seeking Semantic-related Knowledge for Open-domain
Commonsense Question Answering [12.406729445165857]
It is prevalent to utilize external knowledge to help machine answer questions that need background commonsense.
We propose a semantic-driven knowledge-aware QA framework, which controls the knowledge injection in a coarse-to-careful fashion.
arXiv Detail & Related papers (2021-07-04T10:56:36Z) - Contextualized Knowledge-aware Attentive Neural Network: Enhancing
Answer Selection with Knowledge [77.77684299758494]
We extensively investigate approaches to enhancing the answer selection model with external knowledge from knowledge graph (KG)
First, we present a context-knowledge interaction learning framework, Knowledge-aware Neural Network (KNN), which learns the QA sentence representations by considering a tight interaction with the external knowledge from KG and the textual information.
To handle the diversity and complexity of KG information, we propose a Contextualized Knowledge-aware Attentive Neural Network (CKANN), which improves the knowledge representation learning with structure information via a customized Graph Convolutional Network (GCN) and comprehensively learns context-based and knowledge-based sentence representation via
arXiv Detail & Related papers (2021-04-12T05:52:20Z) - Incremental Knowledge Based Question Answering [52.041815783025186]
We propose a new incremental KBQA learning framework that can progressively expand learning capacity as humans do.
Specifically, it comprises a margin-distilled loss and a collaborative selection method, to overcome the catastrophic forgetting problem.
The comprehensive experiments demonstrate its effectiveness and efficiency when working with the evolving knowledge base.
arXiv Detail & Related papers (2021-01-18T09:03:38Z) - KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain
Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting when the knowledge required to answer a question is not given/annotated, neither at training nor test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.