Multi-Modal Answer Validation for Knowledge-Based VQA
- URL: http://arxiv.org/abs/2103.12248v1
- Date: Tue, 23 Mar 2021 00:49:36 GMT
- Title: Multi-Modal Answer Validation for Knowledge-Based VQA
- Authors: Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi
- Abstract summary: We propose Multi-modal Answer Validation using External knowledge (MAVEx).
The idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval.
Our experiments with OK-VQA, a challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new state-of-the-art results.
- Score: 44.80209704315099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of knowledge-based visual question answering involves answering
questions that require external knowledge in addition to the content of the
image. Such knowledge typically comes in a variety of forms, including visual,
textual, and commonsense knowledge. The use of more knowledge sources, however,
also increases the chance of retrieving more irrelevant or noisy facts, making
it difficult to comprehend the facts and find the answer. To address this
challenge, we propose Multi-modal Answer Validation using External knowledge
(MAVEx), where the idea is to validate a set of promising answer candidates
based on answer-specific knowledge retrieval. This is in contrast to existing
approaches that search for the answer in a vast collection of often irrelevant
facts. Our approach aims to learn which knowledge source should be trusted for
each answer candidate and how to validate the candidate using that source. We
consider a multi-modal setting, relying on both textual and visual knowledge
resources, including images searched using Google, sentences from Wikipedia
articles, and concepts from ConceptNet. Our experiments with OK-VQA, a
challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new
state-of-the-art results.
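To make the validation idea concrete, here is a minimal sketch of the three-stage loop the abstract describes: a base VQA model proposes answer candidates, answer-specific knowledge is retrieved from each source (web image search, Wikipedia sentences, ConceptNet concepts), and each candidate is scored per source under learned trust weights. This is an illustration under stated assumptions, not the authors' released implementation; the function names (`embed`, `retrieve_knowledge`), the cosine-similarity scoring, and the linear trust weighting are all stand-ins.

```python
"""Hedged sketch of MAVEx-style answer validation (assumptions, not the
authors' code): a base VQA model proposes candidates, answer-specific
knowledge is retrieved per source, and a per-source trust weight combines
the validation scores. Retrieval and embedding are toy stand-ins."""
from dataclasses import dataclass
import numpy as np

SOURCES = ("google_images", "wikipedia", "conceptnet")  # sources named in the abstract

@dataclass
class Candidate:
    answer: str
    prior: float  # confidence assigned by the base VQA model (assumed)

def _seed(*parts: str) -> int:
    return abs(hash(parts)) % 2**32

def embed(text: str) -> np.ndarray:
    """Stand-in encoder; a real system would use a multi-modal transformer."""
    return np.random.default_rng(_seed(text)).normal(size=128)

def retrieve_knowledge(question: str, answer: str, source: str) -> np.ndarray:
    """Stand-in for answer-specific retrieval, returning a pooled embedding
    of the facts or images found for this (question, answer) pair."""
    return np.random.default_rng(_seed(question, answer, source)).normal(size=128)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def validate(question: str, candidates: list[Candidate],
             trust: dict[str, float]) -> str:
    """Score each candidate against its own retrieved knowledge and pick
    the answer with the best trust-weighted agreement."""
    best_answer, best_score = "", -np.inf
    for cand in candidates:
        qa_vec = embed(f"{question} {cand.answer}")
        support = sum(
            trust[src] * cosine(qa_vec, retrieve_knowledge(question, cand.answer, src))
            for src in SOURCES
        )
        score = cand.prior + support
        if score > best_score:
            best_answer, best_score = cand.answer, score
    return best_answer

if __name__ == "__main__":
    cands = [Candidate("surfing", 0.42), Candidate("swimming", 0.31)]
    trust = {"google_images": 0.5, "wikipedia": 0.3, "conceptnet": 0.2}  # illustrative weights
    print(validate("What sport is the person doing?", cands, trust))
```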
Related papers
- Knowledge Condensation and Reasoning for Knowledge-based VQA [20.808840633377343]
Recent studies retrieve the knowledge passages from external knowledge bases and then use them to answer questions.
We propose two synergistic models: Knowledge Condensation model and Knowledge Reasoning model.
Our method achieves state-of-the-art performance on knowledge-based VQA datasets.
arXiv Detail & Related papers (2024-03-15T06:06:06Z)
- Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering [0.0]
Visual question answering (VQA) is a multidisciplinary research problem pursued through the practices of natural language processing and computer vision.
Our proposed method takes image attributes and question features as input to a knowledge derivation module and retrieves only question-relevant knowledge about image objects, which can provide accurate answers.
arXiv Detail & Related papers (2023-06-08T05:08:32Z)
- VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge [48.457788853408616]
We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues.
We show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases.
arXiv Detail & Related papers (2022-10-24T22:01:17Z)
- A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge captured by vision-language pre-training models and mine its potential for knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval, but also drops instances that are potentially error-prone for question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z)
- Contextualized Knowledge-aware Attentive Neural Network: Enhancing Answer Selection with Knowledge [77.77684299758494]
We extensively investigate approaches to enhancing the answer selection model with external knowledge from a knowledge graph (KG).
First, we present a context-knowledge interaction learning framework, Knowledge-aware Neural Network (KNN), which learns QA sentence representations by modeling a tight interaction between the external knowledge from the KG and the textual information.
To handle the diversity and complexity of KG information, we propose a Contextualized Knowledge-aware Attentive Neural Network (CKANN), which improves knowledge representation learning with structure information via a customized Graph Convolutional Network (GCN) and comprehensively learns context-based and knowledge-based sentence representations.
arXiv Detail & Related papers (2021-04-12T05:52:20Z)
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting in which the knowledge required to answer a question is not given or annotated at either training or test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.