Knowledge-Routed Visual Question Reasoning: Challenges for Deep
Representation Embedding
- URL: http://arxiv.org/abs/2012.07192v1
- Date: Mon, 14 Dec 2020 00:33:44 GMT
- Title: Knowledge-Routed Visual Question Reasoning: Challenges for Deep
Representation Embedding
- Authors: Qingxing Cao and Bailin Li and Xiaodan Liang and Keze Wang and Liang
Lin
- Abstract summary: We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
- Score: 140.5911760063681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though beneficial for encouraging the Visual Question Answering (VQA) models
to discover the underlying knowledge by exploiting the input-output correlation
beyond image and text contexts, existing knowledge VQA datasets are mostly
annotated in a crowdsourced manner, e.g., by collecting questions and external
reasons from different users via the internet. In addition to the challenge of
knowledge reasoning, how to deal with the annotator bias also remains unsolved,
which often leads to superficial over-fitted correlations between questions and
answers. To address this issue, we propose a novel dataset named
Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
Considering that a desirable VQA model should correctly perceive the image
context, understand the question, and incorporate its learned knowledge, our
proposed dataset aims to cut off the shortcut learning exploited by current
deep embedding models and push the research boundary of knowledge-based
visual question reasoning. Specifically, we generate question-answer pairs
based on both the Visual Genome scene graph and an external knowledge base with
controlled programs to disentangle the knowledge from other biases. The
programs can select one or two triplets from the scene graph or knowledge base
to require multi-step reasoning, avoid answer ambiguity, and balance the answer
distribution. In contrast to existing VQA datasets, we further impose the
following two major constraints on the programs to incorporate knowledge
reasoning: i) multiple knowledge triplets can be related to the question, but
only one knowledge triplet relates to the image object. This forces the VQA
model to correctly perceive the image instead of guessing the knowledge from
the question alone; ii) all questions are based on different knowledge, but
the candidate answers are the same for both the training and test sets.
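The generation procedure described above lends itself to a short illustration. Below is a minimal sketch, assuming hypothetical data layouts (scene-graph and knowledge-base triplets stored as (subject, relation, object) tuples) and hypothetical function names; it is not the authors' released pipeline, only an illustration of how a controlled program can chain one scene-graph triplet with one knowledge triplet, reject ambiguous draws, and cap per-answer frequency to keep the distribution balanced.

```python
import random
from collections import Counter

def generate_pair(scene_graph, knowledge_base, answer_counts, max_per_answer=50):
    """Return one question-answer pair, or None if the draw violates a constraint."""
    # Anchor the question in the image: pick one scene-graph triplet.
    subj, rel, obj = random.choice(scene_graph)          # e.g. ("dog", "on", "sofa")

    # Chain one external-knowledge triplet whose head is that visible object.
    facts = [t for t in knowledge_base if t[0] == obj]   # e.g. ("sofa", "used_for", "sitting")
    if not facts:
        return None
    _, kb_rel, answer = random.choice(facts)

    # Avoid answer ambiguity: the chosen object and relation must determine
    # exactly one answer in the knowledge base.
    answers = {t[2] for t in knowledge_base if t[0] == obj and t[1] == kb_rel}
    if len(answers) != 1:
        return None

    # Constraint i): several knowledge triplets may share the relation used by
    # the question, but only one may attach to an object visible in this image,
    # so the answer cannot be guessed from the question text alone.
    visible = {s for s, _, _ in scene_graph} | {o for _, _, o in scene_graph}
    if sum(1 for t in knowledge_base if t[1] == kb_rel and t[0] in visible) != 1:
        return None

    # Balance the answer distribution by capping how often each answer may appear.
    if answer_counts[answer] >= max_per_answer:
        return None
    answer_counts[answer] += 1

    question = f"What is the object the {subj} is {rel} {kb_rel.replace('_', ' ')}?"
    return {"question": question, "answer": answer,
            "program": [("scene", (subj, rel, obj)), ("kb", (obj, kb_rel, answer))]}

# Toy usage:
counts = Counter()
scene = [("dog", "on", "sofa")]
kb = [("sofa", "used_for", "sitting"), ("bed", "used_for", "sleeping")]
print(generate_pair(scene, kb, counts))
```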
Related papers
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge [39.788346536244504]
A-OKVQA is a crowdsourced dataset composed of about 25K questions.
We demonstrate the potential of this new dataset through a detailed analysis of its contents.
arXiv Detail & Related papers (2022-06-03T17:52:27Z)
- K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition [64.55573343404572]
We present a novel knowledge-aware VQG dataset called K-VQG.
This is the first large, human-annotated dataset in which questions regarding images are tied to structured knowledge.
We also develop a new VQG model that can encode and use knowledge as the target for a question.
arXiv Detail & Related papers (2022-03-15T13:38:10Z)
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting when the knowledge required to answer a question is not given/annotated, neither at training nor test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
- Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering [27.042604046441426]
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.
In this paper, we depict an image by multiple knowledge graphs from the visual, semantic and factual views.
We decompose the model into a series of memory-based reasoning steps, each performed by a Graph-based Read, Update, and Control (GRUC) module; a generic sketch of such a step appears after this list.
We achieve a new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA.
arXiv Detail & Related papers (2020-08-31T23:25:01Z)
- Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing [20.117014315684287]
We use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more types of KGs.
We then examine the skew in the distribution of questions for each KG.
These new questions can be added to existing VQA datasets to increase the diversity of questions and reduce the skew.
arXiv Detail & Related papers (2020-04-08T00:27:43Z)
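For the Cross-modal Knowledge Reasoning entry above, the following is a generic sketch of one memory-based read, update, and control step. All module names, tensor shapes, and the attention scheme are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReadUpdateControl(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.read_proj = nn.Linear(dim, dim)    # scores graph nodes against the control state
        self.update = nn.GRUCell(dim, dim)      # folds the read vector into the memory state
        self.control = nn.Linear(2 * dim, dim)  # re-derives the control state from the question

    def forward(self, node_feats, memory, control, question):
        # Read: attend over graph nodes conditioned on the current control state.
        scores = node_feats @ self.read_proj(control).unsqueeze(-1)    # (num_nodes, 1)
        read_vec = (F.softmax(scores, dim=0) * node_feats).sum(dim=0)  # (dim,)
        # Update: write the retrieved evidence into the memory state.
        memory = self.update(read_vec.unsqueeze(0), memory.unsqueeze(0)).squeeze(0)
        # Control: decide what to attend to next, conditioned on the question.
        control = torch.tanh(self.control(torch.cat([control, question], dim=-1)))
        return memory, control

# Toy usage: run a few reasoning steps over one knowledge graph
# (the paper describes visual, semantic, and factual graph views).
dim, num_nodes, steps = 128, 10, 3
step_fn = ReadUpdateControl(dim)
nodes = torch.randn(num_nodes, dim)
memory, control, question = torch.zeros(dim), torch.zeros(dim), torch.randn(dim)
for _ in range(steps):
    memory, control = step_fn(nodes, memory, control, question)
```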
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.