Zero-shot Visual Question Answering using Knowledge Graph
- URL: http://arxiv.org/abs/2107.05348v3
- Date: Wed, 14 Jul 2021 11:37:13 GMT
- Title: Zero-shot Visual Question Answering using Knowledge Graph
- Authors: Zhuo Chen, Jiaoyan Chen, Yuxia Geng, Jeff Z. Pan, Zonggang Yuan and
Huajun Chen
- Abstract summary: We propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge.
Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers.
- Score: 19.142028501513366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incorporating external knowledge into Visual Question Answering (VQA)
has become a vital practical need. Existing methods mostly adopt pipeline
approaches with separate components for knowledge matching and extraction,
feature learning, etc. However, such pipeline approaches suffer when any one
component performs poorly, which leads to error propagation and poor overall
performance. Furthermore, the majority of existing approaches ignore the
answer bias issue: in real-world applications, many answers may never have
appeared during training (i.e., unseen answers). To bridge these gaps, in this
paper, we propose a Zero-shot VQA algorithm that uses knowledge graphs and a
mask-based learning mechanism to better incorporate external knowledge, and we
present new answer-based Zero-shot VQA splits for the F-VQA dataset.
Experiments show that our method achieves state-of-the-art performance on
Zero-shot VQA with unseen answers, while also substantially improving existing
end-to-end models on the standard F-VQA task.
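The abstract describes scoring answers that were never seen during training by grounding them in a knowledge graph, with a mask-based mechanism. As a rough illustration of the zero-shot setup (not the paper's actual model), the sketch below scores answers by similarity to knowledge-graph-style answer embeddings and masks seen answers to probe the unseen split; all names, embeddings, and the answer vocabulary are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: every answer (seen or unseen) has an embedding, e.g.
# learned from a knowledge graph. The model scores an answer by the dot
# product between a fused image-question feature and that embedding, so
# answers absent from training can still be scored.
dim = 8
answers = ["cat", "dog", "umbrella", "telescope"]
seen = {"cat", "dog"}  # answers available at training time
answer_emb = {a: rng.normal(size=dim) for a in answers}

def predict(fused_feature, mask_seen=False):
    """Return the best-scoring answer; optionally mask seen answers
    to evaluate zero-shot behaviour on the unseen split."""
    scores = {}
    for a, emb in answer_emb.items():
        if mask_seen and a in seen:
            scores[a] = -np.inf  # mask-based exclusion
        else:
            scores[a] = float(fused_feature @ emb)
    return max(scores, key=scores.get)

# A fused feature close to the "telescope" embedding.
fused = answer_emb["telescope"] + 0.1 * rng.normal(size=dim)
print(predict(fused))                  # unrestricted prediction
print(predict(fused, mask_seen=True))  # restricted to unseen answers
```

The masked call can only ever return an answer outside the training vocabulary, which is exactly the condition the answer-based zero-shot splits are designed to test.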
Related papers
- Improving Zero-shot Visual Question Answering via Large Language Models
with Reasoning Question Prompts [22.669502403623166]
We present Reasoning Question Prompts for VQA tasks, which can further activate the potential of Large Language Models.
We generate self-contained questions as reasoning question prompts via an unsupervised question edition module.
Each reasoning question prompt clearly indicates the intent of the original question.
Then, the candidate answers, together with their confidence scores acting as answer integrity, are fed into LLMs.
arXiv Detail & Related papers (2023-11-15T15:40:46Z)
- Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
We investigate a question decomposition strategy for visual question answering.
We show that naive application of model-written decompositions can hurt performance.
We introduce a model-driven selective decomposition approach for second-guessing predictions and correcting errors.
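The summary above describes decomposing a question only selectively, when the model seems likely to be wrong. A minimal sketch of that control flow, with a toy lookup table standing in for the VQA model and a hard-coded decomposition standing in for a model-written one (both invented for illustration):

```python
# Toy "model": question -> (answer, confidence). A real system would query
# a VQA model; these values are fabricated for the example.
direct_answers = {
    "Is the umbrella open and red?": ("yes", 0.42),  # low confidence
    "Is the umbrella open?": ("yes", 0.95),
    "Is the umbrella red?": ("no", 0.91),
}

def decompose(question):
    # A real system would use model-written sub-questions; this one is
    # hard-coded for the conjunctive toy question.
    return ["Is the umbrella open?", "Is the umbrella red?"]

def answer(question, threshold=0.6):
    """Selective decomposition: keep the direct answer when confidence is
    high; otherwise second-guess it by answering sub-questions."""
    ans, conf = direct_answers[question]
    if conf >= threshold:
        return ans
    subs = [direct_answers[q][0] for q in decompose(question)]
    return "yes" if all(a == "yes" for a in subs) else "no"

print(answer("Is the umbrella open and red?"))  # prints "no"
```

Here the low-confidence direct answer "yes" is overturned by the sub-question answers, matching the paper's observation that decomposition helps precisely when applied selectively rather than everywhere.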
arXiv Detail & Related papers (2023-10-25T23:23:57Z)
- A Simple Baseline for Knowledge-Based Visual Question Answering [78.00758742784532]
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA).
Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline.
Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets.
arXiv Detail & Related papers (2023-10-20T15:08:17Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
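The GATHER summary mentions constructing a graph, pruning it, and ranking paths so that the top path both yields an answer and explains it. A simplified, self-contained sketch of path enumeration and ranking over a toy knowledge graph (entities, relations, and edge scores are all invented; the real system's graph construction and ranking model are far richer):

```python
# Toy knowledge graph: head -> list of (relation, tail, edge_score).
edges = {
    "umbrella": [("used_for", "rain_protection", 0.9),
                 ("has_part", "handle", 0.4)],
    "rain_protection": [("related_to", "rain", 0.8)],
    "handle": [("made_of", "plastic", 0.5)],
}

def enumerate_paths(start, max_hops=2):
    """Enumerate all paths of up to max_hops edges from the start entity,
    returning (endpoint, path, cumulative_score) triples."""
    paths, frontier = [], [(start, [], 0.0)]
    for _ in range(max_hops):
        next_frontier = []
        for node, path, score in frontier:
            for rel, tail, w in edges.get(node, []):
                step = (tail, path + [(node, rel, tail)], score + w)
                paths.append(step)
                next_frontier.append(step)
        frontier = next_frontier
    return paths

def best_answer(start):
    # Path-level ranking: the answer is the endpoint of the top-scoring
    # path, and the path itself serves as the inference explanation.
    tail, path, _ = max(enumerate_paths(start), key=lambda p: p[2])
    return tail, path

ans, path = best_answer("umbrella")
print(ans)   # prints "rain"
print(path)  # the inference path that explains the answer
```

Returning the path alongside the answer is what makes the reasoning inspectable, which is the point the abstract emphasizes.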
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge in vision-language pre-training models and mine its potential for knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval, but also drops instances that are potentially error-prone for question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z)
- Coarse-to-Fine Reasoning for Visual Question Answering [18.535633096397397]
We present a new reasoning framework to fill the gap between visual features and semantic clues in the Visual Question Answering (VQA) task.
Our method first extracts the features and predicates from the image and question.
We then propose a new reasoning framework to effectively jointly learn these features and predicates in a coarse-to-fine manner.
arXiv Detail & Related papers (2021-10-06T06:29:52Z)
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering [42.120558318437475]
Shortcut learning happens when a model exploits spurious statistical regularities to produce correct answers but does not deploy the desired behavior.
We introduce an evaluation methodology for visual question answering (VQA) to better diagnose cases of shortcut learning.
arXiv Detail & Related papers (2021-04-07T14:28:22Z)
- Learning Compositional Representation for Few-shot Visual Question Answering [93.4061107793983]
Current Visual Question Answering methods perform well on answers with ample training data but have limited accuracy on novel answers with only a few examples.
We propose to extract the attributes from the answers with enough data, which are later composed to constrain the learning of the few-shot ones.
Experimental results on the VQA v2.0 validation dataset demonstrate the effectiveness of our proposed attribute network.
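The few-shot summary above describes extracting attributes from data-rich answers and composing them to constrain rare ones. A deliberately tiny sketch of the compositional idea (the attribute inventory and vectors are fabricated; the paper's attribute network learns these representations rather than hard-coding them):

```python
import numpy as np

# Hypothetical attribute vectors, as if learned from data-rich answers
# (e.g. colour and object-type attributes).
attr = {
    "red":   np.array([1.0, 0.0, 0.0]),
    "green": np.array([0.0, 1.0, 0.0]),
    "apple": np.array([0.0, 0.0, 1.0]),
}

def compose(attributes):
    """Compose an answer representation from its attribute vectors."""
    return np.mean([attr[a] for a in attributes], axis=0)

# A rare answer ("green apple") inherits structure from attributes that
# were learned on answers with plenty of training examples.
rare = compose(["green", "apple"])
print(rare)  # prints [0.  0.5 0.5]
```

The composed vector constrains what the few-shot answer can mean, which is the mechanism the abstract credits for the improvement on rare answers.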
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.