KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain
Knowledge-Based VQA
- URL: http://arxiv.org/abs/2012.11014v1
- Date: Sun, 20 Dec 2020 20:13:02 GMT
- Title: KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain
Knowledge-Based VQA
- Authors: Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus
Rohrbach
- Abstract summary: One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting in which the knowledge required to answer a question is not given or annotated at either training or test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
- Score: 107.7091094498848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the most challenging question types in VQA is when answering the
question requires outside knowledge not present in the image. In this work we
study open-domain knowledge, the setting in which the knowledge required to
answer a question is not given or annotated at either training or test time. We
tap into two types of knowledge representations and reasoning. First, implicit
knowledge, which can be learned effectively from unsupervised language
pre-training and supervised training data with transformer-based models.
Second, explicit, symbolic knowledge encoded in knowledge bases. Our approach
combines both: it exploits the powerful implicit reasoning of transformer
models for answer prediction and integrates symbolic representations from a
knowledge graph, while never losing their explicit semantics to an implicit
embedding. We combine diverse sources of knowledge to cover the wide variety of
knowledge needed to solve knowledge-based questions. We show that our approach,
KRISP (Knowledge Reasoning with Implicit and Symbolic rePresentations),
significantly outperforms the state of the art on OK-VQA, the largest available
dataset for open-domain knowledge-based VQA. We show with extensive ablations
that while our model successfully exploits implicit knowledge reasoning, the
symbolic answer module, which explicitly connects the knowledge graph to the
answer vocabulary, is critical to the performance of our method and generalizes
to rare answers.
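To make the two-branch idea concrete, the following is a minimal sketch of late fusion between an implicit answer classifier and a symbolic answer module that scores answers through knowledge-graph node embeddings. It assumes a PyTorch-style setup; the fused question-image feature, the toy graph layer, and the answer-to-node mapping below are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch: late fusion of implicit and symbolic answer scores.
# All names and dimensions are illustrative, not the authors' code.
import torch
import torch.nn as nn


class SimpleGraphLayer(nn.Module):
    """One round of mean-aggregation message passing over a fixed adjacency matrix."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, node_feats, adj):
        # Add self-loops, average neighbour features, then transform.
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((adj @ node_feats) / deg))


class ImplicitSymbolicAnswerer(nn.Module):
    """Combines an implicit vocabulary classifier with a symbolic answer module."""

    def __init__(self, fused_dim, node_dim, vocab_size, answer_node_ids):
        super().__init__()
        # Implicit branch: score a fixed answer vocabulary from the fused
        # question+image feature produced by a transformer (not shown here).
        self.implicit_head = nn.Linear(fused_dim, vocab_size)
        # Symbolic branch: message passing over knowledge-graph node features,
        # then a dot product between each answer's node embedding and a
        # projection of the fused feature.
        self.graph_layer = SimpleGraphLayer(node_dim)
        self.project = nn.Linear(fused_dim, node_dim)
        # answer_node_ids[i] = index of the KG node for answer i, or -1 if the
        # answer has no node in the graph.
        self.register_buffer("answer_node_ids", answer_node_ids)

    def forward(self, fused, node_feats, adj):
        implicit_scores = self.implicit_head(fused)        # (B, vocab)
        nodes = self.graph_layer(node_feats, adj)          # (N, node_dim)
        query = self.project(fused)                        # (B, node_dim)
        node_scores = query @ nodes.t()                    # (B, N)
        in_graph = self.answer_node_ids >= 0
        symbolic_scores = torch.full_like(implicit_scores, float("-inf"))
        symbolic_scores[:, in_graph] = node_scores[:, self.answer_node_ids[in_graph]]
        # Each answer keeps the stronger of its implicit and symbolic scores,
        # so the symbolic branch can promote rare answers that the classifier
        # has seen infrequently.
        return torch.maximum(implicit_scores, symbolic_scores)


# Toy usage with random features and an identity adjacency matrix.
model = ImplicitSymbolicAnswerer(
    fused_dim=768, node_dim=128, vocab_size=5,
    answer_node_ids=torch.tensor([0, 2, -1, 3, -1]))
scores = model(torch.randn(4, 768), torch.randn(10, 128), torch.eye(10))
```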
Related papers
- Knowledge Condensation and Reasoning for Knowledge-based VQA [20.808840633377343]
Recent studies retrieve knowledge passages from external knowledge bases and then use them to answer questions.
We propose two synergistic models: Knowledge Condensation model and Knowledge Reasoning model.
Our method achieves state-of-the-art performance on knowledge-based VQA datasets.
arXiv Detail & Related papers (2024-03-15T06:06:06Z)
- Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering [30.858737348472626]
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question.
Recent works have resorted to using a powerful large language model (LLM) as an implicit knowledge engine to acquire the necessary knowledge for answering.
We present a conceptually simple, flexible, and general framework designed to prompt LLMs with answer heuristics for knowledge-based VQA.
arXiv Detail & Related papers (2023-03-03T13:05:15Z)
- VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge [48.457788853408616]
We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues.
We show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases.
arXiv Detail & Related papers (2022-10-24T22:01:17Z)
- A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge in vision-language pre-training models and mine its potential for knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval, but also drops instances that are potentially error-prone for question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
- Coarse-to-Careful: Seeking Semantic-related Knowledge for Open-domain Commonsense Question Answering [12.406729445165857]
It is common to utilize external knowledge to help machines answer questions that require background commonsense.
We propose a semantic-driven knowledge-aware QA framework, which controls the knowledge injection in a coarse-to-careful fashion.
arXiv Detail & Related papers (2021-07-04T10:56:36Z)
- Contextualized Knowledge-aware Attentive Neural Network: Enhancing Answer Selection with Knowledge [77.77684299758494]
We extensively investigate approaches to enhancing the answer selection model with external knowledge from a knowledge graph (KG).
First, we present a context-knowledge interaction learning framework, Knowledge-aware Neural Network (KNN), which learns the QA sentence representations by considering a tight interaction with the external knowledge from KG and the textual information.
To handle the diversity and complexity of KG information, we propose a Contextualized Knowledge-aware Attentive Neural Network (CKANN), which improves the knowledge representation learning with structure information via a customized Graph Convolutional Network (GCN) and comprehensively learns context-based and knowledge-based sentence representation via ...
arXiv Detail & Related papers (2021-04-12T05:52:20Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
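To make the "controlled programs" idea in the last entry concrete, here is a toy sketch of template-driven question-answer generation that joins scene-graph objects with knowledge-base triples, so that answering requires routing through external knowledge. The template, field names, and example data are hypothetical and are not taken from the dataset's actual generation pipeline.

```python
# Illustrative sketch of program-controlled QA generation from a scene graph
# plus an external knowledge base. Templates and example data are hypothetical.
from dataclasses import dataclass


@dataclass
class KBTriple:
    head: str
    relation: str
    tail: str


def generate_qa(scene_objects, kb_triples):
    """Yield (question, answer) pairs that need both the image and the KB."""
    kb_index = {t.head: t for t in kb_triples}
    for obj in scene_objects:
        triple = kb_index.get(obj)
        if triple is None:
            continue
        # The object is grounded in the image; the answer is routed through the
        # knowledge base, so the question cannot be answered from vision alone.
        question = f"What is the {triple.relation.replace('_', ' ')} of the {obj} in the image?"
        yield question, triple.tail


if __name__ == "__main__":
    objects = ["zebra", "bench"]                            # from a scene graph
    kb = [KBTriple("zebra", "native_habitat", "savanna")]   # from a knowledge base
    for question, answer in generate_qa(objects, kb):
        print(question, "->", answer)
```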
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.