Open-Set Knowledge-Based Visual Question Answering with Inference Paths
- URL: http://arxiv.org/abs/2310.08148v1
- Date: Thu, 12 Oct 2023 09:12:50 GMT
- Title: Open-Set Knowledge-Based Visual Question Answering with Inference Paths
- Authors: Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang
- Abstract summary: The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
- Score: 79.55742631375063
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given an image and an associated textual question, the purpose of
Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct
answer to the question with the aid of external knowledge bases. Prior KB-VQA
models are usually formulated as a retriever-classifier framework, where a
pre-trained retriever extracts textual or visual information from knowledge
graphs and then makes a prediction among the candidates. Despite promising
progress, there are two drawbacks with existing models. Firstly, modeling
question-answering as multi-class classification limits the answer space to a
preset corpus and lacks the ability of flexible reasoning. Secondly, the
classifier merely considers "what is the answer" without "how to get the
answer", and so cannot ground the answer in explicit reasoning paths. In this
paper, we confront the challenge of explainable open-set KB-VQA, where
the system is required to answer questions with entities in the wild and retain an
explainable reasoning path. To resolve the aforementioned issues, we propose a
new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for
brevity). Specifically, it contains graph constructing, pruning, and path-level
ranking, which not only retrieves accurate answers but also provides inference
paths that explain the reasoning process. To comprehensively evaluate our
model, we reformulate the benchmark dataset OK-VQA with manually corrected
entity-level annotations and release it as ConceptVQA. Extensive experiments on
real-world questions demonstrate that our framework is not only able to perform
open-set question answering across the whole knowledge base but also to provide
explicit reasoning paths.
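The three stages named in the abstract (graph constructing, pruning, and path-level ranking) can be illustrated with a minimal sketch. This is not the GATHER implementation: the toy triples, the function names, the hop limit, and the word-overlap scorer are all assumptions made for illustration, and a learned ranker would replace the scorer in practice. The sketch only shows how ranking paths rather than classifying over a fixed answer list lets any entity in the knowledge base be returned together with the path that led to it.

```python
from collections import defaultdict

# Toy knowledge base of (head, relation, tail) triples; invented for illustration.
TRIPLES = [
    ("banana", "is_a", "fruit"),
    ("banana", "has_color", "yellow"),
    ("fruit", "rich_in", "vitamins"),
    ("monkey", "eats", "banana"),
    ("monkey", "is_a", "animal"),
    ("animal", "lives_in", "jungle"),
]


def build_graph(triples):
    """Graph construction: adjacency list of (relation, tail) per head entity."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph


def prune(graph, seeds, max_hops):
    """Pruning: keep only entities reachable from the seeds within max_hops."""
    keep, frontier = set(seeds), set(seeds)
    for _ in range(max_hops):
        frontier = {t for h in frontier for _, t in graph.get(h, [])} - keep
        keep |= frontier
    return {h: [(r, t) for r, t in edges if t in keep]
            for h, edges in graph.items() if h in keep}


def enumerate_paths(graph, seeds, max_hops):
    """List every cycle-free path of 1..max_hops edges starting at a seed."""
    paths = []
    stack = [(s, [s]) for s in seeds]
    while stack:
        node, path = stack.pop()
        hops = len(path) // 2  # path alternates entity, relation, entity, ...
        if hops >= 1:
            paths.append(path)
        if hops == max_hops:
            continue
        for rel, tail in graph.get(node, []):
            if tail not in path[::2]:  # do not revisit an entity
                stack.append((tail, path + [rel, tail]))
    return paths


def score(path, question):
    """Stand-in ranker (assumption): count path tokens found in the question."""
    q_tokens = set(question.lower().replace("?", "").split())
    path_tokens = {tok for item in path for tok in item.split("_")}
    return len(path_tokens & q_tokens)


def answer(question, seeds, triples=TRIPLES, max_hops=2):
    """Return the tail of the best-scoring path and the path itself."""
    graph = prune(build_graph(triples), seeds, max_hops)
    paths = enumerate_paths(graph, seeds, max_hops)
    best = max(paths, key=lambda p: (score(p, question), -len(p)))
    return best[-1], best


if __name__ == "__main__":
    # The seed entity would come from the image (e.g. a detected object).
    ans, path = answer("What color does the food that the monkey eats have?",
                       ["monkey"])
    print(ans, path)
```

Here the seed entity stands in for whatever is detected in the image, and the returned path doubles as the explanation of how the answer was reached.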
Related papers
- Question-guided Knowledge Graph Re-scoring and Injection for Knowledge Graph Question Answering [27.414670144354453]
KGQA involves answering natural language questions by leveraging structured information stored in a knowledge graph.
We propose a Question-guided Knowledge Graph Re-scoring method (Q-KGR) to eliminate noisy pathways for the input question.
We also introduce Knowformer, a parameter-efficient method for injecting the re-scored knowledge graph into large language models to enhance their ability to perform factual reasoning.
arXiv Detail & Related papers (2024-10-02T10:27:07Z) - One Model, Any Conjunctive Query: Graph Neural Networks for Answering Complex Queries over Knowledge Graphs [7.34044245579928]
We propose AnyCQ, a graph neural network model that can classify answers to any conjunctive query on any knowledge graph.
We show that AnyCQ can generalize to large queries of arbitrary structure, reliably classifying and retrieving answers to samples where existing approaches fail.
arXiv Detail & Related papers (2024-09-21T00:30:44Z) - Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering [11.183845003492964]
We use Dense Passage Retrieval (DPR) to retrieve related knowledge to help the model answer questions.
DPR conducts retrieval in natural language space, which may not ensure comprehensive acquisition of image information.
We propose a novel framework that leverages the visual-language model to select the key knowledge retrieved by DPR and answer questions.
arXiv Detail & Related papers (2024-04-22T07:44:20Z) - Reasoning over Hierarchical Question Decomposition Tree for Explainable Question Answering [83.74210749046551]
We propose to leverage question decomposing for heterogeneous knowledge integration.
We propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT).
Experiments on complex QA datasets KQA Pro and Musique show that our framework outperforms SOTA methods significantly.
arXiv Detail & Related papers (2023-05-24T11:45:59Z) - DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases [81.19499764899359]
We propose a novel framework DecAF that jointly generates both logical forms and direct answers.
DecAF achieves new state-of-the-art accuracy on WebQSP, FreebaseQA, and GrailQA benchmarks.
arXiv Detail & Related papers (2022-09-30T19:51:52Z) - A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge [39.788346536244504]
A-OKVQA is a crowdsourced dataset composed of about 25K questions.
We demonstrate the potential of this new dataset through a detailed analysis of its contents.
arXiv Detail & Related papers (2022-06-03T17:52:27Z) - KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting when the knowledge required to answer a question is not given/annotated, neither at training nor test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z) - Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z) - SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5%, also marginally improving performance on the Reasoning questions in VQA, while also displaying better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)