Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question
Answering
- URL: http://arxiv.org/abs/2109.04014v1
- Date: Thu, 9 Sep 2021 03:21:32 GMT
- Title: Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question
Answering
- Authors: Man Luo, Yankai Zeng, Pratyay Banerjee, Chitta Baral
- Abstract summary: Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images.
One dataset that is widely used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold standard knowledge corpus for retrieval.
We propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA.
- Score: 16.96751206502189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge-based visual question answering (VQA) requires answering questions
with external knowledge in addition to the content of images. One dataset that
is widely used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold
standard knowledge corpus for retrieval. Existing work leverages different
knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge.
Because of varying knowledge bases, it is hard to fairly compare models'
performance. To address this issue, we collect a natural language knowledge
base that can be used for any VQA system. Moreover, we propose a Visual
Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever
aims to retrieve relevant knowledge, and the visual reader seeks to predict
answers based on given knowledge. We introduce various ways to retrieve
knowledge using text and images and two reader styles: classification and
extraction. Both the retriever and reader are trained with weak supervision.
Our experimental results show that a good retriever can significantly improve
the reader's performance on the OK-VQA challenge. The code and corpus are
provided in https://github.com/luomancs/retriever_reader_for_okvqa.git
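As a concrete illustration of the retriever-reader flow described above, the following is a minimal sketch. It uses simple word overlap as a stand-in for the learned retriever and for the classification-style reader, and a caption string as a stand-in for the image; the names, data, and scoring below are illustrative and do not come from the paper's implementation.

```python
# Minimal sketch of a retriever-reader pipeline for knowledge-based VQA.
# Word overlap stands in for the learned retriever and reader; the caption
# stands in for visual input. Illustrative only, not the paper's models.

def tokens(text: str) -> set[str]:
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank knowledge passages by lexical overlap with the query
    (question text plus an image caption)."""
    ranked = sorted(corpus, key=lambda p: len(tokens(query) & tokens(p)), reverse=True)
    return ranked[:k]

def classification_reader(query: str, passages: list[str], answer_vocab: list[str]) -> str:
    """Classification-style reader: choose the vocabulary answer best
    supported by the question plus the retrieved knowledge."""
    context = tokens(query).union(*(tokens(p) for p in passages))
    return max(answer_vocab, key=lambda a: len(tokens(a) & context))

question = "What nutrient is this fruit known for?"
caption = "a bunch of ripe bananas on a table"   # stand-in for the image
corpus = [
    "Bananas are a fruit rich in potassium.",
    "A fire hydrant supplies water to firefighters.",
    "Traffic lights use red, yellow, and green signals.",
]
passages = retrieve(question + " " + caption, corpus)
print(passages)                                   # the banana passage
print(classification_reader(question + " " + caption, passages,
                            ["potassium", "water", "green"]))   # "potassium"
```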
Related papers
- Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering [11.183845003492964]
We use Dense Passage Retrieval (DPR) to retrieve related knowledge to help the model answer questions.
However, DPR conducts retrieval in natural language space, which may not ensure comprehensive acquisition of image information.
We propose a novel framework that leverages the visual-language model to select the key knowledge retrieved by DPR and answer questions.
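For reference, DPR-style retrieval ranks passages by the inner product between separately encoded question and passage vectors. The sketch below assumes precomputed embeddings; the random vectors are placeholders, not outputs of a trained encoder.

```python
# DPR-style dense retrieval in a nutshell: a bi-encoder embeds the question
# and each passage separately; passages are ranked by inner product.
# The embeddings below are random placeholders, not trained representations.
import numpy as np

def dpr_rank(question_vec: np.ndarray, passage_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the indices of the top-k passages by inner-product score."""
    scores = passage_vecs @ question_vec        # one score per passage
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
q = rng.standard_normal(128)                    # placeholder question embedding
p = rng.standard_normal((1000, 128))            # placeholder passage embeddings
print(dpr_rank(q, p))                           # indices of the 3 highest-scoring passages
```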
arXiv Detail & Related papers (2024-04-22T07:44:20Z) - A Simple Baseline for Knowledge-Based Visual Question Answering [78.00758742784532]
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA).
Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline.
Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets.
arXiv Detail & Related papers (2023-10-20T15:08:17Z) - Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it comprises graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z) - VLC-BERT: Visual Question Answering with Contextualized Commonsense
Knowledge [48.457788853408616]
We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues.
We show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases.
arXiv Detail & Related papers (2022-10-24T22:01:17Z) - Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual
Question Answering [27.38981906033932]
Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge and then predicts the answer.
Retrievals are frequently too general and fail to cover specific knowledge needed to answer the question.
We propose an Entity-Focused Retrieval (EnFoRe) model that provides stronger supervision during training and recognizes question-relevant entities to help retrieve more specific knowledge.
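A rough sketch of the entity-focused idea follows, with a hypothetical entity list and a simple score boost for passages that mention those entities; this is not EnFoRe's actual model or scoring.

```python
# Hypothetical illustration of entity-focused retrieval: passages that mention
# question-relevant entities get their retrieval score boosted, so more
# specific knowledge rises to the top. Not EnFoRe's actual model or scoring.

def entity_boosted_scores(base_scores: dict[str, float],
                          passages: dict[str, str],
                          entities: list[str],
                          boost: float = 1.0) -> dict[str, float]:
    """Add a bonus to a passage's base retrieval score for each entity it mentions."""
    boosted = {}
    for pid, score in base_scores.items():
        hits = sum(e.lower() in passages[pid].lower() for e in entities)
        boosted[pid] = score + boost * hits
    return boosted

passages = {
    "p1": "The Statue of Liberty was a gift from France to the United States.",
    "p2": "Many statues are made of bronze or marble.",
}
base = {"p1": 0.4, "p2": 0.5}                      # generic retriever scores
entities = ["Statue of Liberty"]                   # hypothetical question-relevant entity
print(entity_boosted_scores(base, passages, entities))   # p1 now outranks p2
```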
arXiv Detail & Related papers (2022-10-18T21:39:24Z) - LaKo: Knowledge-driven Visual Question Answering via Late
Knowledge-to-Text Injection [30.65373229617201]
We propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-text Injection.
To effectively incorporate an external KG, we convert triples into text and propose a late injection mechanism.
In the evaluation on the OK-VQA dataset, our method achieves state-of-the-art results.
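A minimal sketch of the knowledge-to-text step: KG triples are verbalized into sentences and concatenated with the question for a text-based reader. The template below is illustrative, not LaKo's exact verbalization or injection mechanism.

```python
# Minimal sketch of knowledge-to-text conversion: KG triples are verbalized into
# sentences and appended to the question for a text-only reader. The template is
# illustrative and not LaKo's actual verbalization or injection mechanism.

def verbalize(triple: tuple[str, str, str]) -> str:
    subj, rel, obj = triple
    return f"{subj} {rel.replace('_', ' ')} {obj}."

triples = [("banana", "is_a", "fruit"), ("banana", "rich_in", "potassium")]
knowledge_text = " ".join(verbalize(t) for t in triples)
question = "What nutrient is this fruit known for?"

# "Late injection": verbalized knowledge is concatenated with the question
# before being fed to the reader model (the reader itself is not shown).
reader_input = question + " " + knowledge_text
print(reader_input)
```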
arXiv Detail & Related papers (2022-07-26T13:29:51Z) - A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge from vision-language pre-training models to mine its potential in knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval, but also drops instances that are potentially error-prone for question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z) - Multi-Modal Answer Validation for Knowledge-Based VQA [44.80209704315099]
We propose Multi-modal Answer Validation using External knowledge (MAVEx).
The idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval.
Our experiments with OK-VQA, a challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new state-of-the-art results.
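The answer-validation idea can be sketched as follows: for each candidate answer, retrieve knowledge with a query built from the question plus that answer, and keep the candidate whose retrieved evidence supports it best. The support measure below is a toy word-overlap count, not MAVEx's validation model.

```python
# Toy sketch of answer-candidate validation: each candidate answer is scored by
# how well answer-specific retrieved knowledge supports it. The support measure
# is a simple word-overlap count, not MAVEx's actual validation model.

def support(query: str, passage: str) -> int:
    return len(set(query.lower().split()) & set(passage.lower().split()))

def validate(question: str, candidates: list[str], corpus: list[str]) -> str:
    """Pick the candidate whose best answer-specific evidence scores highest."""
    best_answer, best_score = candidates[0], -1
    for answer in candidates:
        query = f"{question} {answer}"
        score = max(support(query, p) for p in corpus)   # best supporting passage
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer

corpus = [
    "the eiffel tower is in paris",
    "london is the capital of england",
]
print(validate("what city is the eiffel tower in", ["london", "paris"], corpus))  # "paris"
```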
arXiv Detail & Related papers (2021-03-23T00:49:36Z) - KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain
Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting when the knowledge required to answer a question is not given/annotated, neither at training nor test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z) - Knowledge-Routed Visual Question Reasoning: Challenges for Deep
Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate question-answer pairs based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)