Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering
- URL: http://arxiv.org/abs/2210.10176v1
- Date: Tue, 18 Oct 2022 21:39:24 GMT
- Title: Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering
- Authors: Jialin Wu and Raymond J. Mooney
- Abstract summary: Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge and then predicts the answer.
Retrievals are frequently too general and fail to cover specific knowledge needed to answer the question.
We propose an Entity-Focused Retrieval (EnFoRe) model that provides stronger supervision during training and recognizes question-relevant entities to help retrieve more specific knowledge.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a
two-stage framework that first retrieves external knowledge given the visual
question and then predicts the answer based on the retrieved content. However,
the retrieved knowledge is often inadequate. Retrievals are frequently too
general and fail to cover specific knowledge needed to answer the question.
Also, the naturally available supervision (whether the passage contains the
correct answer) is weak and does not guarantee question relevancy. To address
these issues, we propose an Entity-Focused Retrieval (EnFoRe) model that
provides stronger supervision during training and recognizes question-relevant
entities to help retrieve more specific knowledge. Experiments show that our
EnFoRe model achieves superior retrieval performance on OK-VQA, the currently
largest outside-knowledge VQA dataset. We also combine the retrieved knowledge
with state-of-the-art VQA models, and achieve a new state-of-the-art
performance on OK-VQA.
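The abstract's point about weak supervision can be made concrete with a minimal sketch. It assumes that "contains the correct answer" means a case-insensitive substring match. `weak_label` and the example passages are hypothetical illustrations, not EnFoRe's code. Both passages receive a positive label, although only the first is relevant to the question.

```python
def weak_label(passage: str, answers: list[str]) -> bool:
    """Weak relevance label: True if the passage contains any gold answer.

    Hypothetical helper; lower-cased substring matching is an assumption,
    and real pipelines may normalize text differently.
    """
    text = passage.lower()
    return any(answer.lower() in text for answer in answers)


# Toy question: "What sport is played with this racket?" with gold answer "tennis".
answers = ["tennis"]

# On-topic passage: contains the answer and addresses the question.
relevant = "Tennis is a racket sport played on a rectangular court."
# Off-topic passage: contains the answer string only by coincidence.
off_topic = "The Tennis Court Oath was sworn during the French Revolution."

print(weak_label(relevant, answers))   # True
print(weak_label(off_topic, answers))  # True, even though it is irrelevant
```

A stronger training signal, as the abstract proposes, would need to tell these two cases apart. Answer containment alone cannot.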
Related papers
- Multimodal Reranking for Knowledge-Intensive Visual Question Answering
We introduce a multi-modal reranker to improve the ranking quality of knowledge candidates for answer generation.
Experiments on OK-VQA and A-OKVQA show that the multi-modal reranker trained with distant supervision provides consistent improvements.
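Here is a generic retrieve-then-rerank pipeline, sketched under our own assumptions. The function name and the dictionary-based scorers are hypothetical stand-ins for a first-stage retriever and a reranker. In the paper's setting the reranker is multi-modal, so it would score the image and question together with each candidate. Here both scorers are plain lookups to keep the example self-contained.

```python
from typing import Callable, Sequence


def retrieve_then_rerank(
    candidates: Sequence[str],
    first_stage: Callable[[str], float],
    reranker: Callable[[str], float],
    k: int,
) -> list[str]:
    """Two-stage ranking: shortlist with a cheap scorer, then reorder the shortlist."""
    # Stage 1: keep only the k candidates with the highest first-stage score.
    shortlist = sorted(candidates, key=first_stage, reverse=True)[:k]
    # Stage 2: reorder the shortlist with the (typically more expensive) reranker.
    return sorted(shortlist, key=reranker, reverse=True)


# Hypothetical scores for four knowledge candidates.
retriever_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.7, "p4": 0.1}
reranker_scores = {"p1": 0.2, "p2": 0.6, "p3": 0.9, "p4": 1.0}

ranked = retrieve_then_rerank(
    list(retriever_scores),
    first_stage=lambda pid: retriever_scores[pid],
    reranker=lambda pid: reranker_scores[pid],
    k=3,
)
print(ranked)  # ['p3', 'p2', 'p1']
```

`p4` never reaches the reranker. A reranker can only reorder what the first stage returns, so it improves ranking quality within the top k but cannot recover candidates the retriever missed.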
Date: 2024-07-17T02:58:52Z
- Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
We use Dense Passage Retrieval (DPR) to retrieve related knowledge to help the model answer questions.
However, DPR conducts retrieval in natural language space, which may not capture all of the information in the image.
We propose a novel framework that leverages the visual-language model to select the key knowledge retrieved by DPR and answer questions.
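Dense Passage Retrieval scores a passage by the inner product between a question embedding and a passage embedding, then returns the top-scoring passages. The sketch below assumes the embeddings were already produced by the two encoders. The toy 3-dimensional vectors and the name `dpr_top_k` are hypothetical.

```python
import heapq
from typing import Mapping, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dpr_top_k(
    query_vec: Vector, passage_vecs: Mapping[str, Vector], k: int
) -> list[tuple[str, float]]:
    """Return the k (passage_id, score) pairs with the highest inner-product score."""
    scored = ((pid, dot(query_vec, vec)) for pid, vec in passage_vecs.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy embeddings standing in for encoder outputs.
query = [1.0, 0.0, 0.5]
passages = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.5, 0.0, 1.0],
}
print(dpr_top_k(query, passages, k=2))  # [('p3', 1.0), ('p1', 0.9)]
```

The whole question is reduced to one vector. When the question encoder only sees text, image content reaches the retriever only insofar as it is expressed in words, which is the limitation noted above.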
Date: 2024-04-22T07:44:20Z
- Knowledge Condensation and Reasoning for Knowledge-based VQA
Recent studies retrieve knowledge passages from external knowledge bases and then use them to answer questions.
We propose two synergistic models: a Knowledge Condensation model and a Knowledge Reasoning model.
Our method achieves state-of-the-art performance on knowledge-based VQA datasets.
Date: 2024-03-15T06:06:06Z
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it consists of graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
Date: 2023-10-12T09:12:50Z
- Retrieval Augmented Visual Question Answering with Outside Knowledge
Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQA task that requires retrieval of external knowledge to answer questions about images.
Recent OK-VQA systems use Dense Passage Retrieval (DPR) to retrieve documents from external knowledge bases, such as Wikipedia, but with DPR trained separately from answer generation.
We propose a joint training scheme which includes differentiable DPR integrated with answer generation so that the system can be trained in an end-to-end fashion.
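One common way to make retrieval differentiable and train it together with the answer generator is RAG-style marginalization. The answer likelihood is averaged over the top-K retrieved passages, weighted by a softmax over their retrieval scores, so the answer loss also produces gradients for the retrieval scores. The sketch below computes that loss for a single question with toy numbers. It is an illustration under stated assumptions, not necessarily the exact objective used in the paper summarized above. `marginal_nll` is a hypothetical name.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax over retrieval scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_nll(
    retrieval_scores: Sequence[float], answer_probs: Sequence[float]
) -> float:
    """-log sum_k p(z_k | q) * p(a | q, z_k) over the top-K retrieved passages.

    retrieval_scores[k]: retriever score of passage z_k for question q.
    answer_probs[k]: generator probability of the gold answer a given q and z_k.
    """
    weights = softmax(retrieval_scores)
    p_answer = sum(w * p for w, p in zip(weights, answer_probs))
    return -math.log(p_answer)


# Passage 0 supports the answer (p = 0.8); passage 1 does not (p = 0.2).
answer_probs = [0.8, 0.2]
print(round(marginal_nll([1.0, 1.0], answer_probs), 4))  # 0.6931 (equal weights)
print(round(marginal_nll([3.0, 0.0], answer_probs), 4))  # lower: retriever favors passage 0
```

Raising the score of the passage that supports the answer lowers the loss. In such a joint scheme, this is how the answer-generation signal flows back into the retriever.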
Date: 2022-10-07T20:35:58Z
- A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA
This paper presents a unified end-to-end retriever-reader framework for knowledge-based VQA.
We shed light on the multi-modal implicit knowledge in vision-language pre-trained models and explore its potential for knowledge reasoning.
Our scheme can not only guide knowledge retrieval but also drop instances that are likely to cause errors in question answering.
Date: 2022-06-30T02:35:04Z
- REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
This paper revisits visual representation in knowledge-based visual question answering (VQA).
We propose a new knowledge-based VQA method, REVIVE, which exploits the explicit information of object regions.
We achieve new state-of-the-art performance, i.e., 58.0% accuracy, surpassing the previous state-of-the-art method by a large margin.
Date: 2022-06-02T17:59:56Z
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
One of the most challenging types of VQA question is one whose answer requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting in which the knowledge required to answer a question is not given or annotated at either training or test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively with transformer-based models from unsupervised language pre-training and supervised training data.
Date: 2020-12-20T20:13:02Z
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate question-answer pairs based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
Date: 2020-12-14T00:33:44Z