Retrieval Augmented Visual Question Answering with Outside Knowledge
- URL: http://arxiv.org/abs/2210.03809v1
- Date: Fri, 7 Oct 2022 20:35:58 GMT
- Title: Retrieval Augmented Visual Question Answering with Outside Knowledge
- Authors: Weizhe Lin, Bill Byrne
- Abstract summary: Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQA task that requires retrieval of external knowledge to answer questions about images.
Recent OK-VQA systems use Dense Passage Retrieval (DPR) to retrieve documents from external knowledge bases, such as Wikipedia, but with DPR trained separately from answer generation.
We propose a joint training scheme which includes differentiable DPR integrated with answer generation so that the system can be trained in an end-to-end fashion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Outside-Knowledge Visual Question Answering (OK-VQA) is a challenging VQA
task that requires retrieval of external knowledge to answer questions about
images. Recent OK-VQA systems use Dense Passage Retrieval (DPR) to retrieve
documents from external knowledge bases, such as Wikipedia, but with DPR
trained separately from answer generation, introducing a potential limit on the
overall system performance. Instead, we propose a joint training scheme which
includes differentiable DPR integrated with answer generation so that the
system can be trained in an end-to-end fashion. Our experiments show that our
scheme outperforms recent OK-VQA systems with strong DPR for retrieval. We also
introduce new diagnostic metrics to analyze how retrieval and generation
interact. The strong retrieval ability of our model significantly reduces the
number of retrieved documents needed in training, yielding clear gains in
answer quality and in the computation required for training.
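The end-to-end scheme is most easily pictured as a RAG-style marginalization: the retriever's document distribution weights the generator's answer likelihood, so the loss is differentiable with respect to both components. The following PyTorch sketch is our illustration of that idea, not the authors' code; all names and shapes are assumptions.

```python
# Minimal sketch of joint retriever + generator training via RAG-style
# marginalization (an assumption about the paper's objective, not its code).
import torch
import torch.nn.functional as F

def joint_loss(query_emb, doc_embs, gen_logprobs):
    """Negative log-likelihood of the answer, marginalized over documents.

    query_emb:    (d,)   query embedding from the DPR question encoder
    doc_embs:     (K, d) embeddings of the K retrieved documents
    gen_logprobs: (K,)   log p(answer | question, doc_k) from the generator
    """
    retrieval_scores = doc_embs @ query_emb             # (K,) dot-product relevance
    log_p_doc = F.log_softmax(retrieval_scores, dim=0)  # differentiable p(doc_k | q)
    # log p(answer | q) = logsumexp_k [log p(doc_k | q) + log p(answer | q, doc_k)]
    log_p_answer = torch.logsumexp(log_p_doc + gen_logprobs, dim=0)
    return -log_p_answer  # gradients reach both retriever and generator
```

Because p(doc_k | q) is a softmax over the candidate scores, lowering the generation loss can also sharpen retrieval, which is consistent with the reduced number of retrieved documents the abstract reports.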
Related papers
- Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation [72.70046559930555]
We propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks.
Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes.
In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration.
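As a reading aid, here is a hypothetical sketch of the note-gathering loop described above; retrieve(), update_note(), note_improved(), and answer_with() are stand-ins for components the paper defines, not its actual API.

```python
# Hypothetical sketch of an adaptive, note-based retrieve/stop loop.
def adaptive_note_rag(question, retrieve, update_note, note_improved,
                      answer_with, max_steps=5):
    note = ""  # accumulated knowledge, grown iteratively
    for _ in range(max_steps):
        docs = retrieve(question, context=note)   # decide what to retrieve next
        candidate = update_note(note, docs)       # fold new information into the note
        if not note_improved(note, candidate):    # decide when to stop exploring
            break
        note = candidate
    return answer_with(question, note)
```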
arXiv Detail & Related papers (2024-10-11T14:03:29Z)
- Multimodal Reranking for Knowledge-Intensive Visual Question Answering [77.24401833951096]
We introduce a multi-modal reranker to improve the ranking quality of knowledge candidates for answer generation.
Experiments on OK-VQA and A-OKVQA show that the multi-modal reranker, trained with distant supervision, provides consistent improvements.
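The reranking step itself is generic; a minimal sketch might look like the following, where score_fn stands in for the paper's multi-modal reranker and everything else is assumed:

```python
# Generic reranking sketch: rescore retrieved knowledge candidates with a
# stronger (here, multi-modal) scorer and keep the best ones for generation.
def rerank(question, image_feats, candidates, score_fn, top_n=5):
    scored = [(score_fn(question, image_feats, cand), cand) for cand in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [cand for _, cand in scored[:top_n]]
```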
arXiv Detail & Related papers (2024-07-17T02:58:52Z)
- Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering [16.52970318866536]
This paper studies a category of visual question answering tasks, in which accessing external knowledge is necessary for answering the questions.
A major step in developing OK-VQA systems is to retrieve relevant documents for the given multi-modal query.
We propose an automatic data generation pipeline for pre-training passage retrieval models for OK-VQA tasks.
arXiv Detail & Related papers (2023-06-28T18:06:40Z)
- PIE-QG: Paraphrased Information Extraction for Unsupervised Question Generation from Small Corpora [4.721845865189576]
PIE-QG uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages.
Triples in the form <subject, predicate, object> are extracted from each passage, and questions are formed from the subject (or object) and the predicate, with the object (or subject) serving as the answer.
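A toy illustration of the triple-to-question idea (the actual pipeline works over paraphrased passages with far richer question templates):

```python
# Toy PIE-QG-style question generation from an OpenIE triple
# <subject, predicate, object>. Templates here are deliberately naive.
def questions_from_triple(subject, predicate, obj):
    return [
        (f"{subject} {predicate} what?", obj),   # object is the answer
        (f"What {predicate} {obj}?", subject),   # subject is the answer
    ]

# questions_from_triple("Marie Curie", "discovered", "polonium")
# -> [("Marie Curie discovered what?", "polonium"),
#     ("What discovered polonium?", "Marie Curie")]
```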
arXiv Detail & Related papers (2023-01-03T12:20:51Z)
- Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer [80.50327229467993]
We show that a single model trained end-to-end can achieve both competitive retrieval and QA performance.
End-to-end adaptation also significantly boosts performance on out-of-domain datasets in both supervised and unsupervised settings.
arXiv Detail & Related papers (2022-12-05T04:51:21Z)
- Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering [27.38981906033932]
Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge and then predicts the answer.
Retrievals are frequently too general and fail to cover specific knowledge needed to answer the question.
We propose an Entity-Focused Retrieval (EnFoRe) model that provides stronger supervision during training and recognizes question-relevant entities to help retrieve more specific knowledge.
arXiv Detail & Related papers (2022-10-18T21:39:24Z)
- Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
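One way to read this objective: the reconstruction probabilities act as soft labels for the retriever. A hedged PyTorch sketch under that reading, with all names and shapes assumed:

```python
# Sketch of an ART-style training signal: match the retriever's document
# distribution to reconstruction-based soft labels (our illustration only).
import torch.nn.functional as F

def art_loss(retriever_scores, recon_logprobs):
    """KL divergence between retrieval distribution and soft labels.

    retriever_scores: (K,) dot products from the dense retriever
    recon_logprobs:   (K,) log p(question | doc_k) from a frozen language model
    """
    target = F.softmax(recon_logprobs, dim=0).detach()  # soft labels, no gradient
    log_pred = F.log_softmax(retriever_scores, dim=0)
    return F.kl_div(log_pred, target, reduction="sum")  # updates the retriever only
```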
arXiv Detail & Related papers (2022-06-21T18:16:31Z)
- Achieving Human Parity on Visual Question Answering [67.22500027651509]
The Visual Question Answering (VQA) task uses both image and language analysis to answer a textual question about an image.
This paper describes our recent research on AliceMind-MMU, which obtains results similar to, or even slightly better than, human performance on VQA.
This is achieved by systematically improving the VQA pipeline, including: (1) pre-training with comprehensive visual and textual feature representations; (2) effective cross-modal interaction with learning to attend; and (3) a novel knowledge-mining framework with specialized expert modules for the complex VQA task.
arXiv Detail & Related papers (2021-11-17T04:25:11Z)
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453]
The HeadQA dataset contains multiple-choice questions from the public healthcare specialization exam; these questions are among the most challenging for current QA systems.
We present MurKe, a Multi-step reasoning with Knowledge extraction framework that strives to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.