Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
- URL: http://arxiv.org/abs/2202.04306v1
- Date: Wed, 9 Feb 2022 06:47:40 GMT
- Title: Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
- Authors: Jiawen Zhang, Abhijit Mishra, Avinesh P.V.S, Siddharth Patwardhan and Sachin Agarwal
- Abstract summary: We observe that many visual questions, which contain deictic referential phrases referring to entities in the image, can be rewritten as "non-grounded" questions.
This allows for the reuse of existing text-based Open Domain Question Answering (QA) Systems for visual question answering.
We propose a potentially data-efficient approach that reuses existing systems for (a) image analysis, (b) question rewriting, and (c) text-based question answering to answer such visual questions.
- Score: 7.442099405543527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of Outside Knowledge Visual Question Answering (OKVQA) requires an
automatic system to answer natural language questions about pictures and images
using external knowledge. We observe that many visual questions, which contain
deictic referential phrases referring to entities in the image, can be
rewritten as "non-grounded" questions and can be answered by existing
text-based question answering systems. This allows for the reuse of existing
text-based Open Domain Question Answering (QA) Systems for visual question
answering. In this work, we propose a potentially data-efficient approach that
reuses existing systems for (a) image analysis, (b) question rewriting, and (c)
text-based question answering to answer such visual questions. Given an image
and a question pertaining to that image (a visual question), we first extract
the entities present in the image using pre-trained object and scene
classifiers. Using these detected entities, the visual questions can be
rewritten so as to be answerable by open domain QA systems. We explore two
rewriting strategies: (1) an unsupervised method using BERT for masking and
rewriting, and (2) a weakly supervised approach that combines adaptive
rewriting and reinforcement learning techniques to use the implicit feedback
from the QA system. We test our strategies on the publicly available OKVQA
dataset and obtain performance competitive with state-of-the-art models while
using only 10% of the training data.
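To make the three-stage recipe concrete, here is a minimal sketch of the rewrite-then-answer idea. The detected entity, the regex-based deictic-phrase replacement, and the specific Hugging Face models are illustrative stand-ins, not the paper's actual components (the paper uses BERT-based masking and a weakly supervised rewriter):

```python
# A minimal sketch of the rewrite-then-answer pipeline described in the
# abstract. The regex below is a naive stand-in for deictic-phrase
# detection, and both models are illustrative choices, not the paper's.
import re
from transformers import pipeline

# Hypothetical output of pre-trained object/scene classifiers (stage a).
detected_entities = ["zebra"]

def rewrite_visual_question(question: str, entity: str) -> str:
    """Replace a deictic referential phrase with a detected entity (stage b)."""
    return re.sub(r"\b(this|that|these|those)\s+\w+", f"the {entity}",
                  question, count=1)

question = "What is the natural habitat of this animal?"
rewritten = rewrite_visual_question(question, detected_entities[0])
print(rewritten)  # -> "What is the natural habitat of the zebra?"

# Any existing text-based QA system can now answer the rewritten,
# non-grounded question (stage c); an extractive reader over a
# retrieved passage stands in for an open-domain system here.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")
context = ("Zebras are native to the grasslands and savannas of "
           "eastern and southern Africa.")
print(qa(question=rewritten, context=context)["answer"])
```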
Related papers
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- CommVQA: Situating Visual Question Answering in Communicative Contexts [16.180130883242672]
We introduce CommVQA, a dataset consisting of images, image descriptions, and real-world communicative scenarios in which the image might appear.
We show that access to contextual information is essential for solving CommVQA and yields the highest-performing VQA model.
arXiv Detail & Related papers (2024-02-22T22:31:39Z)
- Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts [54.072432123447854]
Visual question answering (VQA) is the task of answering questions about an image.
Answering the question requires commonsense knowledge, world knowledge, and reasoning about ideas and concepts not present in the image.
We propose a framework that uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc., to answer questions more accurately.
arXiv Detail & Related papers (2023-10-31T03:54:11Z)
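As a toy illustration of the knowledge-enriched-prompt idea above, the sketch below prepends a caption and rationale to the question before handing it to a text-to-text model; the caption, rationale, and model choice are hard-coded stand-ins for the guidance the LG framework generates automatically:

```python
# A toy illustration of knowledge-enriched prompting: caption,
# rationale, and model are hard-coded stand-ins for the guidance the
# LG framework generates automatically.
from transformers import pipeline

caption = "A zebra grazing on a grassy plain at dusk."
rationale = "Zebras feed mostly on grasses."
question = "What is the animal in the picture most likely eating?"

# Prepend the language guidance to the question before answering.
prompt = (f"Caption: {caption}\nRationale: {rationale}\n"
          f"Question: {question}\nAnswer:")

generator = pipeline("text2text-generation", model="google/flan-t5-small")
print(generator(prompt, max_new_tokens=10)[0]["generated_text"])
```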
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it comprises graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
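A toy sketch of the path-level ranking idea above, using networkx; the tiny graph and the length-based scoring function are invented placeholders, not GATHER's construction, pruning, or learned ranking stages:

```python
# A toy sketch of path-level ranking over a small knowledge graph; the
# graph and the length-based score are placeholders for a learned ranker.
import networkx as nx

kg = nx.Graph()
kg.add_edges_from([
    ("zebra", "savanna", {"rel": "lives_in"}),
    ("savanna", "Africa", {"rel": "located_in"}),
    ("zebra", "mammal", {"rel": "is_a"}),
])

def score_path(path):
    # Placeholder: prefer shorter paths; a real ranker would score the
    # relation sequence with a learned model.
    return -len(path)

question_entity, candidate_answers = "zebra", ["Africa", "mammal"]
scored = [(score_path(p), p)
          for ans in candidate_answers
          for p in nx.all_simple_paths(kg, question_entity, ans, cutoff=3)]
best_score, best_path = max(scored)
print(best_path)  # the inference path that explains the chosen answer
```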
- Text-Aware Dual Routing Network for Visual Question Answering [11.015339851906287]
Existing approaches often fail in cases that require reading and understanding text in images to answer questions.
We propose a Text-Aware Dual Routing Network (TDR) which simultaneously handles VQA cases that do and do not require understanding text in the input images.
In the branch that involves text understanding, we incorporate Optical Character Recognition (OCR) features into the model to help it read the text in the images.
arXiv Detail & Related papers (2022-11-17T02:02:11Z)
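A minimal sketch of one way to realize dual routing, assuming pooled question, visual, and OCR features are already available; the gating design, dimensions, and answer vocabulary are illustrative, not the TDR architecture:

```python
# A minimal sketch of soft dual routing between a text (OCR) branch and
# a visual branch; all design choices here are illustrative assumptions.
import torch
import torch.nn as nn

class DualRoutingFusion(nn.Module):
    def __init__(self, dim: int = 256, num_answers: int = 1000):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.answer_head = nn.Linear(dim, num_answers)

    def forward(self, question, visual, ocr):
        # All inputs: (batch, dim) pooled features.
        g = self.gate(question)                # question decides the route
        fused = g * ocr + (1.0 - g) * visual   # soft routing between branches
        return self.answer_head(fused + question)

model = DualRoutingFusion()
q, v, o = (torch.randn(2, 256) for _ in range(3))
print(model(q, v, o).shape)  # torch.Size([2, 1000])
```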
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence-based regularization leads to better question understanding for retrieval and answer reading.
Second, the added post-ranker module pushes more relevant passages to the top placements for the reader to select, under a two-aspect constraint.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the gold-passage settings of training and inference, and encourages the reader to find the true answer without gold-passage assistance.
arXiv Detail & Related papers (2022-04-01T07:54:27Z)
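The KL-divergence regularization mentioned first can be illustrated with a short sketch; which two distributions the paper actually compares is not specified here, so this simply pulls one retrieval score distribution toward a reference one:

```python
# A short sketch of KL-divergence regularization over retrieval score
# distributions; the exact pair of distributions MICQA compares is an
# assumption here, not taken from the paper.
import torch
import torch.nn.functional as F

def kl_regularizer(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student), averaged over the batch."""
    t = F.softmax(teacher_logits / temperature, dim=-1)
    s = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean")

student = torch.randn(4, 50)  # scores over 50 candidate passages
teacher = torch.randn(4, 50)  # reference scores to regularize toward
print(kl_regularizer(student, teacher).item())
```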
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate question-answer pairs from both the Visual Genome scene graph and an external knowledge base using controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
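A toy sketch of program- or template-controlled QA generation that combines a scene-graph triple with an external knowledge triple, loosely in the spirit of the summary above; the data formats and the single template are invented for illustration:

```python
# A toy sketch of controlled, program-like QA generation; the triple
# formats and the one template below are invented assumptions.
scene_graph = [("zebra", "on", "grass")]
knowledge_base = {"zebra": ("native_to", "Africa")}

def generate_qa(scene, kb):
    # Route through the KB only for objects grounded in the scene.
    for subject, _, _ in scene:
        if subject in kb:
            relation, obj = kb[subject]
            question = f"Where is the {subject} in the image {relation.replace('_', ' ')}?"
            yield question, obj

for q, a in generate_qa(scene_graph, knowledge_base):
    print(q, "->", a)  # "Where is the zebra in the image native to? -> Africa"
```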
- CapWAP: Captioning with a Purpose [56.99405135645775]
We propose a new task, Captioning with a Purpose (CapWAP).
Our goal is to develop systems that can be tailored to be useful for the information needs of an intended population.
We show that it is possible to use reinforcement learning to directly optimize for the intended information need.
arXiv Detail & Related papers (2020-11-09T09:23:55Z)
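CapWAP's "directly optimize for the intended information need" step can be illustrated with a compact REINFORCE sketch, where a downstream QA system's success would act as the caption generator's reward; both the "captioner" (a raw logits table) and the reward below are stubs, not the paper's models:

```python
# A compact REINFORCE sketch: a QA system's success is the caption
# generator's reward. Captioner and reward are stubs for illustration.
import torch

def qa_reward(caption_token_ids):
    # Stub: reward 1.0 if the caption contains the token a user's
    # question needs (token id 7 stands in for that information).
    return torch.tensor(1.0 if 7 in caption_token_ids.tolist() else 0.0)

vocab_size, caption_len = 20, 5
logits = torch.randn(caption_len, vocab_size, requires_grad=True)
dist = torch.distributions.Categorical(logits=logits)

sample = dist.sample()                           # sampled caption tokens
reward = qa_reward(sample)                       # implicit QA feedback
loss = -(dist.log_prob(sample).sum() * reward)   # REINFORCE objective
loss.backward()                                  # pushes up rewarded samples
print(reward.item(), loss.item())
```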
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.