A Dataset and Baselines for Visual Question Answering on Art
- URL: http://arxiv.org/abs/2008.12520v1
- Date: Fri, 28 Aug 2020 07:33:30 GMT
- Title: A Dataset and Baselines for Visual Question Answering on Art
- Authors: Noa Garcia, Chentao Ye, Zihua Liu, Qingtao Hu, Mayu Otani, Chenhui
Chu, Yuta Nakashima, Teruko Mitamura
- Abstract summary: We introduce our first attempt towards building a new dataset, coined AQUA (Art QUestion Answering).
The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods.
Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions.
- Score: 33.14114180168856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Answering questions related to art pieces (paintings) is a difficult task, as
it requires understanding not only the visual information shown in the
picture, but also the contextual knowledge acquired through the study of
art history. In this work, we introduce our first attempt
towards building a new dataset, coined AQUA (Art QUestion Answering). The
question-answer (QA) pairs are automatically generated using state-of-the-art
question generation methods based on paintings and comments provided in an
existing art understanding dataset. The QA pairs are cleansed by crowdsourcing
workers with respect to their grammatical correctness, answerability, and
answers' correctness. Our dataset inherently consists of visual
(painting-based) and knowledge (comment-based) questions. We also present a
two-branch model as a baseline, where the visual and knowledge questions are
handled independently. We extensively compare our baseline model against the
state-of-the-art models for question answering, and we provide a comprehensive
study about the challenges and potential future directions for visual question
answering on art.
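The baseline described above routes each question to one of two independent branches. Below is a minimal sketch of that routing, assuming a binary question-type classifier and two placeholder branch models; all class names and interfaces are hypothetical illustrations, not the paper's actual implementation.

```python
class TwoBranchBaseline:
    """Toy two-branch art-QA baseline: a question-type classifier decides
    whether a question is visual (answered from the painting) or
    knowledge-based (answered from the associated comment)."""

    def __init__(self, question_classifier, visual_branch, knowledge_branch):
        # question_classifier: maps a question string to "visual" or "knowledge"
        # visual_branch: any VQA-style model exposing answer_from_image(question, image)
        # knowledge_branch: any text-QA model exposing answer_from_text(question, comment)
        self.question_classifier = question_classifier
        self.visual_branch = visual_branch
        self.knowledge_branch = knowledge_branch

    def answer(self, question, painting_image, comment_text):
        # Each question is handled by exactly one branch; the branches never
        # interact, mirroring the "handled independently" design.
        if self.question_classifier(question) == "visual":
            return self.visual_branch.answer_from_image(question, painting_image)
        return self.knowledge_branch.answer_from_text(question, comment_text)


# Toy usage with trivial stand-ins for the three components:
class EchoVisual:
    def answer_from_image(self, question, image):
        return "a visual answer"

class EchoKnowledge:
    def answer_from_text(self, question, comment):
        return comment.split(".")[0]

baseline = TwoBranchBaseline(
    question_classifier=lambda q: "visual" if "wearing" in q else "knowledge",
    visual_branch=EchoVisual(),
    knowledge_branch=EchoKnowledge(),
)
print(baseline.answer("Who painted this?", None, "Vermeer painted it around 1665."))
# -> "Vermeer painted it around 1665"
```

Keeping the two branches fully decoupled means either one can be swapped for a stronger off-the-shelf VQA or reading-comprehension model without retraining the other.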
Related papers
- A Comprehensive Survey on Visual Question Answering Datasets and Algorithms [1.941892373913038]
We meticulously analyze the current state of VQA datasets and models, while cleanly dividing them into distinct categories and then summarizing the methodologies and characteristics of each category.
We explore six main paradigms of VQA models: fusion, attention, the technique of using information from one modality to filter information from another, external knowledge base, composition or reasoning, and graph models.
arXiv Detail & Related papers (2024-11-17T18:52:06Z)
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? [50.29862466940209]
We introduce InfoSeek, a visual question answering dataset tailored for information-seeking questions.
We analyze various pre-trained visual question answering models and gain insights into their characteristics.
We show that accurate visual entity recognition can be used to improve performance on InfoSeek by retrieving relevant documents.
arXiv Detail & Related papers (2023-02-23T00:33:54Z)
- A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge [39.788346536244504]
A-OKVQA is a crowdsourced dataset composed of about 25K questions.
We demonstrate the potential of this new dataset through a detailed analysis of its contents.
arXiv Detail & Related papers (2022-06-03T17:52:27Z)
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work, we study open-domain knowledge: the setting in which the knowledge required to answer a question is not given or annotated at either training or test time.
We tap into two types of knowledge representations and reasoning. The first is implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate question-answer pairs based on both the Visual Genome scene graph and an external knowledge base with controlled programs (see the sketch after this list).
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
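The last entry above (Knowledge-Routed Visual Question Reasoning) generates QA pairs from a scene graph plus an external knowledge base via controlled programs. As a rough, hedged illustration of what such program-controlled generation can look like, here is a toy sketch; the schemas, question template, and step names are assumptions made for illustration only and do not reflect that dataset's actual generation code.

```python
# Toy scene graph (object -> visual attributes) and toy knowledge base (object -> fact).
scene_graph = {"dog": {"color": "brown"}, "frisbee": {"color": "red"}}
knowledge_base = {"dog": "is a domesticated descendant of the wolf"}

def generate_qa(obj, attribute, program=("resolve_kb_clue", "read_attribute")):
    """A 'controlled program' here is just an ordered list of primitive steps;
    the question text comes from a template, the answer from executing the steps."""
    question = (f"What is the {attribute} of the object that "
                f"{knowledge_base[obj]}?")
    answer = None
    for step in program:
        if step == "resolve_kb_clue":
            answer = obj                             # map the KB fact back to an object
        elif step == "read_attribute":
            answer = scene_graph[answer][attribute]  # read its visual attribute
    return question, answer

print(generate_qa("dog", "color"))
# ('What is the color of the object that is a domesticated descendant of the wolf?', 'brown')
```

In this toy version the answer is only reachable by resolving the knowledge clue first and then reading the visual attribute, so both sources are genuinely required to answer.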
This list is automatically generated from the titles and abstracts of the papers on this site.