K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
- URL: http://arxiv.org/abs/2203.07890v1
- Date: Tue, 15 Mar 2022 13:38:10 GMT
- Title: K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
- Authors: Kohei Uehara, Tatsuya Harada
- Abstract summary: We present a novel knowledge-aware VQG dataset called K-VQG.
This is the first large, human-annotated dataset in which questions regarding images are tied to structured knowledge.
We also develop a new VQG model that can encode and use knowledge as the target for a question.
- Score: 64.55573343404572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Question Generation (VQG) is a task to generate questions from images.
When humans ask questions about an image, their goal is often to acquire some
new knowledge. However, existing studies on VQG have mainly addressed question
generation from answers or question categories, overlooking the objectives of
knowledge acquisition. To introduce a knowledge acquisition perspective into
VQG, we constructed a novel knowledge-aware VQG dataset called K-VQG. This is
the first large, human-annotated dataset in which questions regarding images
are tied to structured knowledge. We also developed a new VQG model that can
encode and use knowledge as the target for a question. The experimental results
show that our model outperforms existing models on the K-VQG dataset.
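The abstract does not spell out the model's input/output format, so the following is only a minimal sketch: it assumes the knowledge target is a subject-relation-object triple with one masked slot, serialized into the text input of a question generator (the actual K-VQG model is multimodal; the field names, `[MASK]` token, and caption stand-in below are illustrative assumptions, not the authors' code).

```python
# Minimal sketch (not the authors' code): serialize a knowledge triple with one
# masked element so a question generator can target the missing slot.
from dataclasses import dataclass

@dataclass
class KnowledgeTriple:
    head: str       # e.g. "toothbrush"
    relation: str   # e.g. "UsedFor"
    tail: str       # e.g. "brushing teeth"

def build_vqg_input(triple: KnowledgeTriple, masked_slot: str, caption: str) -> str:
    """Build a text input where the masked slot is what the question should ask about.

    `masked_slot` is one of "head", "relation", "tail"; the caption stands in for
    visual features in this text-only sketch.
    """
    parts = {"head": triple.head, "relation": triple.relation, "tail": triple.tail}
    parts[masked_slot] = "[MASK]"
    knowledge = f"{parts['head']} <sep> {parts['relation']} <sep> {parts['tail']}"
    return f"caption: {caption} knowledge: {knowledge}"

example = KnowledgeTriple("toothbrush", "UsedFor", "brushing teeth")
print(build_vqg_input(example, masked_slot="tail", caption="a toothbrush on a sink"))
# A generator trained on K-VQG-style data would be expected to produce a question
# whose answer fills the masked slot, e.g. "What is the toothbrush used for?"
```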
Related papers
- ConVQG: Contrastive Visual Question Generation with Multimodal Guidance [20.009626292937995]
We propose Contrastive Visual Question Generation (ConVQG) to generate image-grounded, text-guided, and knowledge-rich questions.
Experiments on knowledge-aware and standard VQG benchmarks demonstrate that ConVQG outperforms the state-of-the-art methods.
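As a rough illustration of the contrastive idea (not ConVQG's exact objective), a generic InfoNCE-style loss can push each generated question's embedding toward its own image/text guidance and away from the other pairs in a batch; the embedding dimensions and temperature below are assumptions.

```python
# Generic InfoNCE-style contrastive loss over paired embeddings -- an illustrative
# stand-in for ConVQG's multimodal guidance objective, not the paper's exact loss.
import torch
import torch.nn.functional as F

def contrastive_loss(question_emb: torch.Tensor, guidance_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """question_emb, guidance_emb: (batch, dim) embeddings of generated questions
    and their image/text guidance; matched pairs share a row index."""
    q = F.normalize(question_emb, dim=-1)
    g = F.normalize(guidance_emb, dim=-1)
    logits = q @ g.t() / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(q.size(0))             # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```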
arXiv Detail & Related papers (2024-02-20T09:20:30Z) - Language Guided Visual Question Answering: Elevate Your Multimodal
Language Model Using Knowledge-Enriched Prompts [54.072432123447854]
Visual question answering (VQA) is the task of answering questions about an image.
Answering the question requires commonsense knowledge, world knowledge, and reasoning about ideas and concepts not present in the image.
We propose a framework that uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc., to answer questions more accurately.
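A hedged sketch of how such language guidance might be stitched into a knowledge-enriched prompt; the field names and template wording are assumptions, not the paper's exact prompt.

```python
# Illustrative prompt builder: combine language guidance (caption, rationale,
# scene-graph facts) with the question before handing it to a multimodal/LLM answerer.
def build_lg_prompt(question: str, caption: str, rationale: str,
                    scene_graph_facts: list[str]) -> str:
    facts = "; ".join(scene_graph_facts)
    return (
        f"Image caption: {caption}\n"
        f"Scene graph: {facts}\n"
        f"Rationale: {rationale}\n"
        f"Question: {question}\n"
        f"Answer:"
    )

print(build_lg_prompt(
    question="What is the dog waiting for?",
    caption="a dog sitting by a door with a leash in its mouth",
    rationale="dogs often carry a leash when they want to go outside",
    scene_graph_facts=["dog - holds - leash", "dog - near - door"],
))
```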
arXiv Detail & Related papers (2023-10-31T03:54:11Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
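The retriever-ranker details are not given in this summary; the toy sketch below only assumes that external knowledge forms a directed graph, that candidate inference paths are enumerated from question entities, and that paths are then ranked. The token-overlap scorer is a placeholder, not GATHER's ranking model.

```python
# Toy knowledge graph and path enumeration -- a simplified stand-in for GATHER's
# graph construction, pruning, and path-level ranking.
from collections import defaultdict

edges = [("cat", "IsA", "animal"), ("animal", "CapableOf", "breathing"),
         ("cat", "HasA", "whiskers")]
graph = defaultdict(list)
for head, rel, tail in edges:
    graph[head].append((rel, tail))

def enumerate_paths(start: str, max_hops: int = 2):
    """Yield relation paths of up to max_hops starting from a question entity."""
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if path:
            yield path
        if len(path) < max_hops:
            for rel, nxt in graph[node]:
                stack.append((nxt, path + [(node, rel, nxt)]))

def score_path(path, question: str) -> float:
    # Placeholder ranker: count overlapping tokens between path nodes and question.
    tokens = set(question.lower().replace("?", "").split())
    return sum(tok in tokens for _node, _rel, tail in path for tok in tail.lower().split())

question = "Is the cat capable of breathing?"
best = max(enumerate_paths("cat"), key=lambda p: score_path(p, question))
print(best)  # the highest-scoring path doubles as an explanation of the reasoning
```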
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets [5.45761450227064]
We propose a new Few-Shot Visual Question Generation (FS-VQG) task and provide a comprehensive benchmark for it.
We evaluate various existing VQG approaches as well as popular few-shot solutions based on meta-learning and self-supervised strategies for the FS-VQG task.
Several important findings emerge from our experiments, shedding light on the limits of current models in few-shot vision-and-language generation tasks.
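The benchmark's exact protocol is not described here; the snippet below is only a generic K-shot episode sampler of the kind many few-shot setups use, with made-up question categories, not the FS-VQG data format.

```python
# Generic K-shot episode sampler -- illustrative of how few-shot VQG evaluation data
# might be organized, not the benchmark's actual protocol.
import random

def sample_episode(examples_by_category: dict, n_way: int, k_shot: int, n_query: int):
    """Pick n_way question categories, then k_shot support and n_query query examples each."""
    categories = random.sample(list(examples_by_category), n_way)
    support, query = [], []
    for cat in categories:
        pool = random.sample(examples_by_category[cat], k_shot + n_query)
        support += [(cat, ex) for ex in pool[:k_shot]]
        query += [(cat, ex) for ex in pool[k_shot:]]
    return support, query

data = {"counting": ["img1", "img2", "img3"], "color": ["img4", "img5", "img6"],
        "activity": ["img7", "img8", "img9"]}
support, query = sample_episode(data, n_way=2, k_shot=1, n_query=1)
print(support, query)
```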
arXiv Detail & Related papers (2022-10-13T15:01:15Z)
- Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering [18.33311267792116]
We find that many of the "unknowns" to the learned VQA model are indeed "known" in the dataset implicitly.
We present a simple data augmentation pipeline, SimpleAug, to turn this "known" knowledge into training examples for VQA.
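A toy augmentation in the spirit of this idea, not the actual SimpleAug pipeline: annotations that already exist in the dataset (here, hypothetical object labels with naive question templates and pluralization) are turned into extra VQA training pairs.

```python
# Toy augmentation sketch (not the actual SimpleAug pipeline): turn object
# annotations already in the dataset into extra VQA training pairs.
from collections import Counter

def augment_from_objects(image_id: str, objects: list[str]):
    counts = Counter(objects)
    new_examples = []
    for obj, count in counts.items():
        new_examples.append({"image": image_id,
                             "question": f"Is there a {obj} in the image?",
                             "answer": "yes"})
        new_examples.append({"image": image_id,
                             "question": f"How many {obj}s are in the image?",  # naive plural
                             "answer": str(count)})
    return new_examples

for ex in augment_from_objects("img_42", ["dog", "frisbee", "dog"]):
    print(ex)
```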
arXiv Detail & Related papers (2021-09-13T16:56:43Z)
- An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA [51.639880603821446]
We propose PICa, a simple yet effective method that Prompts GPT-3 via the use of Image Captions for knowledge-based VQA.
We first convert the image into captions (or tags) that GPT-3 can understand, then adapt GPT-3 to solve the VQA task in a few-shot manner.
By using only 16 examples, PICa surpasses the supervised state of the art by an absolute +8.6 points on the OK-VQA dataset.
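A sketch of the prompting step: each image is represented by its caption, and a few in-context caption/question/answer examples precede the test question. The header wording and template are assumptions, and the actual GPT-3 API call is omitted.

```python
# Few-shot prompt construction in the style of PICa: represent each image by its
# caption and let in-context QA examples steer the answer. The exact template is an
# assumption; the GPT-3 call itself is not shown.
def build_pica_prompt(in_context: list[dict], caption: str, question: str) -> str:
    header = "Please answer the question according to the context.\n\n"
    shots = "".join(
        f"Context: {ex['caption']}\nQ: {ex['question']}\nA: {ex['answer']}\n\n"
        for ex in in_context
    )
    return header + shots + f"Context: {caption}\nQ: {question}\nA:"

examples = [{"caption": "a man riding a wave on a surfboard",
             "question": "What sport is this?", "answer": "surfing"}]
print(build_pica_prompt(examples, caption="a bowl of ramen with chopsticks",
                        question="What utensil is being used?"))
```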
arXiv Detail & Related papers (2021-09-10T17:51:06Z)
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work, we study open-domain knowledge, the setting in which the knowledge required to answer a question is not given or annotated at either training or test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models. Second, explicit, symbolic knowledge encoded in knowledge bases.
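As a rough illustration of combining the two knowledge sources, the snippet below late-fuses answer scores from an implicit (transformer) head and a symbolic (knowledge-base) head with an element-wise max. This is a placeholder for the integration KRISP describes, not its actual architecture, and the answer vocabulary is made up.

```python
# Illustrative late fusion of "implicit" (transformer) and "symbolic" (knowledge-base)
# answer scores -- a placeholder, not KRISP's architecture.
import torch

def fuse_answer_scores(implicit_logits: torch.Tensor,
                       symbolic_logits: torch.Tensor) -> torch.Tensor:
    """Both tensors score the same fixed answer vocabulary; take the element-wise max
    so either knowledge source can surface an answer the other misses."""
    return torch.maximum(implicit_logits, symbolic_logits)

implicit = torch.tensor([2.0, 0.1, -1.0])   # e.g. scores for ["umbrella", "rain", "sun"]
symbolic = torch.tensor([0.5, 3.0, -2.0])   # KB reasoning favors "rain"
print(fuse_answer_scores(implicit, symbolic).argmax().item())  # -> index of "rain"
```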
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
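A toy illustration of the generation recipe, under the assumption that a "controlled program" composes a question requiring both a scene-graph fact (visible in the image) and an external-knowledge fact; the facts and template here are invented for illustration.

```python
# Toy "controlled program": compose a question that needs both a scene-graph fact and
# an external-knowledge fact, roughly in the spirit of the dataset's generation setup.
scene_graph = {"object": "umbrella", "attribute": "red"}           # visible in the image
knowledge_base = {"umbrella": ("UsedFor", "keeping dry in rain")}  # external knowledge

def generate_qa(sg: dict, kb: dict):
    obj = sg["object"]
    _relation, value = kb[obj]
    question = f"What is the {sg['attribute']} {obj} in the image used for?"
    answer = value
    return question, answer

print(generate_qa(scene_graph, knowledge_base))
# -> ('What is the red umbrella in the image used for?', 'keeping dry in rain')
```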
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
- Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing [20.117014315684287]
We use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more types of KGs.
We then examine the skew in the distribution of questions for each KG.
New questions targeting under-represented KGs can then be added to existing VQA datasets to increase the diversity of questions and reduce the skew.
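A minimal sketch of the skew check: count questions per Knowledge Gap type and report each type's share, so under-represented types can be targeted. The KG labels below are made up for illustration.

```python
# Toy skew check: count questions per Knowledge Gap (KG) type and report shares.
from collections import Counter

question_tags = [["spatial"], ["commonsense"], ["commonsense", "temporal"],
                 ["commonsense"], ["spatial"]]
counts = Counter(tag for tags in question_tags for tag in tags)
total = sum(counts.values())
for kg_type, n in counts.most_common():
    print(f"{kg_type}: {n} ({n / total:.0%})")
# Types with low shares are candidates for targeted new questions to reduce the skew.
```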
arXiv Detail & Related papers (2020-04-08T00:27:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.