Guiding Visual Question Generation
- URL: http://arxiv.org/abs/2110.08226v1
- Date: Fri, 15 Oct 2021 17:38:08 GMT
- Title: Guiding Visual Question Generation
- Authors: Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao and Lucia Specia
- Abstract summary: In traditional Visual Question Generation (VQG), most images have multiple concepts for which a question could be generated.
We present Guiding Visual Question Generation - a variant of VQG which conditions the question generator on categorical information.
- Score: 40.56637275354495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In traditional Visual Question Generation (VQG), most images have multiple
concepts (e.g. objects and categories) for which a question could be generated,
but models are trained to mimic an arbitrary choice of concept as given in
their training data. This makes training difficult and also poses issues for
evaluation -- multiple valid questions exist for most images but only one or a
few are captured by the human references. We present Guiding Visual Question
Generation - a variant of VQG which conditions the question generator on
categorical information based on expectations on the type of question and the
objects it should explore. We propose two variants: (i) an explicitly guided
model that enables an actor (human or automated) to select which objects and
categories to generate a question for; and (ii) an implicitly guided model that
learns which objects and categories to condition on, based on discrete latent
variables. The proposed models are evaluated on an answer-category augmented
VQA dataset, and our quantitative results show a substantial improvement over
the current state of the art (an increase of over 9 BLEU-4 points). Human evaluation
validates that guidance helps the generation of questions that are
grammatically coherent and relevant to the given image and objects.
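To make the explicit-guidance idea concrete, the following is a minimal sketch, written in PyTorch purely as an illustration, of a question generator conditioned on an actor's chosen answer category and objects. The module names, feature sizes, and the GRU decoder are assumptions for illustration, not the authors' actual implementation.

```python
# Hedged sketch of the *explicitly guided* variant: the decoder is conditioned
# on embeddings of the category and objects selected by an external actor.
# All names, dimensions, and the decoder choice are illustrative assumptions.
import torch
import torch.nn as nn


class ExplicitlyGuidedVQG(nn.Module):
    def __init__(self, vocab_size, n_categories, n_object_classes, d_model=512):
        super().__init__()
        self.image_proj = nn.Linear(2048, d_model)       # e.g. pooled CNN features
        self.category_emb = nn.Embedding(n_categories, d_model)
        self.object_emb = nn.Embedding(n_object_classes, d_model)
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, image_feats, category_id, object_ids, question_tokens):
        # Fuse image features, the chosen category embedding, and the mean of
        # the chosen object embeddings into one guidance vector, used as the
        # decoder's initial hidden state.
        guidance = (
            self.image_proj(image_feats)
            + self.category_emb(category_id)
            + self.object_emb(object_ids).mean(dim=1)
        )
        h0 = guidance.unsqueeze(0)                        # (1, batch, d_model)
        dec_out, _ = self.decoder(self.token_emb(question_tokens), h0)
        return self.out(dec_out)                          # token logits


# Toy usage with random tensors (shapes only, no real data).
model = ExplicitlyGuidedVQG(vocab_size=1000, n_categories=16, n_object_classes=80)
logits = model(
    image_feats=torch.randn(2, 2048),
    category_id=torch.tensor([3, 7]),
    object_ids=torch.randint(0, 80, (2, 5)),
    question_tokens=torch.randint(0, 1000, (2, 12)),
)
print(logits.shape)  # torch.Size([2, 12, 1000])
```

At inference time, category_id and object_ids would be supplied by a human or an automated actor; exposing that choice is precisely the control knob the explicitly guided variant provides.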
Related papers
- A Comprehensive Survey on Visual Question Answering Datasets and Algorithms [1.941892373913038]
We meticulously analyze the current state of VQA datasets and models, while cleanly dividing them into distinct categories and then summarizing the methodologies and characteristics of each category.
We explore six main paradigms of VQA models: fusion, attention, the technique of using information from one modality to filter information from another, external knowledge base, composition or reasoning, and graph models.
arXiv Detail & Related papers (2024-11-17T18:52:06Z) - Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - Visual Question Generation in Bengali [0.0]
We develop a novel transformer-based encoder-decoder architecture that generates questions in Bengali when given an image.
We establish the first state-of-the-art models for the Visual Question Generation task in Bengali.
Our results show that our image-cat model achieves a BLEU-1 score of 33.12 and a BLEU-3 score of 7.56.
arXiv Detail & Related papers (2023-10-12T10:26:26Z) - An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - COIN: Counterfactual Image Generation for VQA Interpretation [5.994412766684842]
We introduce an interpretability approach for VQA models by generating counterfactual images.
In addition to interpreting the results of VQA models on single images, the obtained results and the accompanying discussion provide an extensive explanation of VQA models' behaviour.
arXiv Detail & Related papers (2022-01-10T13:51:35Z) - Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z) - Latent Variable Models for Visual Question Answering [34.9601948665926]
We propose latent variable models for Visual Question Answering.
Extra information (e.g. captions and answer categories) is incorporated as latent variables to improve inference.
Experiments on the VQA v2.0 benchmarking dataset demonstrate the effectiveness of our proposed models.
arXiv Detail & Related papers (2021-01-16T08:21:43Z) - C3VQG: Category Consistent Cyclic Visual Question Generation [51.339348810676896]
Visual Question Generation (VQG) is the task of generating natural questions based on an image.
In this paper, we try to exploit the different visual cues and concepts in an image to generate questions using a variational autoencoder (VAE) without ground-truth answers.
Our approach addresses two major shortcomings of existing VQG systems: (i) it minimizes the level of supervision and (ii) it replaces generic questions with category-relevant generations.
arXiv Detail & Related papers (2020-05-15T20:25:03Z)
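The implicitly guided variant described in the abstract, like the latent-variable related work listed above, must decide what to condition on without an external actor. One common way to keep such a discrete choice differentiable is a Gumbel-Softmax relaxation; the snippet below is a hedged sketch of that idea (the scorer, shapes, and the use of Gumbel-Softmax itself are assumptions, since the abstract only states that discrete latent variables are used).

```python
# Hedged sketch: differentiable discrete selection of which detected object
# should guide question generation. Not the authors' implementation.
import torch
import torch.nn.functional as F


def select_guidance(object_feats, scorer, tau=1.0):
    """Score each detected object and draw a (nearly) one-hot sample over
    objects with Gumbel-Softmax; the selected feature then conditions the
    question decoder in place of a human-chosen object."""
    logits = scorer(object_feats).squeeze(-1)                # (batch, n_objects)
    weights = F.gumbel_softmax(logits, tau=tau, hard=True)   # discrete yet differentiable
    return torch.einsum("bn,bnd->bd", weights, object_feats) # selected object feature


# Toy usage: 2 images, 5 detected objects each, 512-d features.
scorer = torch.nn.Linear(512, 1)
object_feats = torch.randn(2, 5, 512)
guidance = select_guidance(object_feats, scorer)
print(guidance.shape)  # torch.Size([2, 512])
```

With hard=True, the forward pass uses a one-hot selection while the backward pass uses the gradient of the soft sample (a straight-through estimate), which is what keeps a discrete latent choice trainable end to end.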
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.