C3VQG: Category Consistent Cyclic Visual Question Generation
- URL: http://arxiv.org/abs/2005.07771v5
- Date: Sat, 9 Jan 2021 14:26:57 GMT
- Title: C3VQG: Category Consistent Cyclic Visual Question Generation
- Authors: Shagun Uppal, Anish Madan, Sarthak Bhagat, Yi Yu, Rajiv Ratn Shah
- Abstract summary: Visual Question Generation (VQG) is the task of generating natural questions based on an image.
In this paper, we try to exploit the different visual cues and concepts in an image to generate questions using a variational autoencoder (VAE) without ground-truth answers.
Our approach addresses two major shortcomings of existing VQG systems: (i) it minimizes the level of supervision and (ii) it replaces generic questions with category-relevant generations.
- Score: 51.339348810676896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Question Generation (VQG) is the task of generating natural questions
based on an image. Popular methods in the past have explored image-to-sequence
architectures trained with maximum likelihood which have demonstrated
meaningful generated questions given an image and its associated ground-truth
answer. VQG becomes more challenging if the image contains rich contextual
information describing its different semantic categories. In this paper, we try
to exploit the different visual cues and concepts in an image to generate
questions using a variational autoencoder (VAE) without ground-truth answers.
Our approach addresses two major shortcomings of existing VQG systems: (i) it
minimizes the level of supervision and (ii) it replaces generic questions with
category-relevant generations. Most importantly, by eliminating expensive
answer annotations, the required supervision is weakened. Using different
categories enables us to exploit different concepts as the inference requires
only the image and the category. Mutual information is maximized between the
image, question, and answer category in the latent space of our VAE. A novel
category consistent cyclic loss is proposed to enable the model to generate
consistent predictions with respect to the answer category, reducing
redundancies and irregularities. Additionally, we also impose supplementary
constraints on the latent space of our generative model to provide structure
based on categories and enhance generalization by encapsulating decorrelated
features within each dimension. Through extensive experiments, the proposed
model, C3VQG, outperforms state-of-the-art VQG methods with weak supervision.
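To make the weakly supervised setup above concrete, here is a minimal PyTorch sketch of a category-conditioned VAE question generator with a simplified category-consistency term. This is an illustration under stated assumptions, not the authors' implementation: the encoder takes pre-extracted image features and an answer-category embedding, the decoder is a toy GRU, and the paper's cyclic loss (which re-encodes the generated question) is approximated here by recovering the conditioning category directly from the latent code. All module names, dimensions, and hyperparameters (C3VQGSketch, cat_head, etc.) are hypothetical.

```python
# Hedged sketch, not the authors' code: a category-conditioned VAE question
# generator with a simplified category-consistency loss. Assumes pre-extracted
# image features and teacher-forced decoding; the mutual-information and
# latent-space decorrelation constraints from the paper are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class C3VQGSketch(nn.Module):
    def __init__(self, img_dim=2048, n_categories=16, vocab_size=10000,
                 embed_dim=300, latent_dim=128, hidden_dim=512):
        super().__init__()
        self.cat_embed = nn.Embedding(n_categories, embed_dim)
        # Encoder: image features + answer-category embedding -> Gaussian latent.
        self.enc = nn.Sequential(nn.Linear(img_dim + embed_dim, hidden_dim), nn.ReLU())
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: latent code + image features initialize a GRU over question tokens.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.dec_init = nn.Linear(latent_dim + img_dim, hidden_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)
        # Auxiliary head: a stand-in for the cyclic category-consistency loss;
        # the latent code should predict the category it was conditioned on.
        self.cat_head = nn.Linear(latent_dim, n_categories)

    def encode(self, img_feat, category):
        h = self.enc(torch.cat([img_feat, self.cat_embed(category)], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar

    def decode(self, z, img_feat, question_in):
        h0 = torch.tanh(self.dec_init(torch.cat([z, img_feat], dim=-1))).unsqueeze(0)
        out, _ = self.gru(self.word_embed(question_in), h0)  # teacher forcing
        return self.to_vocab(out)  # (batch, seq_len, vocab)

    def forward(self, img_feat, category, question_in, question_tgt):
        z, mu, logvar = self.encode(img_feat, category)
        logits = self.decode(z, img_feat, question_in)
        recon = F.cross_entropy(logits.flatten(0, 1), question_tgt.flatten(),
                                ignore_index=0)  # token-level NLL, pad index 0 assumed
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        # Simplified cyclic category consistency: recover the conditioning category.
        cycle = F.cross_entropy(self.cat_head(z), category)
        return recon + kl + cycle
```

At inference time, matching the weak-supervision setting described in the abstract, one would sample z from the prior and decode greedily given only the image features and a chosen answer category; no ground-truth answer is involved.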
Related papers
- QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems [3.486120902611884]
This paper explores the significance of different question types for VQA systems and their impact on performance.
We propose QTG-VQA, a novel architecture that incorporates question-type-guided attention and an adaptive learning mechanism.
arXiv Detail & Related papers (2024-09-14T07:42:41Z)
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- ConVQG: Contrastive Visual Question Generation with Multimodal Guidance [20.009626292937995]
We propose Contrastive Visual Question Generation (ConVQG) to generate image-grounded, text-guided, and knowledge-rich questions.
Experiments on knowledge-aware and standard VQG benchmarks demonstrate that ConVQG outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-02-20T09:20:30Z)
- Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation [64.64849950642619]
We develop an evaluation framework inspired by formal semantics for evaluating text-to-image models.
We show that Davidsonian Scene Graph (DSG) produces atomic and unique questions organized in dependency graphs.
We also present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts.
arXiv Detail & Related papers (2023-10-27T16:20:10Z)
- Guiding Visual Question Generation [40.56637275354495]
In traditional Visual Question Generation (VQG), most images have multiple concepts for which a question could be generated.
We present Guiding Visual Question Generation, a variant of VQG which conditions the question generator on categorical information.
arXiv Detail & Related papers (2021-10-15T17:38:08Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pairs based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
- Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class-imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
arXiv Detail & Related papers (2020-10-30T00:57:17Z)
- Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering [27.042604046441426]
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.
In this paper, we depict an image by multiple knowledge graphs from the visual, semantic and factual views.
We decompose the model into a series of memory-based reasoning steps, each performed by a Graph-based Read, Update, and Control (GRUC) module.
We achieve a new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA.
arXiv Detail & Related papers (2020-08-31T23:25:01Z)
- Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing [20.117014315684287]
We use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more types of KGs.
We then examine the skew in the distribution of questions for each KG.
These new questions can be added to existing VQA datasets to increase the diversity of questions and reduce the skew.
arXiv Detail & Related papers (2020-04-08T00:27:43Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5% and marginally improves performance on the Reasoning questions in VQA, while also producing better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.