X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization
in Visual Question Answering
- URL: http://arxiv.org/abs/2107.11576v1
- Date: Sat, 24 Jul 2021 10:17:48 GMT
- Title: X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization
in Visual Question Answering
- Authors: Jingjing Jiang, Ziyi Liu, Yifan Liu, Zhixiong Nan, and Nanning Zheng
- Abstract summary: Recomposing existing visual concepts can generate compositions that are unseen in the training set.
We propose a graph generative modeling-based training scheme (X-GGM) to handle the problem implicitly.
The baseline VQA model trained with the X-GGM scheme achieves state-of-the-art OOD performance on two standard VQA OOD benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Encouraging progress has been made in Visual Question Answering (VQA) in
recent years, but it remains challenging to enable VQA models to adaptively
generalize to out-of-distribution (OOD) samples. Intuitively, recomposing
existing visual concepts (i.e., attributes and objects) can generate
compositions unseen in the training set, which should help VQA models
generalize to OOD samples. In this paper, we formulate OOD generalization in VQA as a
compositional generalization problem and propose a graph generative
modeling-based training scheme (X-GGM) to handle the problem implicitly. X-GGM
leverages graph generative modeling to iteratively generate a relation matrix
and node representations for a predefined graph whose nodes are
attribute-object pairs. Furthermore, to alleviate unstable training in graph
generative modeling, we propose a gradient distribution consistency loss that
constrains the data distribution under adversarial perturbations to be
consistent with the generated distribution. The baseline VQA model (LXMERT)
trained with the X-GGM scheme achieves state-of-the-art OOD performance on two
standard VQA OOD benchmarks, i.e., VQA-CP v2 and GQA-OOD. Extensive ablation
studies demonstrate the effectiveness of X-GGM components.
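For intuition, here is a minimal PyTorch sketch of the two ingredients the abstract describes: a generator that iteratively refines a relation matrix and node representations over attribute-object nodes, and a stand-in for the gradient distribution consistency term that compares the perturbed data distribution against the generated one. All names and shapes are hypothetical; this is not the authors' implementation.

```python
# Minimal sketch of the X-GGM idea (hypothetical names; the paper's generator,
# adversarial perturbation, and LXMERT integration are more involved).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphGenerator(nn.Module):
    """Iteratively refines a relation matrix and node representations for a
    predefined graph whose nodes are attribute-object pair embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.rel_weight = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.rel_weight)
        self.node_update = nn.Linear(dim, dim)

    def forward(self, nodes, steps=2):
        # nodes: (n, dim) embeddings of attribute-object pairs
        for _ in range(steps):
            # bilinear pairwise scores -> dense relation matrix (n, n)
            rel = torch.sigmoid(nodes @ self.rel_weight @ nodes.t())
            # propagate node representations along the generated relations
            nodes = torch.tanh(self.node_update(rel @ nodes))
        return rel, nodes

def distribution_consistency(perturbed, generated):
    """Stand-in for the gradient distribution consistency loss: penalize the
    gap between the first two moments of the perturbed-data distribution and
    the generated distribution."""
    return (F.mse_loss(perturbed.mean(0), generated.mean(0))
            + F.mse_loss(perturbed.std(0), generated.std(0)))

# Toy usage: perturb real node embeddings (an adversarial step in the paper,
# plain noise here), generate a graph, and apply the consistency term.
gen = GraphGenerator(dim=64)
real_nodes = torch.randn(10, 64)
perturbed = real_nodes + 0.01 * torch.randn_like(real_nodes)
rel, fake_nodes = gen(real_nodes)
loss = distribution_consistency(perturbed, fake_nodes)
loss.backward()
```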
Related papers
- Generative Visual Question Answering [0.0]
This paper presents an approach to building a Visual Question Answering (VQA) model that performs well on temporal generalization.
We propose a new dataset, GenVQA, which uses images and captions from the VQAv2 and MS-COCO datasets to generate new images with Stable Diffusion.
Performance evaluation focuses on questions mirroring the original VQAv2 dataset, with answers adjusted to the new images.
arXiv Detail & Related papers (2023-07-18T05:30:23Z)
- GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data whose complex relational feature structure is coherent with a graph skeleton.
The proposed model generates the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first phase, it models the distribution of features associated with the nodes of the given graph; in the second, it generates the edge features conditioned on the node features (a toy sketch follows this entry).
arXiv Detail & Related papers (2022-12-01T11:49:07Z)
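A toy sketch of the two-phase scheme just described, assuming a simple MLP generator per phase; the names are hypothetical and this is not the GrannGAN architecture.

```python
# Phase 1: sample node features for a given graph skeleton.
# Phase 2: sample edge features conditioned on the incident node features.
import torch
import torch.nn as nn

class TwoPhaseGraphAnnotator(nn.Module):
    def __init__(self, noise_dim, node_dim, edge_dim):
        super().__init__()
        self.noise_dim = noise_dim
        # phase 1: noise -> node features
        self.node_gen = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, node_dim))
        # phase 2: (node_i, node_j, noise) -> edge features
        self.edge_gen = nn.Sequential(
            nn.Linear(2 * node_dim + noise_dim, 64), nn.ReLU(),
            nn.Linear(64, edge_dim))

    def forward(self, edge_index, num_nodes):
        # phase 1: annotate every node of the skeleton
        z = torch.randn(num_nodes, self.noise_dim)
        x = self.node_gen(z)                       # (n, node_dim)
        # phase 2: annotate each edge conditioned on its endpoints
        src, dst = edge_index                      # (e,), (e,)
        ze = torch.randn(src.size(0), self.noise_dim)
        e = self.edge_gen(torch.cat([x[src], x[dst], ze], dim=-1))
        return x, e

# Usage on a 3-node path graph with edges (0,1) and (1,2)
model = TwoPhaseGraphAnnotator(noise_dim=16, node_dim=8, edge_dim=4)
edge_index = torch.tensor([[0, 1], [1, 2]])
node_feats, edge_feats = model(edge_index, num_nodes=3)
```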
- Generative Bias for Robust Visual Question Answering [74.42555378660653]
We propose GenB, a generative method that trains a bias model directly from the target model.
In particular, GenB employs a generative network to learn the bias of the target model through a combination of an adversarial objective and knowledge distillation (sketched below).
Extensive experiments show the effects of our method on various VQA bias datasets, including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE.
arXiv Detail & Related papers (2022-08-01T08:58:02Z)
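A toy sketch of the recipe summarized above: a question-only bias model distills the target model's answer distribution while an adversarial discriminator pushes the two distributions together. Names and shapes are hypothetical, and alternating discriminator updates are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_answers, q_dim = 100, 32
target_model = nn.Linear(q_dim, num_answers)   # stand-in for the full VQA model
bias_model = nn.Sequential(                    # question-only bias branch
    nn.Linear(q_dim, 64), nn.ReLU(), nn.Linear(64, num_answers))
discriminator = nn.Sequential(                 # real = target logits, fake = bias logits
    nn.Linear(num_answers, 64), nn.ReLU(), nn.Linear(64, 1))

q = torch.randn(8, q_dim)                      # batch of question features
with torch.no_grad():
    t_logits = target_model(q)                 # "teacher" answer distribution
b_logits = bias_model(q)

# knowledge distillation: match the target's answer distribution
kd = F.kl_div(F.log_softmax(b_logits, -1), F.softmax(t_logits, -1),
              reduction="batchmean")
# adversarial objective: fool the discriminator into scoring bias logits real
# (in practice the discriminator is trained in alternation to tell them apart)
adv = F.binary_cross_entropy_with_logits(
    discriminator(b_logits), torch.ones(8, 1))
(kd + adv).backward()
```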
- Question-Answer Sentence Graph for Joint Modeling Answer Selection [122.29142965960138]
We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs.
Online inference is then performed to solve the answer sentence selection (AS2) task on unseen queries.
arXiv Detail & Related papers (2022-02-16T05:59:53Z)
- COIN: Counterfactual Image Generation for VQA Interpretation [5.994412766684842]
We introduce an interpretability approach for VQA models by generating counterfactual images.
In addition to interpreting the results of VQA models on single images, the obtained results and discussion provide an extensive explanation of VQA models' behaviour.
arXiv Detail & Related papers (2022-01-10T13:51:35Z)
- A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose AGE, a robust framework for adversarial graph embedding.
AGE generates fake neighbor nodes from an implicit distribution to serve as enhanced negative samples (sketched below).
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z)
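A toy sketch of the negative-sampling idea summarized above: a generator draws fake neighbor embeddings from an implicit distribution, and these act as negatives when training node embeddings. Names are hypothetical, and the alternating adversarial update of the generator is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, noise_dim = 32, 16
embed = nn.Embedding(100, dim)             # node embeddings to train
neighbor_gen = nn.Sequential(              # implicit distribution: (center, noise) -> fake neighbor
    nn.Linear(dim + noise_dim, 64), nn.ReLU(), nn.Linear(64, dim))

center = embed(torch.tensor([0]))          # a node
real_nb = embed(torch.tensor([1]))         # one of its true neighbors
z = torch.randn(1, noise_dim)
fake_nb = neighbor_gen(torch.cat([center, z], -1))

# score pairs by inner product: pull real neighbors close, push fakes away
pos = F.logsigmoid((center * real_nb).sum(-1))
neg = F.logsigmoid(-(center * fake_nb.detach()).sum(-1))
loss = -(pos + neg).mean()
loss.backward()
```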
- MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering [58.30291671877342]
We present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input.
MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement.
arXiv Detail & Related papers (2020-09-18T00:22:54Z)