Generative Compositional Augmentations for Scene Graph Prediction
- URL: http://arxiv.org/abs/2007.05756v3
- Date: Fri, 1 Oct 2021 15:33:30 GMT
- Title: Generative Compositional Augmentations for Scene Graph Prediction
- Authors: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky
- Abstract summary: Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language.
We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution.
We propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs.
- Score: 27.535630110794855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inferring objects and their relationships from an image in the form of a
scene graph is useful in many applications at the intersection of vision and
language. We consider a challenging problem of compositional generalization
that emerges in this task due to a long tail data distribution. Current scene
graph generation models are trained on a tiny fraction of the distribution
corresponding to the most frequent compositions, e.g. <cup, on, table>.
However, test images might contain zero- and few-shot compositions of objects
and relationships, e.g. <cup, on, surfboard>. Despite each of the object
categories and the predicate (e.g. 'on') being frequent in the training data,
the models often fail to properly understand such unseen or rare compositions.
To improve generalization, it is natural to attempt increasing the diversity of
the training distribution. However, in the graph domain this is non-trivial. To
that end, we propose a method to synthesize rare yet plausible scene graphs by
perturbing real ones. We then propose and empirically study a model based on
conditional generative adversarial networks (GANs) that allows us to generate
visual features of perturbed scene graphs and learn from them in a joint
fashion. When evaluated on the Visual Genome dataset, our approach yields
marginal, but consistent improvements in zero- and few-shot metrics. We analyze
the limitations of our approach indicating promising directions for future
research.
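The abstract describes two components: perturbing real scene graphs into rare but plausible ones, and a conditional GAN that synthesizes visual features for the perturbed graphs. The following is a minimal sketch of how these pieces could fit together, assuming a PyTorch setting; the names (`perturb_triplet`, `FeatureGenerator`) and dimensions are illustrative, not the authors' code, and the discriminator and joint training loop are omitted.

```python
# Minimal sketch (not the authors' implementation) of the two ideas in the
# abstract: (1) perturb a real triplet into a rare but plausible one, and
# (2) condition a generator on the perturbed graph to synthesize visual
# features that a scene graph model can then train on.
import random
import torch
import torch.nn as nn

def perturb_triplet(triplet, object_vocab):
    """Swap the subject or object for another category, e.g.
    <cup, on, table> -> <cup, on, surfboard>."""
    subj, pred, obj = triplet
    if random.random() < 0.5:
        subj = random.choice(object_vocab)
    else:
        obj = random.choice(object_vocab)
    return (subj, pred, obj)

class FeatureGenerator(nn.Module):
    """Conditional generator: (noise, triplet embedding) -> visual feature."""
    def __init__(self, vocab_size, embed_dim=128, noise_dim=64, feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(3 * embed_dim + noise_dim, 512),
            nn.ReLU(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, triplet_ids, noise):
        # triplet_ids: (B, 3) LongTensor of <subject, predicate, object> ids.
        cond = self.embed(triplet_ids).flatten(1)  # (B, 3 * embed_dim)
        return self.net(torch.cat([cond, noise], dim=1))
```

In the paper's full pipeline the generated features would be scored by a discriminator and fed to the scene graph classifier jointly with real features; the sketch above shows only the conditioning mechanism.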
Related papers
- Joint Generative Modeling of Scene Graphs and Images via Diffusion Models [37.788957749123725]
We present a novel generative task: joint scene graph - image generation.
We introduce a novel diffusion model, DiffuseSG, that jointly models the adjacency matrix along with heterogeneous node and edge attributes.
With a graph transformer being the denoiser, DiffuseSG successively denoises the scene graph representation in a continuous space and discretizes the final representation to generate the clean scene graph.
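The summary above describes denoising in a continuous space followed by discretization into a clean scene graph. Below is a hedged sketch of what such a discretization step could look like; the function name, tensor shapes, and zero threshold are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch (not DiffuseSG's code) of discretizing a denoised
# continuous scene graph representation into a clean discrete graph.
import torch

def discretize_scene_graph(node_logits, adj_logits, edge_logits):
    """node_logits: (N, C_obj), adj_logits: (N, N), edge_logits: (N, N, C_rel)."""
    node_labels = node_logits.argmax(dim=-1)  # one object class per node
    adjacency = (adj_logits > 0.0).long()     # binarize the adjacency matrix
    edge_labels = edge_logits.argmax(dim=-1)  # one predicate per candidate edge
    # Predicate labels are only meaningful where adjacency == 1.
    return node_labels, adjacency, edge_labels
```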
arXiv Detail & Related papers (2024-01-02T10:10:29Z)
- Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation [0.7851536646859476]
We introduce the task of Efficient Scene Graph Generation (SGG) that prioritizes the generation of relevant relations.
We present a new dataset, VG150-curated, based on the annotations of the popular Visual Genome dataset.
We show through a set of experiments that this dataset contains more high-quality and diverse annotations than the one usually used in SGG.
arXiv Detail & Related papers (2023-05-30T00:55:49Z)
- Learnable Graph Matching: A Practical Paradigm for Data Association [74.28753343714858]
We propose a general learnable graph matching method to address these issues.
Our method achieves state-of-the-art performance on several MOT datasets.
For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet.
arXiv Detail & Related papers (2023-03-27T17:39:00Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
- A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates the fake neighbor nodes as the enhanced negative samples from the implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z)
- Semi-Supervised Graph-to-Graph Translation [31.47555366566109]
Graph translation is a promising research direction and has a wide range of potential real-world applications.
One important reason is the lack of high-quality paired dataset.
We propose to construct a dual representation space, where transformation is performed explicitly to model the semantic transitions.
arXiv Detail & Related papers (2021-03-16T03:24:20Z)
- Dual ResGCN for Balanced Scene Graph Generation [106.7828712878278]
We propose a novel model, dubbed dual ResGCN, which consists of an object residual graph convolutional network and a relation residual graph convolutional network.
The two networks are complementary to each other. The former captures object-level context information, i.e., the connections among objects.
The latter is carefully designed to explicitly capture relation-level context information, i.e., the connections among relations.
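As a rough illustration of the two-branch design this summary describes (an assumption, not the authors' code; the mean-aggregation rule and all names are illustrative):

```python
# Sketch of two complementary residual GCN branches: one over object
# nodes, one over relation nodes. Adjacency matrices are dense floats.
import torch
import torch.nn as nn

class ResGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # Mean aggregation over neighbors plus a residual connection.
        msg = (adj @ self.lin(x)) / adj.sum(-1, keepdim=True).clamp(min=1.0)
        return x + torch.relu(msg)

class DualResGCN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.obj_branch = ResGCNLayer(dim)  # object-level context
        self.rel_branch = ResGCNLayer(dim)  # relation-level context

    def forward(self, obj_feats, obj_adj, rel_feats, rel_adj):
        return (self.obj_branch(obj_feats, obj_adj),
                self.rel_branch(rel_feats, rel_adj))
```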
arXiv Detail & Related papers (2020-11-09T07:44:17Z)
- Multilayer Clustered Graph Learning [66.94201299553336]
We use contrastive loss as a data fidelity term, in order to properly aggregate the observed layers into a representative graph.
Experiments show that our method leads to a representative graph with a clear cluster structure and is effective for solving clustering problems.
arXiv Detail & Related papers (2020-10-29T09:58:02Z)
- Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation [27.535630110794855]
Scene graph generation aims to predict graph-structured descriptions of input images.
It is important - yet challenging - to perform well on novel (zero-shot) or rare (few-shot) compositions of objects and relationships.
We show that the standard loss used in this task is unintentionally a function of scene graph density.
We introduce a density-normalized edge loss, which provides more than a two-fold improvement in certain generalization metrics.
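The density-normalized edge loss is described here only at a high level. A rough sketch of one way such a normalization could work (the function and the choice of cross-entropy are assumptions, not the paper's exact formulation): rather than averaging the per-edge loss over all candidate edges, which makes the loss scale with how densely a scene graph is annotated, normalize by the number of annotated edges in each graph.

```python
# Hedged sketch of a density-normalized edge loss (not the paper's code).
import torch
import torch.nn.functional as F

def density_normalized_edge_loss(edge_logits, edge_targets):
    """edge_logits: (E, C) predicate scores over all candidate edges;
    edge_targets: (E,) LongTensor with 0 = background / no relation."""
    per_edge = F.cross_entropy(edge_logits, edge_targets, reduction="none")
    num_fg = (edge_targets > 0).sum().clamp(min=1)  # annotated (foreground) edges
    return per_edge.sum() / num_fg                  # normalize by graph density
```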
arXiv Detail & Related papers (2020-05-17T11:45:29Z)
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
arXiv Detail & Related papers (2020-01-07T23:35:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.