Learning Visual Commonsense for Robust Scene Graph Generation
- URL: http://arxiv.org/abs/2006.09623v2
- Date: Sat, 18 Jul 2020 11:10:45 GMT
- Title: Learning Visual Commonsense for Robust Scene Graph Generation
- Authors: Alireza Zareian and Zhecan Wang and Haoxuan You and Shih-Fu Chang
- Abstract summary: Scene graph generation models are prone to mistakes due to the challenges of perception in the wild.
We propose the first method to acquire visual commonsense such as affordance and intuitive physics automatically from data.
We show our model learns commonsense better than any alternative, and improves the accuracy of state-of-the-art scene graph generation methods.
- Score: 49.208518291993705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene graph generation models understand the scene through object and
predicate recognition, but are prone to mistakes due to the challenges of
perception in the wild. Perception errors often lead to nonsensical
compositions in the output scene graph, which do not follow real-world rules
and patterns, and can be corrected using commonsense knowledge. We propose the
first method to acquire visual commonsense such as affordance and intuitive
physics automatically from data, and use that to improve the robustness of
scene understanding. To this end, we extend Transformer models to incorporate
the structure of scene graphs, and train our Global-Local Attention Transformer
on a scene graph corpus. Once trained, our model can be applied on any scene
graph generation model and correct its obvious mistakes, resulting in more
semantically plausible scene graphs. Through extensive experiments, we show our
model learns commonsense better than any alternative, and improves the accuracy
of state-of-the-art scene graph generation methods.
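As a rough illustration of the idea (a sketch, not the authors' released code), the node and edge labels of a predicted scene graph can be serialized into one token sequence and fed to a Transformer trained to predict masked labels on a scene graph corpus; at inference, the model re-scores and corrects an SGG model's output. The class name below and the interpretation of the global-local attention split are assumptions:

```python
# Sketch only: a Transformer that refines the node and edge labels of a
# predicted scene graph. The masking scheme and the global-local split are
# assumptions; the abstract states only that a Transformer is trained on a
# scene graph corpus and applied post hoc to correct mistakes.
import torch
import torch.nn as nn

class SceneGraphRefiner(nn.Module):  # hypothetical name
    def __init__(self, num_obj_classes, num_pred_classes,
                 d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # One shared vocabulary: object classes first, then predicates.
        self.embed = nn.Embedding(num_obj_classes + num_pred_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.obj_head = nn.Linear(d_model, num_obj_classes)
        self.pred_head = nn.Linear(d_model, num_pred_classes)

    def forward(self, tokens, attn_mask=None):
        # tokens: (batch, seq) label indices for all nodes and edges of a
        # scene graph, serialized into a single sequence. attn_mask can
        # restrict some attention to the graph's connectivity ("local")
        # while leaving other positions unrestricted ("global") -- one
        # plausible reading of global-local attention.
        h = self.encoder(self.embed(tokens), mask=attn_mask)
        return self.obj_head(h), self.pred_head(h)
```

Training this BERT-style on masked labels from ground-truth scene graphs, then replacing an SGG model's low-confidence predictions with the refiner's argmax, would mirror the post-hoc correction described in the abstract.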
Related papers
- Generative Visual Commonsense Answering and Explaining with Generative Scene Graph Constructing [46.701439459096235]
We propose a novel visual commonsense reasoning generation method named G2.
It first utilizes image patches and LLMs to construct a location-free scene graph, and then answers and explains based on the scene graph's information.
We also propose automatic scene graph filtering and selection strategies to absorb valuable scene graph information during training.
arXiv Detail & Related papers (2025-01-15T04:00:36Z)
- Uncovering Capabilities of Model Pruning in Graph Contrastive Learning [6.872289094878493]
We reformulate the problem of graph contrastive learning by contrasting different model versions rather than augmented views.
We extensively validate our method on various graph classification benchmarks under unsupervised and transfer learning settings.
arXiv Detail & Related papers (2024-10-27T07:09:31Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs (see the first sketch after this list).
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
- Generative Compositional Augmentations for Scene Graph Prediction [27.535630110794855]
Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language.
We consider a challenging problem of compositional generalization that emerges in this task due to a long-tailed data distribution.
We propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs.
arXiv Detail & Related papers (2020-07-11T12:11:53Z)
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs (see the second sketch after this list).
arXiv Detail & Related papers (2020-01-07T23:35:52Z)
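The natural-language-supervision entry above outlines a concrete pipeline: detect and localize objects, match detector labels to concepts parsed from captions, and emit "pseudo" scene graph labels. A minimal sketch of that matching step, with a simplified stand-in for the caption parser and matching rule (both are assumptions, not the released pipeline):

```python
# Sketch only: build pseudo scene graph edges by matching caption-parsed
# (subject, predicate, object) triplets to detector output by label.
from dataclasses import dataclass

@dataclass
class Region:
    label: str   # class name from an off-the-shelf detector
    box: tuple   # (x1, y1, x2, y2)

def pseudo_label(regions, caption_triplets):
    """caption_triplets: (subject, predicate, object) tuples parsed from
    a caption, e.g. ("man", "riding", "horse")."""
    by_label = {}
    for i, r in enumerate(regions):
        by_label.setdefault(r.label, []).append(i)
    edges = []
    for subj, pred, obj in caption_triplets:
        # Naive matching: any detected region whose label equals the
        # parsed concept; the paper's matching is more careful.
        for s in by_label.get(subj, []):
            for o in by_label.get(obj, []):
                edges.append((s, pred, o))
    return edges  # pseudo scene graph edges over detected regions

# Example:
# regions = [Region("man", (0, 0, 50, 100)), Region("horse", (40, 20, 120, 100))]
# pseudo_label(regions, [("man", "riding", "horse")])  # -> [(0, "riding", 1)]
```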
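The GB-Net entry above describes iterative propagation within and between a scene graph and a commonsense graph. A minimal sketch of one propagation step, assuming simple linear message functions and dense adjacency matrices (all names and shapes here are illustrative, not the released code):

```python
# Sketch only: one message-passing step over an interconnected scene graph
# and commonsense graph, linked by "bridge" edges. GB-Net's actual message
# functions, edge types, and bridge-edge inference are richer.
import torch
import torch.nn as nn

class GraphBridgeStep(nn.Module):  # hypothetical name
    def __init__(self, dim=128):
        super().__init__()
        self.scene_msg = nn.Linear(dim, dim)   # scene -> scene messages
        self.kg_msg = nn.Linear(dim, dim)      # commonsense -> commonsense
        self.bridge_msg = nn.Linear(dim, dim)  # commonsense -> scene
        self.update = nn.GRUCell(dim, dim)     # shared node updater

    def forward(self, x_scene, x_kg, a_scene, a_kg, a_bridge):
        # x_scene: (Ns, dim) scene node features; x_kg: (Nk, dim).
        # a_scene: (Ns, Ns), a_kg: (Nk, Nk), a_bridge: (Ns, Nk) adjacency.
        m_scene = (a_scene @ self.scene_msg(x_scene)
                   + a_bridge @ self.bridge_msg(x_kg))
        x_scene = self.update(m_scene, x_scene)
        x_kg = self.update(a_kg @ self.kg_msg(x_kg), x_kg)
        return x_scene, x_kg
```

Running a few such steps and re-estimating the bridge adjacency from the updated features would correspond to the "successively infers edges and nodes" behavior the summary describes.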