Joint Generative Modeling of Scene Graphs and Images via Diffusion Models
- URL: http://arxiv.org/abs/2401.01130v1
- Date: Tue, 2 Jan 2024 10:10:29 GMT
- Title: Joint Generative Modeling of Scene Graphs and Images via Diffusion Models
- Authors: Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal
- Abstract summary: We present a novel generative task: joint scene graph-image generation.
We introduce a novel diffusion model, DiffuseSG, that jointly models the adjacency matrix along with heterogeneous node and edge attributes.
With a graph transformer as the denoiser, DiffuseSG successively denoises the scene graph representation in a continuous space and discretizes the final representation to generate the clean scene graph.
- Score: 37.788957749123725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a novel generative task: joint scene graph-image
generation. While previous works have explored image generation conditioned on
scene graphs or layouts, our task is distinctive and important as it involves
generating scene graphs themselves unconditionally from noise, enabling
efficient and interpretable control for image generation. Our task is
challenging, requiring the generation of plausible scene graphs with
heterogeneous attributes for nodes (objects) and edges (relations among
objects), including continuous object bounding boxes and discrete object and
relation categories. We introduce a novel diffusion model, DiffuseSG, that
jointly models the adjacency matrix along with heterogeneous node and edge
attributes. We explore various types of encodings for the categorical data,
relaxing it into a continuous space. With a graph transformer as the
denoiser, DiffuseSG successively denoises the scene graph representation in a
continuous space and discretizes the final representation to generate the clean
scene graph. Additionally, we introduce an IoU regularization to enhance the
empirical performance. Our model significantly outperforms existing methods in
scene graph generation on the Visual Genome and COCO-Stuff datasets, both on
standard and newly introduced metrics that better capture the problem
complexity. Moreover, we demonstrate the additional benefits of our model in
two downstream applications: 1) excelling in a series of scene graph completion
tasks, and 2) improving scene graph detection models by using extra training
samples generated from DiffuseSG.
Related papers
- Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation [10.678727237318503]
We propose Impar, a novel training framework that leverages curriculum learning and loss masking to mitigate bias in scene graph generation and anticipation.
We introduce two new tasks, Robust Spatio-Temporal Scene Graph Generation and Robust Scene Graph Anticipation, designed to evaluate the robustness of STSG models against distribution shifts.
arXiv Detail & Related papers (2024-11-20T06:15:28Z)
- Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation [44.457347230146404]
We leverage the scene graph, a powerful structured representation, for complex image generation.
We employ the generative capabilities of variational autoencoders and diffusion models in a generalizable manner.
Our method outperforms recent competitors based on text, layout, or scene graph.
arXiv Detail & Related papers (2024-10-01T07:02:46Z)
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces the local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints and prevent it from failing to learn tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first phase it models the distribution of features associated with the nodes of the given graph; in the second, it completes the edge features conditionally on the node features (a minimal sketch of this two-phase scheme appears after this list).
arXiv Detail & Related papers (2022-12-01T11:49:07Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z)
- Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make estimation feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)