Exploiting Relationship for Complex-scene Image Generation
- URL: http://arxiv.org/abs/2104.00356v1
- Date: Thu, 1 Apr 2021 09:21:39 GMT
- Title: Exploiting Relationship for Complex-scene Image Generation
- Authors: Tianyu Hua, Hongdong Zheng, Yalong Bai, Wei Zhang, Xiao-Ping Zhang,
Tao Mei
- Abstract summary: This work explores relationship-aware complex-scene image generation, where multiple objects are inter-related as a scene graph.
We propose three major updates in the generation framework. First, reasonable spatial layouts are inferred by jointly considering the semantics and relationships among objects.
Second, since the relations between objects significantly influence an object's appearance, we design a relation-guided generator to generate objects reflecting their relationships.
Third, a novel scene graph discriminator is proposed to guarantee the consistency between the generated image and the input scene graph.
- Score: 43.022978211274065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The significant progress on Generative Adversarial Networks (GANs) has
facilitated realistic single-object image generation based on language input.
However, complex-scene generation (with various interactions among multiple
objects) still suffers from messy layouts and object distortions, due to
diverse configurations in layouts and appearances. Prior methods are mostly
object-driven and ignore the inter-object relations that play a significant role
in complex-scene images. This work explores relationship-aware complex-scene image
generation, where multiple objects are inter-related as a scene graph. With the
help of relationships, we propose three major updates in the generation
framework. First, reasonable spatial layouts are inferred by jointly
considering the semantics and relationships among objects. Compared to standard
location regression, we show that relative scales and distances serve as a more
reliable regression target. Second, since the relations between objects significantly
influence an object's appearance, we design a relation-guided generator to
generate objects reflecting their relationships. Third, a novel scene graph
discriminator is proposed to guarantee the consistency between the generated
image and the input scene graph. Our method tends to synthesize plausible
layouts and objects, respecting the interplay of multiple objects in an image.
Experimental results on Visual Genome and HICO-DET datasets show that our
proposed method significantly outperforms prior art in terms of IS and FID
metrics. Based on our user study and visual inspection, our method is more
effective at generating logical layouts and appearances for complex scenes.
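
With relationships available, the paper's first update supervises layout prediction with pairwise relative quantities rather than absolute box coordinates. As a rough illustration of that idea (a minimal sketch under assumed conventions, not the authors' released code), the Python snippet below derives a relative-scale and relative-distance target for a related object pair from normalized ground-truth boxes; the (x, y, w, h) box format and the log/Euclidean parameterization are assumptions.

```python
import math

def relative_layout_targets(box_a, box_b):
    """Derive relation-level regression targets from two ground-truth boxes.

    Boxes are (x, y, w, h) with coordinates normalized to [0, 1].
    Returns a log relative scale and a normalized center distance, which
    stand in for the paper's relative-scale/distance targets.
    """
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b

    # Relative scale: log ratio of box areas (sign flips if the pair is swapped).
    rel_scale = math.log((wa * ha) / (wb * hb))

    # Relative distance: Euclidean distance between box centers.
    ca = (xa + wa / 2.0, ya + ha / 2.0)
    cb = (xb + wb / 2.0, yb + hb / 2.0)
    rel_dist = math.hypot(ca[0] - cb[0], ca[1] - cb[1])

    return rel_scale, rel_dist

# Hypothetical example: a "person riding horse" pair, where the person box
# is smaller than and sits above the horse box.
person = (0.40, 0.20, 0.15, 0.30)
horse = (0.30, 0.40, 0.40, 0.45)
scale, dist = relative_layout_targets(person, horse)
print(f"log relative scale: {scale:.3f}, center distance: {dist:.3f}")
```

One plausible reason such targets regress more reliably than raw coordinates is that scale ratios and center distances are translation-invariant and bounded for typical scenes.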
Related papers
- RelationBooth: Towards Relation-Aware Customized Object Generation [32.762475563341525]
We introduce RelationBooth, a framework that disentangles identity and relation learning through a well-curated dataset.
Our training data consists of relation-specific images, independent object images containing identity information, and text prompts to guide relation generation.
First, we introduce a keypoint matching loss that effectively guides the model in adjusting object poses closely tied to their relationships.
Second, we incorporate local features from the image prompts to better distinguish between objects, preventing confusion in overlapping cases.
arXiv Detail & Related papers (2024-10-30T17:57:21Z) - CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph
Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z) - Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z) - Hand-Object Interaction Image Generation [135.87707468156057]
This work is dedicated to a new task, i.e., hand-object interaction image generation.
It aims to conditionally generate a hand-object image given the hand, the object, and their interaction status.
This task is challenging and research-worthy in many potential application scenarios, such as AR/VR games and online shopping.
arXiv Detail & Related papers (2022-11-28T18:59:57Z) - Grounding Scene Graphs on Natural Images via Visio-Lingual Message
Passing [17.63475613154152]
This paper presents a framework for jointly grounding objects that follow certain semantic relationship constraints in a scene graph.
A scene graph is an efficient and structured way to represent all the objects and their semantic relationships in the image.
arXiv Detail & Related papers (2022-11-03T16:46:46Z) - Relationformer: A Unified Framework for Image-to-Graph Generation [18.832626244362075]
This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations.
We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly.
We achieve state-of-the-art performance on multiple, diverse and multi-domain datasets.
arXiv Detail & Related papers (2022-03-19T00:36:59Z) - Scenes and Surroundings: Scene Graph Generation using Relation
Transformer [13.146732454123326]
This work proposes a novel local-context aware architecture named relation transformer.
Our hierarchical multi-head attention-based approach efficiently captures contextual dependencies between objects and predicts their relationships.
In comparison to state-of-the-art approaches, we achieve an overall mean improvement of 4.85%.
arXiv Detail & Related papers (2021-07-12T14:22:20Z) - ORD: Object Relationship Discovery for Visual Dialogue Generation [60.471670447176656]
We propose an object relationship discovery (ORD) framework to preserve the object interactions for visual dialogue generation.
A hierarchical graph convolutional network (HierGCN) is proposed to retain the object nodes and neighbour relationships locally and then refine the object-object connections globally.
Experiments show that the proposed method significantly improves dialogue quality by utilising the contextual information of visual relationships.
arXiv Detail & Related papers (2020-06-15T12:25:40Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)