Context-Aware Layout to Image Generation with Enhanced Object Appearance
- URL: http://arxiv.org/abs/2103.11897v1
- Date: Mon, 22 Mar 2021 14:43:25 GMT
- Title: Context-Aware Layout to Image Generation with Enhanced Object Appearance
- Authors: Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song,
Bodo Rosenhahn, Tao Xiang
- Abstract summary: A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against a natural background (stuff).
Existing L2I models have made great progress, but object-to-object and object-to-stuff relations are often broken.
We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and location-sensitive appearance representation in their discriminators.
- Score: 123.62597976732948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A layout to image (L2I) generation model aims to generate a complicated image
containing multiple objects (things) against a natural background (stuff),
conditioned on a given layout. Built upon the recent advances in generative
adversarial networks (GANs), existing L2I models have made great progress.
However, a close inspection of their generated images reveals two major
limitations: (1) the object-to-object as well as object-to-stuff relations are
often broken, and (2) each object's appearance is typically distorted, lacking
the key defining characteristics associated with the object class. We argue
that these are caused by the lack of context-aware object and stuff feature
encoding in their generators, and location-sensitive appearance representation
in their discriminators. To address these limitations, two new modules are
proposed in this work. First, a context-aware feature transformation module is
introduced in the generator to ensure that the generated feature encoding of
either object or stuff is aware of other co-existing objects/stuff in the
scene. Second, instead of feeding location-insensitive image features to the
discriminator, we use the Gram matrix computed from the feature maps of the
generated object images to preserve location-sensitive information, resulting
in much enhanced object appearance. Extensive experiments show that the
proposed method achieves state-of-the-art performance on the COCO-Thing-Stuff
and Visual Genome benchmarks.
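To make the two proposed modules more concrete, here is a minimal PyTorch sketch. It is illustrative only: the abstract does not specify the module architectures, so the self-attention-style context transformation, the tensor shapes, and the normalisation constant are assumptions rather than the authors' implementation.

```python
# Illustrative sketch only -- not the authors' code. The context module below is
# a generic self-attention over per-instance features (an assumed design; the
# paper's actual transformation is not described in the abstract), and the
# Gram-matrix descriptor follows the standard definition.
import torch
import torch.nn as nn


class ContextAwareTransform(nn.Module):
    """Make each object/stuff feature vector aware of co-existing instances
    by attending over all instance features in the layout (assumed design)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, inst_feats: torch.Tensor) -> torch.Tensor:
        # inst_feats: (B, N, D) -- one D-dim feature per object/stuff instance
        ctx, _ = self.attn(inst_feats, inst_feats, inst_feats)
        return self.norm(inst_feats + ctx)  # residual keeps per-instance identity


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of object feature maps, fed to the discriminator in place
    of pooled image features.

    feat: (B, C, H, W) feature maps of generated object crops.
    Returns (B, C, C) channel co-activation statistics.
    """
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)               # flatten spatial dimensions
    gram = torch.bmm(f, f.transpose(1, 2))   # (B, C, C) channel correlations
    return gram / (c * h * w)                # size normalisation (assumption)


if __name__ == "__main__":
    obj_feats = torch.randn(2, 8, 256)              # 8 instances, 256-dim each
    print(ContextAwareTransform(256)(obj_feats).shape)  # torch.Size([2, 8, 256])
    crop_feats = torch.randn(2, 64, 16, 16)         # object-crop feature maps
    print(gram_matrix(crop_feats).shape)            # torch.Size([2, 64, 64])
```

The key point of the second function is that the discriminator sees second-order statistics of an object's feature maps rather than a single pooled vector, which the abstract credits with the much enhanced object appearance.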
Related papers
- SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects [20.978091381109294]
We propose a method to generate articulated objects from a single image.
Our method generates an articulated object that is visually consistent with the input image.
Our experiments show that our method outperforms the state-of-the-art in articulated object creation.
arXiv Detail & Related papers (2024-10-21T20:41:32Z)
- Beyond One-to-One: Rethinking the Referring Image Segmentation [117.53010476628029]
Referring image segmentation aims to segment the target object referred by a natural language expression.
We propose a Dual Multi-Modal Interaction (DMMI) Network, which contains two decoder branches.
In the text-to-image decoder, text embedding is utilized to query the visual feature and localize the corresponding target.
Meanwhile, the image-to-text decoder is implemented to reconstruct the erased entity-phrase conditioned on the visual feature.
arXiv Detail & Related papers (2023-08-26T11:39:22Z)
- Position-Aware Contrastive Alignment for Referring Image Segmentation [65.16214741785633]
We present a position-aware contrastive alignment network (PCAN) to enhance the alignment of multi-modal features.
Our PCAN consists of two modules: 1) Position Aware Module (PAM), which provides position information of all objects related to natural language descriptions, and 2) Contrastive Language Understanding Module (CLUM), which enhances multi-modal alignment.
arXiv Detail & Related papers (2022-12-27T09:13:19Z)
- Image Segmentation-based Unsupervised Multiple Objects Discovery [1.7674345486888503]
Unsupervised object discovery aims to localize objects in images.
We propose a fully unsupervised, bottom-up approach, for multiple objects discovery.
We provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.
arXiv Detail & Related papers (2022-12-20T09:48:24Z)
- IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning [110.7118381246156]
The Increment Reasoning Generative Adversarial Network (IR-GAN) aims to reason about the consistency between the visual increment in images and the semantic increment in instructions.
First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as the semantic increment.
Second, we embed the representation of the semantic increment into that of the source image to generate the target image, where the source image serves as a referring auxiliary.
arXiv Detail & Related papers (2022-04-02T07:48:39Z)
- Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z)
- Exploiting Relationship for Complex-scene Image Generation [43.022978211274065]
This work explores relationship-aware complex-scene image generation, where multiple objects are inter-related as a scene graph.
We propose three major updates in the generation framework. First, reasonable spatial layouts are inferred by jointly considering the semantics and relationships among objects.
Second, since the relations between objects significantly influence an object's appearance, we design a relation-guided generator to generate objects reflecting their relationships.
Third, a novel scene graph discriminator is proposed to guarantee the consistency between the generated image and the input scene graph.
arXiv Detail & Related papers (2021-04-01T09:21:39Z)
- Attribute-guided image generation from layout [38.817023543020134]
We propose a new image generation method that enables instance-level attribute control.
Experiments on Visual Genome dataset demonstrate our model's capacity to control object-level attributes in generated images.
The generated images from our model have higher resolution, object classification accuracy, and consistency than those of the previous state-of-the-art.
arXiv Detail & Related papers (2020-08-27T06:22:14Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which leads to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.