Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
- URL: http://arxiv.org/abs/2409.04847v1
- Date: Sat, 7 Sep 2024 14:57:03 GMT
- Title: Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
- Authors: Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang
- Abstract summary: A specialized area within generative modeling is layout-to-image (L2I) generation.
We introduce a novel regional cross-attention module tailored to enrich layout-to-image generation.
We propose two metrics to assess L2I performance in open-vocabulary scenarios.
- Score: 44.094656220043106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative modeling is layout-to-image (L2I) generation, where predefined layouts of objects guide the generative process. In this study, we introduce a novel regional cross-attention module tailored to enrich layout-to-image generation. This module notably improves the representation of layout regions, particularly in scenarios where existing methods struggle with highly complex and detailed textual descriptions. Moreover, while current open-vocabulary L2I methods are trained in an open-set setting, their evaluations often occur in closed-set environments. To bridge this gap, we propose two metrics to assess L2I performance in open-vocabulary scenarios. Additionally, we conduct a comprehensive user study to validate the consistency of these metrics with human preferences.
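The abstract does not spell out the module's internals, but the core idea of regional cross-attention can be illustrated with a minimal sketch: each layout region attends only to the text embedding of its own (possibly long and detailed) caption, and the per-region results are composited back into the feature map. The sketch below assumes a PyTorch-style implementation; the class name `RegionalCrossAttention` and the tensor layouts for `region_masks` and `region_text_embs` are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of a regional cross-attention layer (not the paper's exact module).
# Each layout region attends only to the text tokens of its own caption; results are
# composited back into the image feature map using the region masks.
import torch
import torch.nn as nn

class RegionalCrossAttention(nn.Module):
    def __init__(self, dim: int, text_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=dim, num_heads=num_heads,
            kdim=text_dim, vdim=text_dim, batch_first=True,
        )

    def forward(self, img_tokens, region_masks, region_text_embs):
        # img_tokens:       (B, N, C)    flattened image features (N = H * W)
        # region_masks:     (B, R, N)    binary 0/1 mask per region over image tokens
        # region_text_embs: (B, R, T, D) text-token embeddings of each region's caption
        B, N, _ = img_tokens.shape
        out = torch.zeros_like(img_tokens)
        coverage = torch.zeros(B, N, 1, device=img_tokens.device)
        for r in range(region_masks.shape[1]):
            mask = region_masks[:, r].unsqueeze(-1).float()   # (B, N, 1)
            attended, _ = self.attn(
                query=img_tokens,                # every image token queries...
                key=region_text_embs[:, r],      # ...only this region's caption tokens
                value=region_text_embs[:, r],
            )
            out = out + attended * mask          # keep the result inside the region
            coverage = coverage + mask
        # Tokens covered by several regions are averaged; uncovered tokens pass through.
        return torch.where(coverage > 0, out / coverage.clamp(min=1), img_tokens)
```

In practice the masking could equally be folded into the attention weights themselves; the point of the sketch is simply that each detailed region caption influences only the image tokens inside its layout box, rather than being concatenated into one global prompt.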
Related papers
- Training-Free Sketch-Guided Diffusion with Latent Optimization [22.94468603089249]
We propose an innovative training-free pipeline that extends existing text-to-image generation models to incorporate a sketch as an additional condition.
To generate new images with a layout and structure closely resembling the input sketch, we find that these core features of a sketch can be tracked with the cross-attention maps of diffusion models.
We introduce latent optimization, a method that refines the noisy latent at each intermediate step of the generation process.
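A minimal sketch of one such refinement step, under stated assumptions: at an intermediate denoising step, the cross-attention map of a chosen prompt token is compared against the binarized input sketch, and the noisy latent is nudged along the gradient of that mismatch. Here `unet`, `get_cross_attention_map`, `sketch_mask`, and `token_idx` are placeholders for whatever hooks and inputs the actual pipeline exposes; this is not the authors' released code.

```python
# Sketch of one latent-optimization step at an intermediate diffusion timestep.
import torch
import torch.nn.functional as F

def refine_latent(latent, t, text_embs, unet, get_cross_attention_map,
                  sketch_mask, token_idx, lr=0.05, steps=5):
    latent = latent.detach().requires_grad_(True)
    for _ in range(steps):
        # Forward pass populates the attention hooks; the hooks must keep the
        # computation graph so gradients can flow back to the latent.
        _ = unet(latent, t, encoder_hidden_states=text_embs)
        attn = get_cross_attention_map(token_idx)           # (H, W), values in [0, 1]
        loss = F.mse_loss(attn, sketch_mask)                # match the map to the sketch layout
        grad, = torch.autograd.grad(loss, latent)
        latent = (latent - lr * grad).detach().requires_grad_(True)
    return latent.detach()
```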
arXiv Detail & Related papers (2024-08-31T00:44:03Z)
- LSReGen: Large-Scale Regional Generator via Backward Guidance Framework [12.408195812609042]
Controllable image generation remains a challenge.
Current methods, such as training, forward guidance, and backward guidance, have notable limitations.
We propose a novel controllable generation framework that offers a generalized interpretation of backward guidance.
We introduce LSReGen, a large-scale layout-to-image method designed to generate high-quality, layout-compliant images.
arXiv Detail & Related papers (2024-07-21T05:44:46Z)
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to edit a given synthetic or real image to meet specific user requirements.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
- R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation [74.5598315066249]
We probe into zero-shot grounded T2I generation with diffusion models.
We propose a Region and Boundary (R&B) aware cross-attention guidance approach.
arXiv Detail & Related papers (2023-10-13T05:48:42Z)
- LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [121.45667242282721]
We propose a coarse-to-fine paradigm to achieve layout planning and image generation.
Our proposed method outperforms the state-of-the-art models in terms of photorealistic layout and image generation.
arXiv Detail & Related papers (2023-08-09T17:45:04Z)
- Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z)
- Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation [135.4660201856059]
We consider learning the scene generation in a local context, and design a local class-specific generative network with semantic maps as a guidance.
To learn more discriminative class-specific feature representations for the local generation, a novel classification module is also proposed.
Experiments on two scene image generation tasks show superior generation performance of the proposed model.
arXiv Detail & Related papers (2019-12-27T16:14:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.