LayoutDiffuse: Adapting Foundational Diffusion Models for
Layout-to-Image Generation
- URL: http://arxiv.org/abs/2302.08908v1
- Date: Thu, 16 Feb 2023 14:20:25 GMT
- Title: LayoutDiffuse: Adapting Foundational Diffusion Models for
Layout-to-Image Generation
- Authors: Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao and Mu
Li
- Abstract summary: Our method trains efficiently, generates images with both high perceptual quality and layout alignment.
Our method significantly outperforms other 10 generative models based on GANs, VQ-VAE, and diffusion models.
- Score: 24.694298869398033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Layout-to-image generation refers to the task of synthesizing photo-realistic
images based on semantic layouts. In this paper, we propose LayoutDiffuse that
adapts a foundational diffusion model pretrained on large-scale image or
text-image datasets for layout-to-image generation. By adopting a novel neural
adaptor based on layout attention and task-aware prompts, our method trains
efficiently, generates images with both high perceptual quality and layout
alignment, and needs less data. Experiments on three datasets show that our
method significantly outperforms other 10 generative models based on GANs,
VQ-VAE, and diffusion models.
Related papers
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z) - MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models [34.611309081801345]
Large diffusion-based Text-to-Image (T2I) models have shown impressive generative powers for text-to-image generation.
In this paper, we propose a novel strategy to scale a generative model across new tasks with minimal compute.
arXiv Detail & Related papers (2024-04-15T17:55:56Z) - Reason out Your Layout: Evoking the Layout Master from Large Language
Models for Text-to-Image Synthesis [47.27044390204868]
We introduce a novel approach to improving T2I diffusion models using Large Language Models (LLMs) as layout generators.
Our experiments demonstrate significant improvements in image quality and layout accuracy.
arXiv Detail & Related papers (2023-11-28T14:51:13Z) - Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and
Latent Diffusion [50.59261592343479]
We present Kandinsky1, a novel exploration of latent diffusion architecture.
The proposed model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - DiffDis: Empowering Generative Diffusion Model with Cross-Modal
Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - Training on Thin Air: Improve Image Classification with Generated Data [28.96941414724037]
Diffusion Inversion is a simple yet effective method to generate diverse, high-quality training data for image classification.
Our approach captures the original data distribution and ensures data coverage by inverting images to the latent space of Stable Diffusion.
We identify three key components that allow our generated images to successfully supplant the original dataset.
arXiv Detail & Related papers (2023-05-24T16:33:02Z) - Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including the few-shot, semi-supervised and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z) - InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.