Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
- URL: http://arxiv.org/abs/2411.18810v3
- Date: Fri, 07 Feb 2025 17:14:32 GMT
- Title: Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
- Authors: Shuangqi Li, Hieu Le, Jingyi Xu, Mathieu Salzmann
- Abstract summary: Text-to-image diffusion models can generate realistic images from arbitrary text prompts.
They often produce inconsistent results for compositional prompts such as "two dogs" or "a penguin on the right of a bowl".
- Score: 63.753710512888965
- Abstract: Text-to-image diffusion models have demonstrated remarkable capability in generating realistic images from arbitrary text prompts. However, they often produce inconsistent results for compositional prompts such as "two dogs" or "a penguin on the right of a bowl". Understanding these inconsistencies is crucial for reliable image generation. In this paper, we highlight the significant role of initial noise in these inconsistencies, where certain noise patterns are more reliable for compositional prompts than others. Our analyses reveal that different initial random seeds tend to guide the model to place objects in distinct image areas, potentially adhering to specific patterns of camera angles and image composition associated with the seed. To improve the model's compositional ability, we propose a method for mining these reliable cases, resulting in a curated training set of generated images without requiring any manual annotation. By fine-tuning text-to-image models on these generated images, we significantly enhance their compositional capabilities. For numerical composition, we observe relative increases of 29.3% and 19.5% for Stable Diffusion and PixArt-α, respectively. Spatial composition sees even larger gains, with 60.7% for Stable Diffusion and 21.1% for PixArt-α.
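The mining step described in the abstract, generating with fixed seeds and keeping only the outputs that actually satisfy the compositional prompt, can be approximated with off-the-shelf tools. Below is a minimal sketch assuming the Hugging Face diffusers library; the count_objects helper is a hypothetical placeholder for whatever automatic checker verifies the composition, and the snippet illustrates the idea rather than reproducing the authors' exact pipeline.

```python
# Minimal sketch: mine "reliable" seeds for a numerical prompt.
# Assumes the diffusers library; count_objects is a hypothetical
# placeholder for an automatic checker (e.g. an object detector).
import torch
from diffusers import StableDiffusionPipeline

def count_objects(image, label: str) -> int:
    """Hypothetical checker: return how many instances of `label`
    appear in `image`. Replace with a real detector/counter."""
    raise NotImplementedError

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt, label, target = "two dogs", "dog", 2
reliable = []  # (seed, image) pairs whose output matches the prompt

for seed in range(200):
    g = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=g).images[0]
    if count_objects(image, label) == target:
        reliable.append((seed, image))

# `reliable` images (obtained without manual annotation) can then serve
# as a curated fine-tuning set, as the abstract describes.
```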
Related papers
- Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion [3.399289369740637]
This paper presents a pioneering study on post-training pruning of Stable Diffusion 2.
It addresses the critical need for model compression in text-to-image domain.
We propose an optimal pruning configuration that prunes the text encoder to 47.5% and the diffusion generator to 35%.
arXiv Detail & Related papers (2024-11-22T18:29:37Z)
- FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior [50.0535198082903]
We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image.
We showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish generic image composition.
arXiv Detail & Related papers (2024-07-06T03:35:43Z)
- Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models [13.4617544015866]
We conduct a large-scale scientific study into the impact of random seeds during diffusion inference.
We find that the best 'golden' seed achieved an impressive FID of 21.60, compared to the worst 'inferior' seed's FID of 31.97.
A classifier can predict the seed number used to generate an image with over 99.9% accuracy in just a few epochs (see the toy probe sketched after this list).
arXiv Detail & Related papers (2024-05-23T17:46:23Z)
- Preserving Image Properties Through Initializations in Diffusion Models [6.804700416902898]
We show that Stable Diffusion methods, as currently applied, do not respect requirements of retail photography.
The usual practice of training the denoiser with a very noisy image leads to inconsistent generated images during inference.
A network trained with centered retail product images with uniform backgrounds generates images with erratic backgrounds.
Our procedure can interact well with other control-based methods to further enhance the controllability of diffusion-based methods.
arXiv Detail & Related papers (2024-01-04T06:55:49Z)
- Improving Diffusion-Based Image Synthesis with Context Prediction [49.186366441954846]
Existing diffusion models mainly try to reconstruct the input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes.
We propose ConPreDiff to improve diffusion-based image synthesis with context prediction.
Our ConPreDiff consistently outperforms previous methods and achieves new SOTA text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.
arXiv Detail & Related papers (2024-01-04T01:10:56Z)
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z)
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis [78.28620571530706]
Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks.
We improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.
arXiv Detail & Related papers (2022-12-09T18:30:24Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs [8.26410341981427]
We study how to ensure that generated samples are believable, realistic or natural.
We present a novel algorithm which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture.
arXiv Detail & Related papers (2022-02-25T20:00:33Z)
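As a side note on the "Good Seed Makes a Good Crop" entry above, the reported seed predictability is easy to probe at small scale. The toy sketch below assumes a folder of generated images saved as data/<seed>_<index>.png for a fixed prompt, and trains a small CNN to predict which seed produced each image; it is an illustrative reproduction of the idea, not that paper's setup.

```python
# Toy probe of seed predictability: classify which of NUM_SEEDS seeds
# produced an image. Assumes images saved as data/<seed>_<index>.png.
import glob, os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from PIL import Image

NUM_SEEDS = 10  # assumed number of distinct seeds in the folder

class SeedDataset(Dataset):
    def __init__(self, root="data"):
        self.paths = sorted(glob.glob(os.path.join(root, "*.png")))
        self.tf = transforms.Compose(
            [transforms.Resize((128, 128)), transforms.ToTensor()])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        path = self.paths[i]
        seed = int(os.path.basename(path).split("_")[0])  # label = seed id
        return self.tf(Image.open(path).convert("RGB")), seed

model = nn.Sequential(  # small CNN classifier over seed ids
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_SEEDS))

loader = DataLoader(SeedDataset(), batch_size=32, shuffle=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):  # a few epochs, echoing the reported observation
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```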