InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
- URL: http://arxiv.org/abs/2404.04650v1
- Date: Sat, 6 Apr 2024 14:56:59 GMT
- Title: InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
- Authors: Xiefan Guo, Jinlin Liu, Miaomiao Cui, Jiankai Li, Hongyu Yang, Di Huang
- Abstract summary: InitNO is a paradigm that refines the initial noise so that it falls in regions yielding semantically-faithful images.
A strategically crafted noise optimization pipeline is developed to guide the initial noise towards valid regions.
Our method, validated through rigorous experimentation, shows a commendable proficiency in generating images in strict accordance with text prompts.
- Score: 27.508861002013358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent strides in the development of diffusion models, exemplified by advancements such as Stable Diffusion, have underscored their remarkable prowess in generating visually compelling images. However, the imperative of achieving a seamless alignment between the generated image and the provided prompt persists as a formidable challenge. This paper traces the root of these difficulties to invalid initial noise, and proposes a solution in the form of Initial Noise Optimization (InitNO), a paradigm that refines this noise. Considering text prompts, not all random noises are effective in synthesizing semantically-faithful images. We design the cross-attention response score and the self-attention conflict score to evaluate the initial noise, bifurcating the initial latent space into valid and invalid sectors. A strategically crafted noise optimization pipeline is developed to guide the initial noise towards valid regions. Our method, validated through rigorous experimentation, shows a commendable proficiency in generating images in strict accordance with text prompts. Our code is available at https://github.com/xiefan-guo/initno.
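To make the abstract's idea more concrete, below is a minimal, hypothetical sketch of initial noise optimization driven by a cross-attention response score; the paper's self-attention conflict score is omitted. This is not the authors' implementation: `get_cross_attention_maps` stands in for an assumed differentiable hook into a text-to-image model such as Stable Diffusion, and the step count, learning rate, and `target` threshold are illustrative placeholders.

```python
# Hypothetical sketch of cross-attention-guided initial noise optimization.
# `get_cross_attention_maps` is an assumed, differentiable hook that runs one
# denoising step of a text-to-image diffusion model and returns per-token
# cross-attention maps of shape [num_tokens, H, W] for the given initial latent.
import torch


def cross_attention_response_score(attn_maps: torch.Tensor, subject_token_ids) -> torch.Tensor:
    """Scalar score that is high when every subject token attends strongly somewhere.

    The weakest subject token determines the score, so a neglected subject
    pulls the whole score down.
    """
    peaks = torch.stack([attn_maps[t].max() for t in subject_token_ids])
    return peaks.min()


def optimize_initial_noise(latent, get_cross_attention_maps, subject_token_ids,
                           steps: int = 50, lr: float = 1e-2, target: float = 0.25):
    """Nudge the initial latent until the weakest subject token responds strongly enough.

    `steps`, `lr`, and `target` are illustrative placeholders, not values from the paper.
    """
    latent = latent.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        attn = get_cross_attention_maps(latent)    # [num_tokens, H, W]
        score = cross_attention_response_score(attn, subject_token_ids)
        if score.item() >= target:                 # noise deemed "valid"
            break
        loss = target - score                      # push the weakest response up
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return latent.detach()
```

In this reading, the optimized latent would then serve as the starting noise for the ordinary sampling loop in place of the raw random draw.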
Related papers
- Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation [43.48099716183503]
We propose a training-free approach tailored to diffusion-based image-to-image translation.
Our approach can be easily incorporated into existing image-to-image translation methods.
arXiv Detail & Related papers (2024-09-12T14:30:45Z) - Beyond Image Prior: Embedding Noise Prior into Conditional Denoising Transformer [17.430622649002427]
Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets.
We propose a new perspective on the denoising challenge by highlighting the distinct separation between noise and image priors.
We introduce a Locally Noise Prior Estimation algorithm, which accurately estimates the noise prior directly from a single raw noisy image.
arXiv Detail & Related papers (2024-07-12T08:43:11Z) - The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise [92.53724347718173]
Diffusion models have achieved remarkable success in text-to-image generation tasks.
We identify specific regions within the initial noise image, termed trigger patches, that play a key role in object generation in the resulting images.
arXiv Detail & Related papers (2024-06-04T05:06:00Z) - NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation [86.7260950382448]
We propose a novel approach to correct noise for image validity, NoiseDiffusion.
NoiseDiffusion performs within the noisy image space and injects raw images into these noisy counterparts to address the challenge of information loss. (A minimal sketch of the spherical linear interpolation baseline named in the title appears after this list.)
arXiv Detail & Related papers (2024-03-13T12:32:25Z) - The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization [30.622943615086584]
We formulate the lottery ticket hypothesis in denoising randomly Gaussian noise images.
Winning tickets naturally tend to be denoised into specific content independently.
We implement semantic-driven initial image construction, creating the initial noise from known winning tickets.
arXiv Detail & Related papers (2023-12-13T03:31:19Z) - Back to Basics: Fast Denoising Iterative Algorithm [0.0]
We introduce Back to Basics (BTB), a fast iterative algorithm for noise reduction.
We examine three study cases: natural image denoising in the presence of additive white Gaussian noise, Poisson-distributed image denoising, and speckle suppression in optical coherence tomography (OCT).
Experimental results demonstrate that the proposed approach can effectively improve image quality in challenging noise settings.
arXiv Detail & Related papers (2023-11-11T18:32:06Z) - Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising [14.486289176696438]
We propose a score priors-guided deep variational inference, namely ScoreDVI, for practical real-world denoising.
We exploit a non-i.i.d. Gaussian mixture model and a variational noise posterior to model real-world noise.
Our method outperforms other single image-based real-world denoising methods and achieves comparable performance to dataset-based unsupervised methods.
arXiv Detail & Related papers (2023-08-09T03:26:58Z) - Representing Noisy Image Without Denoising [91.73819173191076]
Fractional-order Moments in Radon space (FMR) is designed to derive robust representation directly from noisy images.
Unlike earlier integer-order methods, our work is a more generic design taking such classical methods as special cases.
arXiv Detail & Related papers (2023-01-18T10:13:29Z) - NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z) - Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training [50.018580462619425]
We propose a novel framework, namely Pixel-level Noise-aware Generative Adversarial Network (PNGAN).
PNGAN employs a pre-trained real denoiser to map the fake and real noisy images into a nearly noise-free solution space.
For better noise fitting, we present an efficient architecture Simple Multi-versa-scale Network (SMNet) as the generator.
arXiv Detail & Related papers (2022-04-06T14:09:02Z) - Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation [52.75909685172843]
Real-world image noise removal is a long-standing yet very challenging task in computer vision.
We propose a novel unified framework to deal with the noise removal and noise generation tasks.
Our method learns the joint distribution of the clean-noisy image pairs.
arXiv Detail & Related papers (2020-07-12T09:16:06Z)
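For reference, here is a minimal sketch of the spherical linear interpolation (slerp) baseline named in the NoiseDiffusion title above. It is the common noise-interpolation baseline, not the NoiseDiffusion method itself, and the latent shape in the usage example is only an assumed Stable Diffusion-style example.

```python
# Minimal slerp sketch between two initial noise tensors (baseline only,
# not the NoiseDiffusion method).
import torch


def slerp(z0: torch.Tensor, z1: torch.Tensor, t: float, eps: float = 1e-7) -> torch.Tensor:
    """Interpolate between two Gaussian noise tensors along the hypersphere.

    Unlike plain linear interpolation, slerp keeps the interpolated noise at a
    norm consistent with a standard Gaussian draw, which diffusion models expect
    for their initial latent. High-dimensional Gaussian samples are nearly
    orthogonal, so the angle is well away from zero in practice.
    """
    v0, v1 = z0.flatten(), z1.flatten()
    cos_theta = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    theta = torch.acos(cos_theta.clamp(-1 + eps, 1 - eps))
    sin_theta = torch.sin(theta)
    w0 = torch.sin((1.0 - t) * theta) / sin_theta
    w1 = torch.sin(t * theta) / sin_theta
    return w0 * z0 + w1 * z1


# Example: interpolate halfway between two assumed Stable Diffusion-sized latents.
z0 = torch.randn(1, 4, 64, 64)
z1 = torch.randn(1, 4, 64, 64)
z_mid = slerp(z0, z1, t=0.5)
```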