The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation
- URL: http://arxiv.org/abs/2412.05101v1
- Date: Fri, 06 Dec 2024 14:59:00 GMT
- Title: The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation
- Authors: Ruoyu Wang, Huayang Huang, Ye Zhu, Olga Russakovsky, Yu Wu
- Abstract summary: Text-to-image synthesis (T2I) has advanced remarkably with the emergence of large-scale diffusion models. In this work, we reveal that the often-overlooked noise itself encodes inherent generative tendencies, acting as a "silent prompt" that implicitly guides the output. We introduce NoiseQuery, a novel strategy that selects optimal initial noise from a pre-built noise library to meet diverse user needs.
- Score: 31.599902235859687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image synthesis (T2I) has advanced remarkably with the emergence of large-scale diffusion models. In the conventional setup, the text prompt provides explicit, user-defined guidance, directing the generation process by denoising a randomly sampled Gaussian noise. In this work, we reveal that the often-overlooked noise itself encodes inherent generative tendencies, acting as a "silent prompt" that implicitly guides the output. This implicit guidance, embedded in the noise scheduler design of diffusion model formulations and their training stages, generalizes across a wide range of T2I models and backbones. Building on this insight, we introduce NoiseQuery, a novel strategy that selects optimal initial noise from a pre-built noise library to meet diverse user needs. Our approach not only enhances high-level semantic alignment with text prompts, but also allows for nuanced adjustments of low-level visual attributes, such as texture, sharpness, shape, and color, which are typically challenging to control through text alone. Extensive experiments across various models and target attributes demonstrate the strong performance and zero-shot transferability of our approach, requiring no additional optimization.
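To make the library-lookup idea concrete, here is a minimal sketch of how a NoiseQuery-style selection could work. The `signature_fn` used to characterize each noise and the cosine-similarity matching are illustrative assumptions, not the paper's actual features or metric.

```python
import torch

def build_noise_library(n, shape, signature_fn, seed=0):
    """Sample n Gaussian noises and store each with a feature signature.

    signature_fn maps a noise tensor to an embedding summarizing that
    noise's generative tendency (e.g., an embedding of a cheap few-step
    decode -- an assumption, not the paper's specification).
    """
    gen = torch.Generator().manual_seed(seed)
    library = []
    for _ in range(n):
        noise = torch.randn(shape, generator=gen)
        library.append((noise, signature_fn(noise)))
    return library

def query_noise(library, target_embedding):
    """Return the library noise whose signature best matches the target."""
    sims = torch.stack([
        torch.nn.functional.cosine_similarity(sig, target_embedding, dim=0)
        for _, sig in library
    ])
    return library[int(sims.argmax())][0]

# Toy usage with a stand-in signature built from simple noise statistics.
toy_signature = lambda z: torch.stack([z.mean(), z.std(), z.abs().mean()])
lib = build_noise_library(n=64, shape=(4, 64, 64), signature_fn=toy_signature)
target = torch.tensor([0.1, 1.0, 0.8])  # hypothetical target signature
best = query_noise(lib, target)
print(best.shape)
```

In practice the signature would presumably come from actually decoding each noise and embedding the result, so the library captures each noise's generative tendencies rather than raw statistics.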
Related papers
- ANPrompt: Anti-noise Prompt Tuning for Vision-Language Models [0.5717569761927883]
We propose ANPrompt, a novel prompt tuning framework to enhance robustness under noise perturbations. ANPrompt constructs weak noise text features by fusing original and noise-perturbed text embeddings, which are then clustered to form noise prompts. Experiments across 11 benchmarks demonstrate that ANPrompt consistently outperforms existing prompt tuning approaches.
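A hedged sketch of that noise-prompt construction: fuse clean and noise-perturbed text embeddings, then cluster the fused features into prompt vectors. The fusion weight, noise scale, and plain k-means clustering are assumptions, not the paper's exact recipe.

```python
import torch

def make_noise_prompts(text_emb, n_prompts=4, noise_scale=0.1, alpha=0.5, iters=10):
    # text_emb: (N, D) text embeddings for N prompts/classes.
    perturbed = text_emb + noise_scale * torch.randn_like(text_emb)
    fused = alpha * text_emb + (1 - alpha) * perturbed  # weak noise text features
    # Plain k-means over the fused features; centroids act as noise prompts.
    centroids = fused[torch.randperm(len(fused))[:n_prompts]].clone()
    for _ in range(iters):
        assign = torch.cdist(fused, centroids).argmin(dim=1)
        for k in range(n_prompts):
            members = fused[assign == k]
            if len(members):
                centroids[k] = members.mean(dim=0)
    return centroids

prompts = make_noise_prompts(torch.randn(100, 512))
print(prompts.shape)  # (4, 512)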
arXiv Detail & Related papers (2025-08-06T17:42:30Z) - OptiPrune: Boosting Prompt-Image Consistency with Attention-Guided Noise and Dynamic Token Selection [0.0]
We propose a unified framework that combines distribution-aware initial noise optimization with similarity-based token pruning. Experiments on benchmark datasets, including Animal-Animal, demonstrate that OptiPrune achieves state-of-the-art prompt-image consistency with significantly reduced computational cost.
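The token-pruning half could look roughly like the following; the redundancy score and keep ratio are illustrative stand-ins for the paper's similarity-based selection.

```python
import torch

def prune_tokens(tokens, keep_ratio=0.5):
    # tokens: (N, D). Score each token by its max cosine similarity to the
    # others; the most redundant tokens are pruned first.
    normed = torch.nn.functional.normalize(tokens, dim=-1)
    sim = normed @ normed.T
    sim.fill_diagonal_(-1.0)
    redundancy = sim.max(dim=1).values           # high = near-duplicate token
    n_keep = max(1, int(keep_ratio * len(tokens)))
    keep = redundancy.argsort()[:n_keep]         # keep least-redundant tokens
    return tokens[keep.sort().values]            # preserve original order

pruned = prune_tokens(torch.randn(256, 64))
print(pruned.shape)  # (128, 64)
```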
arXiv Detail & Related papers (2025-07-01T14:24:40Z) - DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers [86.5541501589166]
DiffMoE introduces a batch-level global token pool that enables experts to access global token distributions during training.
It achieves state-of-the-art performance among diffusion models on the ImageNet benchmark.
The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation.
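An illustrative sketch of batch-level token pooling for MoE routing: all tokens in a batch are flattened into one pool so routing sees the global token distribution. Expert capacity handling and load balancing are omitted, and the layer shapes are assumptions.

```python
import torch
import torch.nn as nn

class BatchPoolMoE(nn.Module):
    def __init__(self, dim, n_experts=4, top_k=1):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                        # x: (B, T, D)
        B, T, D = x.shape
        pool = x.reshape(B * T, D)               # batch-level global token pool
        weights, idx = self.gate(pool).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(pool)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(pool[mask])
        return out.reshape(B, T, D)

moe = BatchPoolMoE(dim=32)
print(moe(torch.randn(2, 16, 32)).shape)  # (2, 16, 32)
```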
arXiv Detail & Related papers (2025-03-18T17:57:07Z) - Using Random Noise Equivariantly to Boost Graph Neural Networks Universally [27.542173012315413]
Random noise has been explored as an input feature to enhance the expressivity of Graph Neural Networks (GNNs) across diverse tasks.
This paper lays down a theoretical framework that elucidates the increased sample complexity when random noise is incorporated into GNNs without careful design.
We propose Equivariant Noise GNN (ENGNN), a novel architecture that harnesses the symmetrical properties of noise to reduce sample complexity and bolster generalization.
arXiv Detail & Related papers (2025-02-04T16:54:28Z) - Enhance Vision-Language Alignment with Noise [59.2608298578913]
We investigate whether a frozen vision-language model can be fine-tuned with customized noise.
We propose Positive-incentive Noise (PiNI) which can fine-tune CLIP via injecting noise into both visual and text encoders.
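A minimal sketch of the noise-injection idea, assuming simple additive Gaussian noise on encoder features; PiNI's actual "positive-incentive" noise is learned rather than fixed, so this is a structural illustration only.

```python
import torch
import torch.nn as nn

class NoisyEncoder(nn.Module):
    """Wrap an encoder and perturb its output features during fine-tuning."""
    def __init__(self, encoder, sigma=0.05):
        super().__init__()
        self.encoder, self.sigma = encoder, sigma

    def forward(self, x):
        feats = self.encoder(x)
        if self.training:                        # inject noise only while tuning
            feats = feats + self.sigma * torch.randn_like(feats)
        return feats

# Toy usage; in PiNI both the visual and text encoders would be wrapped.
toy_encoder = nn.Sequential(nn.Linear(16, 8))
noisy = NoisyEncoder(toy_encoder).train()
print(noisy(torch.randn(4, 16)).shape)  # (4, 8)
```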
arXiv Detail & Related papers (2024-12-14T12:58:15Z) - Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis [9.11767497956649]
This paper proposes leveraging the language comprehension capabilities of large vision-language models to guide the optimization of the initial noisy latent.
We introduce the Noise Diffusion process, which updates the noisy latent to generate semantically faithful images while preserving distribution consistency.
Experimental results demonstrate the effectiveness and adaptability of our framework, consistently enhancing semantic alignment across various diffusion models.
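One way to picture updating a noisy latent for semantic faithfulness while preserving distribution consistency is gradient ascent on a guidance score followed by re-standardization. The `score_fn` below is a toy stand-in for the vision-language guidance, and the update rule is an assumption.

```python
import torch

def refine_latent(latent, score_fn, steps=10, lr=0.05):
    z = latent.clone().requires_grad_(True)
    for _ in range(steps):
        (grad,) = torch.autograd.grad(score_fn(z), z)
        with torch.no_grad():
            z += lr * grad                       # ascend the faithfulness score
            z -= z.mean()                        # re-standardize so the latent
            z /= z.std()                         #   stays Gaussian-like
    return z.detach()

toy_score = lambda z: -(z - 0.5).pow(2).mean()   # hypothetical guidance signal
print(refine_latent(torch.randn(4, 8, 8), toy_score).std())  # ~1.0
```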
arXiv Detail & Related papers (2024-11-25T15:40:47Z) - Golden Noise for Diffusion Models: A Learning Framework [26.117889730713923]
Text-to-image diffusion models are a popular paradigm for synthesizing personalized images from a text prompt and random Gaussian noise.
While people observe that some noises are "golden noises" that achieve better text-image alignment and higher human preference than others, we still lack a machine learning framework to obtain those golden noises.
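A speculative sketch of what such a learning framework could look like: a small residual network trained to map a random noise toward a preferred one, assuming supervision pairs are available. The architecture and loss are placeholders, not the paper's design.

```python
import torch
import torch.nn as nn

class NoisePromoter(nn.Module):
    """Predict a residual refinement that turns random noise into a better one."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, noise):
        return noise + self.net(noise)           # residual keeps noise statistics close

model = NoisePromoter(dim=64)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Hypothetical supervision: (random noise, preferred "golden" noise) pairs.
random_noise, golden_noise = torch.randn(32, 64), torch.randn(32, 64)
loss = (model(random_noise) - golden_noise).pow(2).mean()
loss.backward(); opt.step()
```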
arXiv Detail & Related papers (2024-11-14T15:13:13Z) - Beyond Image Prior: Embedding Noise Prior into Conditional Denoising Transformer [17.430622649002427]
Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets.
We propose a new perspective on the denoising challenge by highlighting the distinct separation between noise and image priors.
We introduce a Locally Noise Prior Estimation algorithm, which accurately estimates the noise prior directly from a single raw noisy image.
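For intuition, the classic low-variance-patch heuristic below estimates a noise level from a single noisy image; it is a stand-in for, not a reproduction of, the paper's Locally Noise Prior Estimation algorithm.

```python
import torch

def estimate_noise_sigma(img, patch=8):
    # Split into patches and take a low quantile of patch stds as the noise
    # estimate: the flattest patches are dominated by noise, not texture.
    H, W = img.shape
    cropped = img[:H - H % patch, :W - W % patch]
    patches = cropped.unfold(0, patch, patch).unfold(1, patch, patch)
    stds = patches.reshape(-1, patch * patch).std(dim=1)
    return stds.kthvalue(max(1, len(stds) // 20)).values

noisy = torch.rand(128, 128) + 0.05 * torch.randn(128, 128)
print(float(estimate_noise_sigma(noisy)))  # roughly 0.05 on this toy input
```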
arXiv Detail & Related papers (2024-07-12T08:43:11Z) - InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization [27.508861002013358]
InitNO is a paradigm that refines the initial noise to produce semantically faithful images.
A strategically crafted noise optimization pipeline is developed to guide the initial noise towards valid regions.
Our method, validated through rigorous experimentation, shows a commendable proficiency in generating images in strict accordance with text prompts.
arXiv Detail & Related papers (2024-04-06T14:56:59Z) - AdaDiff: Adaptive Step Selection for Fast Diffusion [88.8198344514677]
We introduce AdaDiff, a framework designed to learn instance-specific step usage policies.
AdaDiff is optimized using a policy gradient method to maximize a carefully designed reward function.
Our approach achieves visual quality comparable to a baseline that uses a fixed 50 denoising steps.
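A toy REINFORCE sketch of learning instance-specific step counts: a policy network scores a set of step options and is updated with a policy gradient on a reward trading quality against compute. The reward function and step options are invented for illustration.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 5))
step_options = torch.tensor([10, 20, 30, 40, 50])  # candidate step counts
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

features = torch.randn(8, 16)                    # per-instance conditioning
dist = torch.distributions.Categorical(logits=policy(features))
action = dist.sample()
steps = step_options[action]
# Hypothetical reward: quality saturates with steps, minus a compute cost.
reward = 1 - torch.exp(-steps / 20.0) - 0.01 * steps
loss = -(dist.log_prob(action) * reward).mean()  # policy gradient update
loss.backward(); opt.step()
```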
arXiv Detail & Related papers (2023-11-24T11:20:38Z) - POS: A Prompts Optimization Suite for Augmenting Text-to-Video Generation [11.556147036111222]
This paper aims to enhance diffusion-based text-to-video generation by improving its two input prompts: the noise and the text.
We propose POS, a training-free Prompt Optimization Suite to boost text-to-video models.
arXiv Detail & Related papers (2023-11-02T02:33:09Z) - High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z) - Advancing Unsupervised Low-light Image Enhancement: Noise Estimation, Illumination Interpolation, and Self-Regulation [55.07472635587852]
Low-Light Image Enhancement (LLIE) techniques have made notable advancements in preserving image details and enhancing contrast.
These approaches encounter persistent challenges in efficiently mitigating dynamic noise and accommodating diverse low-light scenarios.
We first propose a method for quickly and accurately estimating the noise level in low-light images.
We then devise a Learnable Illumination Interpolator (LII) to satisfy general constraints between illumination and input.
arXiv Detail & Related papers (2023-05-17T13:56:48Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have well-known limitations:
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
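A compact sketch of the three named components; the layer sizes are arbitrary, and timestep conditioning plus the step-wise diffuser's noise schedule are omitted for brevity.

```python
import torch
import torch.nn as nn

class CondDenoiser(nn.Module):
    def __init__(self, dim=32, n_items=1000):
        super().__init__()
        self.embed = nn.Embedding(n_items, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        self.cross = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, history, noisy_target):
        # history: (B, T) item ids; noisy_target: (B, 1, dim) diffused item.
        ctx = self.encoder(self.embed(history))          # sequence encoder
        attn, _ = self.cross(noisy_target, ctx, ctx)     # cross-attentive decode
        return self.out(attn)                            # predicted denoised item

model = CondDenoiser()
x0 = torch.randn(4, 1, 32)
noisy = x0 + 0.3 * torch.randn_like(x0)                  # one toy diffusion step
print(model(torch.randint(0, 1000, (4, 10)), noisy).shape)  # (4, 1, 32)
```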
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
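The regularizer described above can be pictured in a few lines: perturb a hidden representation with Gaussian noise and penalize the resulting drift in the model's output. The noise scale, penalty weight, and toy task loss are assumptions.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(16, 32), nn.Tanh())
head = nn.Linear(32, 3)

x = torch.randn(8, 16)
hidden = backbone(x)
clean_out = head(hidden)
noisy_out = head(hidden + 0.1 * torch.randn_like(hidden))  # inject Gaussian noise
task_loss = clean_out.pow(2).mean()               # stand-in for the real task loss
stability = (noisy_out - clean_out).pow(2).mean() # penalize output drift
loss = task_loss + 1.0 * stability                # noise stability regularization
loss.backward()
```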
arXiv Detail & Related papers (2022-06-12T04:42:49Z)