Golden Noise for Diffusion Models: A Learning Framework
- URL: http://arxiv.org/abs/2411.09502v4
- Date: Sat, 18 Jan 2025 04:40:28 GMT
- Title: Golden Noise for Diffusion Models: A Learning Framework
- Authors: Zikai Zhou, Shitong Shao, Lichen Bai, Zhiqiang Xu, Bo Han, Zeke Xie
- Abstract summary: Text-to-image diffusion model is a popular paradigm that synthesizes personalized images by providing a text prompt and a random Gaussian noise.
While people observe that some noises are ``golden noises'' that can achieve better text-image alignment and higher human preference than others, we still lack a machine learning framework to obtain those golden noises.
- Score: 26.117889730713923
- Abstract: The text-to-image diffusion model is a popular paradigm that synthesizes personalized images from a text prompt and a random Gaussian noise. While people observe that some noises are ``golden noises'' that can achieve better text-image alignment and higher human preference than others, we still lack a machine learning framework to obtain those golden noises. To learn golden noises for diffusion sampling, we make three main contributions in this paper. First, we identify a new concept termed the \textit{noise prompt}, which aims at turning a random Gaussian noise into a golden noise by adding a small desirable perturbation derived from the text prompt. Following this concept, we formulate the \textit{noise prompt learning} framework that systematically learns ``prompted'' golden noise associated with a text prompt for diffusion models. Second, we design a noise prompt data collection pipeline and collect a large-scale \textit{noise prompt dataset}~(NPD) that contains 100k pairs of random noises and golden noises with the associated text prompts. With the prepared NPD as the training dataset, we train a small \textit{noise prompt network}~(NPNet) that directly learns to transform a random noise into a golden noise. The learned golden noise perturbation can be considered a kind of prompt for noise, as it is rich in semantic information and tailored to the given text prompt. Third, our extensive experiments demonstrate the impressive effectiveness and generalization of NPNet in improving the quality of synthesized images across various diffusion models, including SDXL, DreamShaper-xl-v2-turbo, and Hunyuan-DiT. Moreover, NPNet is a small and efficient controller that acts as a plug-and-play module with very limited additional inference and computational costs, as it simply provides a golden noise instead of a random noise without accessing the original pipeline.
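The core idea, a small network that maps a text embedding to a perturbation added to the initial Gaussian noise, can be illustrated with a minimal sketch. The linear map `W`, the dimensions, and the `scale` factor are toy assumptions for illustration, not the actual NPNet architecture described in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_prompt(noise, text_emb, W, scale=0.1):
    """Toy stand-in for a 'noise prompt network': project the text
    embedding into noise space and add it as a small perturbation
    to the random Gaussian noise."""
    perturbation = np.tanh(W @ text_emb).reshape(noise.shape)
    # tanh bounds the perturbation; scale keeps it small relative
    # to the unit-variance noise.
    return noise + scale * perturbation

# Toy dimensions: a 4x4 "latent" noise and an 8-dim text embedding.
noise = rng.standard_normal((4, 4))
text_emb = rng.standard_normal(8)
W = rng.standard_normal((16, 8))

golden = noise_prompt(noise, text_emb, W)
```

Because the perturbation is bounded and scaled, the "golden" noise stays close to the original sample, matching the paper's framing of a small, desirable, text-conditioned shift rather than a wholly new noise.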
Related papers
- The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation [31.599902235859687]
Text-to-image synthesis (T2I) has advanced remarkably with the emergence of large-scale diffusion models.
In this work, we reveal that the often-overlooked noise itself encodes inherent generative tendencies, acting as a "silent prompt" that implicitly guides the output.
We introduce NoiseQuery, a novel strategy that selects optimal initial noise from a pre-built noise library to meet diverse user needs.
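The selection step described above can be sketched as a scored lookup over a pre-built noise library. The feature tags, cosine-similarity scorer, and dimensions below are illustrative assumptions, not the actual NoiseQuery implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pre-built library: each candidate initial noise is
# tagged with a feature vector describing its generative tendency.
K, D = 16, 8
library = [(rng.standard_normal((4, 4)), rng.standard_normal(D))
           for _ in range(K)]

def select_noise(goal_emb, library):
    """Return the library noise whose tendency feature best matches
    the goal embedding (cosine similarity as a stand-in scorer)."""
    def score(feat):
        return feat @ goal_emb / (np.linalg.norm(feat) * np.linalg.norm(goal_emb))
    return max(library, key=lambda item: score(item[1]))[0]

goal = rng.standard_normal(D)
best = select_noise(goal, library)
```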
arXiv Detail & Related papers (2024-12-06T14:59:00Z)
- A Noise is Worth Diffusion Guidance [36.912490607355295]
Current diffusion models struggle to produce reliable images without guidance.
We propose a novel method that replaces guidance methods with a single refinement of the initial noise.
Our noise-refining model leverages efficient noise-space learning, achieving rapid convergence and strong performance with just 50K text-image pairs.
arXiv Detail & Related papers (2024-12-05T06:09:56Z)
- One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns [33.293193191683145]
We present a single generative model which can learn to generate multiple types of noise as well as blend between them.
We also present an application of our model to improving inverse procedural material design.
arXiv Detail & Related papers (2024-04-25T02:23:11Z)
- A Generative Model for Digital Camera Noise Synthesis [12.236112464800403]
We propose an effective generative model which utilizes clean features as guidance followed by noise injections into the network.
Specifically, our generator follows a UNet-like structure with skip connections but without downsampling and upsampling layers.
We show that our proposed approach outperforms existing methods for synthesizing camera noise.
arXiv Detail & Related papers (2023-03-16T10:17:33Z)
- NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z)
- Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training [50.018580462619425]
We propose a novel framework, namely Pixel-level Noise-aware Generative Adversarial Network (PNGAN).
PNGAN employs a pre-trained real denoiser to map the fake and real noisy images into a nearly noise-free solution space.
For better noise fitting, we present an efficient architecture Simple Multi-versa-scale Network (SMNet) as the generator.
arXiv Detail & Related papers (2022-04-06T14:09:02Z)
- C2N: Practical Generative Noise Modeling for Real-World Denoising [53.96391787869974]
We introduce a Clean-to-Noisy image generation framework, namely C2N, to imitate complex real-world noise without using paired examples.
We construct the noise generator in C2N accordingly with each component of real-world noise characteristics to express a wide range of noise accurately.
arXiv Detail & Related papers (2022-02-19T05:53:46Z)
- Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images [98.82804259905478]
We present Neighbor2Neighbor to train an effective image denoising model with only noisy images.
In detail, input and target used to train a network are images sub-sampled from the same noisy image.
A denoising network is trained on sub-sampled training pairs generated in the first stage, with a proposed regularizer as additional loss for better performance.
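The sub-sampling step, building an input/target pair from a single noisy image by picking neighboring pixels, can be sketched as follows. The 2x2-cell scheme below is a simplified illustration of the sampler, not the exact Neighbor2Neighbor implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def neighbor_subsample(noisy):
    """Split one noisy image into an (input, target) pair by picking
    two different pixels from each non-overlapping 2x2 cell."""
    H, W = noisy.shape
    inp = np.empty((H // 2, W // 2))
    tgt = np.empty((H // 2, W // 2))
    for i in range(H // 2):
        for j in range(W // 2):
            cell = noisy[2*i:2*i+2, 2*j:2*j+2].ravel()
            # Two distinct positions, so input and target share the
            # clean signal but carry independent noise realizations.
            a, b = rng.choice(4, size=2, replace=False)
            inp[i, j], tgt[i, j] = cell[a], cell[b]
    return inp, tgt

noisy = rng.standard_normal((8, 8))
x, y = neighbor_subsample(noisy)
```

A denoiser trained to map `x` to `y` never sees a clean target; the neighboring pixels act as noisy supervision for each other.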
arXiv Detail & Related papers (2021-01-08T02:03:25Z)
- Adaptive noise imitation for image denoising [58.21456707617451]
We develop a new adaptive noise imitation (ADANI) algorithm that can synthesize noisy data from naturally noisy images.
To produce realistic noise, a noise generator takes unpaired noisy/clean images as input, where the noisy image is a guide for noise generation.
Coupling the noisy data output from ADANI with the corresponding ground-truth, a denoising CNN is then trained in a fully-supervised manner.
arXiv Detail & Related papers (2020-11-30T02:49:36Z)
- Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation [52.75909685172843]
Real-world image noise removal is a long-standing yet very challenging task in computer vision.
We propose a novel unified framework to deal with the noise removal and noise generation tasks.
Our method learns the joint distribution of the clean-noisy image pairs.
arXiv Detail & Related papers (2020-07-12T09:16:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.