Beyond Randomness: Understand the Order of the Noise in Diffusion
- URL: http://arxiv.org/abs/2511.07756v1
- Date: Wed, 12 Nov 2025 01:15:12 GMT
- Title: Beyond Randomness: Understand the Order of the Noise in Diffusion
- Authors: Song Yan, Min Li, Bi Xinliang, Jian Yang, Yusen Zhang, Guanye Xiong, Yunwei Lan, Tao Zhang, Wei Zhai, Zheng-Jun Zha,
- Abstract summary: In text-driven content generation (T2C) diffusion model, semantic of generated content is mostly attributed to the process of text embedding and attention mechanism interaction.<n>This paper conducts a comprehensive analysis of the impact of random noise on the model's generation.<n>We propose a simple but efficient training-free and universal two-step "Semantic Erasure-Injection" process to modulate the initial noise in T2C diffusion model.
- Score: 60.3872274940353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In text-driven content generation (T2C) diffusion model, semantic of generated content is mostly attributed to the process of text embedding and attention mechanism interaction. The initial noise of the generation process is typically characterized as a random element that contributes to the diversity of the generated content. Contrary to this view, this paper reveals that beneath the random surface of noise lies strong analyzable patterns. Specifically, this paper first conducts a comprehensive analysis of the impact of random noise on the model's generation. We found that noise not only contains rich semantic information, but also allows for the erasure of unwanted semantics from it in an extremely simple way based on information theory, and using the equivalence between the generation process of diffusion model and semantic injection to inject semantics into the cleaned noise. Then, we mathematically decipher these observations and propose a simple but efficient training-free and universal two-step "Semantic Erasure-Injection" process to modulate the initial noise in T2C diffusion model. Experimental results demonstrate that our method is consistently effective across various T2C models based on both DiT and UNet architectures and presents a novel perspective for optimizing the generation of diffusion model, providing a universal tool for consistent generation.
Related papers
- Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation [26.72622200307507]
Test-time scaling (TTS) aims to achieve better results by increasing random sampling and evaluating samples based on rules and metrics.<n>In this work, we analyze the effects of randomness in T2I diffusion models and explore a new format of randomness for TTS: text embedding perturbation.
arXiv Detail & Related papers (2025-12-03T17:27:53Z) - Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance [54.88271057438763]
Noise Awareness Guidance (NAG) is a correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule.<n>NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
arXiv Detail & Related papers (2025-10-14T13:31:34Z) - The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation [31.599902235859687]
We propose to leverage an aligned Gaussian noise as implicit guidance to complement explicit user-defined inputs, such as text prompts.<n>NoiseQuery enables fine-grained control and yields significant performance boosts over high-level semantics and over low-level visual attributes.
arXiv Detail & Related papers (2024-12-06T14:59:00Z) - GUD: Generation with Unified Diffusion [40.64742332352373]
Diffusion generative models transform noise into data by inverting a process that progressively adds noise to data samples.
We develop a unified framework for diffusion generative models with greatly enhanced design freedom.
arXiv Detail & Related papers (2024-10-03T16:51:14Z) - DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval [49.076590578101985]
We present a diffusion-based ATR framework (DiffATR) that generates joint distribution from noise.
Experiments on the AudioCaps and Clotho datasets with superior performances, verify the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-16T06:33:26Z) - One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns [33.293193191683145]
We present a single generative model which can learn to generate multiple types of noise as well as blend between them.
We also present an application of our model to improving inverse procedural material design.
arXiv Detail & Related papers (2024-04-25T02:23:11Z) - InfoDiffusion: Information Entropy Aware Diffusion Process for
Non-Autoregressive Text Generation [33.52794666968048]
We propose InfoDiffusion, a non-autoregressive text diffusion model.
Our approach introduces a "keyinfo-first" generation strategy and incorporates a noise schedule based on the amount of text information.
Experimental results show that InfoDiffusion outperforms the baseline model in terms of generation quality and diversity.
arXiv Detail & Related papers (2023-10-18T14:01:39Z) - Adversarial Training of Denoising Diffusion Model Using Dual
Discriminators for High-Fidelity Multi-Speaker TTS [0.0]
The diffusion model is capable of generating high-quality data through a probabilistic approach.
It suffers from the drawback of slow generation speed due to the requirement of a large number of time steps.
We propose a speech synthesis model with two discriminators: a diffusion discriminator for learning the distribution of the reverse process and a spectrogram discriminator for learning the distribution of the generated data.
arXiv Detail & Related papers (2023-08-03T07:22:04Z) - An Efficient Membership Inference Attack for the Diffusion Model by
Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA)
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z) - VideoFusion: Decomposed Diffusion Models for High-Quality Video
Generation [88.49030739715701]
This work presents a decomposed diffusion process via resolving the per-frame noise into a base noise that is shared among all frames and a residual noise that varies along the time axis.
Experiments on various datasets confirm that our approach, termed as VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
arXiv Detail & Related papers (2023-03-15T02:16:39Z) - SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers [50.90457644954857]
In this work, we apply diffusion models to approach sequence-to-sequence text generation.
We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation.
Experiment results illustrate the good performance on sequence-to-sequence generation in terms of text quality and inference time.
arXiv Detail & Related papers (2022-12-20T15:16:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.