It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models
- URL: http://arxiv.org/abs/2601.00090v1
- Date: Wed, 31 Dec 2025 19:47:49 GMT
- Title: It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models
- Authors: Anne Harrington, A. Sophia Koepke, Shyamgopal Karthik, Trevor Darrell, Alexei A. Efros
- Abstract summary: We show that a simple noise optimization objective can mitigate mode collapse while preserving the fidelity of the base model. Our experiments demonstrate that noise optimization yields superior results in terms of generation quality and variety.
- Score: 80.53672733210111
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. While previous work has attempted to address this issue by steering the model using guidance mechanisms, or by generating a large pool of candidates and refining them, in this work we take a different direction and aim for diversity in generations via noise optimization. Specifically, we show that a simple noise optimization objective can mitigate mode collapse while preserving the fidelity of the base model. We also analyze the frequency characteristics of the noise and show that alternative noise initializations with different frequency profiles can improve both optimization and search. Our experiments demonstrate that noise optimization yields superior results in terms of generation quality and variety.
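The abstract describes optimizing initial noises so that samples from the same prompt come out more varied. As a minimal, self-contained sketch of that general idea (not the paper's actual objective or model), one can run gradient ascent on a pairwise-distance diversity objective over a batch of initial noise vectors, with a penalty that keeps each vector near the squared norm expected of Gaussian noise so the diffusion model still sees in-distribution inputs:

```python
import numpy as np

def diversity_objective(z):
    # z: (n, d) batch of initial noise vectors.
    # Sum of pairwise squared distances; larger means more diverse.
    diffs = z[:, None, :] - z[None, :, :]
    return (diffs ** 2).sum() / 2.0

def optimize_noise(z, steps=50, lr=0.005, lam=0.1):
    """Toy noise optimization: ascend the diversity objective while
    pulling each z_i toward the N(0, I) shell (||z||^2 close to d).

    Closed-form gradients:
      dD/dz_i = 2 * (n * z_i - sum_j z_j)
      dR/dz_i = 4 * (||z_i||^2 - d) * z_i   for R = sum_i (||z_i||^2 - d)^2
    """
    n, d = z.shape
    for _ in range(steps):
        grad_div = 2.0 * (n * z - z.sum(axis=0, keepdims=True))
        norms2 = (z ** 2).sum(axis=1, keepdims=True)
        grad_reg = 4.0 * (norms2 - d) * z
        z = z + lr * grad_div - lr * lam * grad_reg
    return z

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 16))
z1 = optimize_noise(z0.copy())
```

In a real text-to-image pipeline the optimized `z1` would be fed to the sampler in place of freshly drawn noise; the paper's actual objective and regularization are defined on the model's outputs, not on the raw noise as here.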
Related papers
- Noise Projection: Closing the Prompt-Agnostic Gap Behind Text-to-Image Misalignment in Diffusion Models [9.683618735282414]
In text-to-image generation, different initial noises induce distinct denoising paths with a pretrained Stable Diffusion (SD) model. While this pattern can yield diverse images, some of them may fail to align well with the prompt. We propose a noise projector that applies text-conditioned refinement to the initial noise before denoising.
arXiv Detail & Related papers (2025-10-16T10:14:34Z)
- Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance [54.88271057438763]
Noise Awareness Guidance (NAG) is a correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
arXiv Detail & Related papers (2025-10-14T13:31:34Z)
- Noise Conditional Variational Score Distillation [60.38982038894823]
Noise Conditional Variational Score Distillation (NCVSD) is a novel method for distilling pretrained diffusion models into generative denoisers. By integrating this insight into the Variational Score Distillation framework, we enable scalable learning of generative denoisers.
arXiv Detail & Related papers (2025-06-11T06:01:39Z)
- A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models [3.8623569699070357]
Noise PPO is a minimalist reinforcement learning algorithm that learns a prompt-conditioned initial noise generator. Experiments show that Noise PPO consistently improves alignment and sample quality over the original model. These findings reinforce the practical value of minimalist RL fine-tuning for diffusion models.
arXiv Detail & Related papers (2025-05-23T00:01:52Z)
- ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos [41.45750971432533]
Video diffusion models (VDMs) facilitate the generation of high-quality videos. Recent studies have uncovered the existence of "golden noises" that can enhance video quality during generation. We propose ScalingNoise, a plug-and-play inference-time search strategy that identifies golden initial noises for the diffusion sampling process.
arXiv Detail & Related papers (2025-03-20T17:54:37Z)
- Beyond Image Prior: Embedding Noise Prior into Conditional Denoising Transformer [17.430622649002427]
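Several of the entries above (ScalingNoise, and the candidate-pool approaches mentioned in the main abstract) rely on inference-time search over initial noises rather than gradient-based optimization. A generic best-of-N sketch of that pattern, with a hypothetical `score_fn` standing in for whatever quality or reward signal the method computes on the resulting generation:

```python
import numpy as np

def search_initial_noise(score_fn, shape, n_candidates=32, seed=0):
    """Best-of-N noise search (a generic sketch, not any one paper's
    method): sample candidate noises, score each, keep the best."""
    rng = np.random.default_rng(seed)
    best_noise, best_score = None, -np.inf
    for _ in range(n_candidates):
        z = rng.standard_normal(shape)
        s = score_fn(z)
        if s > best_score:
            best_noise, best_score = z, s
    return best_noise, best_score

# Toy scorer for illustration: prefer noises whose empirical statistics
# match N(0, 1); a real system would score the generated image/video.
toy_score = lambda z: -abs(z.mean()) - abs(z.std() - 1.0)
z_best, s_best = search_initial_noise(toy_score, (16,))
```

Methods like ScalingNoise refine this basic loop, e.g. by scoring partial denoising trajectories instead of fully generated samples, but the search-then-select structure is the same.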
Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets. We propose a new perspective on the denoising challenge by highlighting the distinct separation between noise and image priors. We introduce a Locally Noise Prior Estimation algorithm, which accurately estimates the noise prior directly from a single raw noisy image.
arXiv Detail & Related papers (2024-07-12T08:43:11Z)
- One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns [33.293193191683145]
We present a single generative model which can learn to generate multiple types of noise as well as blend between them.
We also present an application of our model to improving inverse procedural material design.
arXiv Detail & Related papers (2024-04-25T02:23:11Z)
- Blue noise for diffusion models [50.99852321110366]
We introduce a novel and general class of diffusion models taking correlated noise within and across images into account.
Our framework allows introducing correlation across images within a single mini-batch to improve gradient flow.
We perform both qualitative and quantitative evaluations on a variety of datasets using our method.
arXiv Detail & Related papers (2024-02-07T14:59:25Z)
- Dynamic Dual-Output Diffusion Models [100.32273175423146]
Iterative denoising-based generation has been shown to be comparable in quality to other classes of generative models.
A major drawback of this method is that it requires hundreds of iterations to produce a competitive result.
Recent works have proposed solutions that allow for faster generation with fewer iterations, but the image quality gradually deteriorates.
arXiv Detail & Related papers (2022-03-08T11:20:40Z)
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.