A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2506.12036v3
- Date: Tue, 01 Jul 2025 05:46:31 GMT
- Title: A Minimalist Method for Fine-tuning Text-to-Image Diffusion Models
- Authors: Yanting Miao, William Loh, Pascal Poupart, Suraj Kothawade
- Abstract summary: Noise PPO is a minimalist reinforcement learning algorithm that learns a prompt-conditioned initial noise generator. Experiments show that Noise PPO consistently improves alignment and sample quality over the original model. These findings reinforce the practical value of minimalist RL fine-tuning for diffusion models.
- Score: 3.8623569699070357
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work uses reinforcement learning (RL) to fine-tune text-to-image diffusion models, improving text-image alignment and sample quality. However, existing approaches introduce unnecessary complexity: they cache the full sampling trajectory, depend on differentiable reward models or large preference datasets, or require specialized guidance techniques. Motivated by the "golden noise" hypothesis -- that certain initial noise samples can consistently yield superior alignment -- we introduce Noise PPO, a minimalist RL algorithm that leaves the pre-trained diffusion model entirely frozen and learns a prompt-conditioned initial noise generator. Our approach requires no trajectory storage, reward backpropagation, or complex guidance tricks. Extensive experiments show that optimizing the initial noise distribution consistently improves alignment and sample quality over the original model, with the most significant gains at low inference steps. As the number of inference steps increases, the benefit of noise optimization diminishes but remains present. These findings clarify the scope and limitations of the golden noise hypothesis and reinforce the practical value of minimalist RL fine-tuning for diffusion models.
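To make the recipe concrete, here is a minimal PyTorch-style sketch of the idea described in the abstract: a small prompt-conditioned Gaussian noise generator trained with a clipped PPO surrogate while the diffusion sampler stays frozen. This is not the authors' implementation; `sample_with_frozen_model` and `reward_fn` are hypothetical placeholders for a frozen sampler and a text-image alignment reward.

```python
import torch
import torch.nn as nn


class NoiseGenerator(nn.Module):
    """Maps a prompt embedding to the mean/log-std of a Gaussian over initial latents."""

    def __init__(self, embed_dim=768, latent_shape=(4, 64, 64)):
        super().__init__()
        self.latent_shape = latent_shape
        flat = latent_shape[0] * latent_shape[1] * latent_shape[2]
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 1024), nn.SiLU(), nn.Linear(1024, 2 * flat)
        )

    def forward(self, prompt_emb):
        mean, log_std = self.net(prompt_emb).chunk(2, dim=-1)
        return mean, log_std.clamp(-5.0, 2.0)


def ppo_step(gen, old_gen, prompt_emb, sample_with_frozen_model, reward_fn,
             optimizer, clip_eps=0.2):
    """One clipped-PPO update of the noise generator; the diffusion model is never updated."""
    with torch.no_grad():
        old_mean, old_log_std = old_gen(prompt_emb)
        old_dist = torch.distributions.Normal(old_mean, old_log_std.exp())
        flat_noise = old_dist.sample()                         # initial noise from the old policy
        noise = flat_noise.view(-1, *gen.latent_shape)
        images = sample_with_frozen_model(noise, prompt_emb)   # frozen diffusion sampler
        rewards = reward_fn(images, prompt_emb)                # e.g. an alignment score, no grads needed
        advantages = rewards - rewards.mean()                  # simple batch baseline
        old_logp = old_dist.log_prob(flat_noise).sum(dim=-1)

    mean, log_std = gen(prompt_emb)
    dist = torch.distributions.Normal(mean, log_std.exp())
    logp = dist.log_prob(flat_noise).sum(dim=-1)
    ratio = (logp - old_logp).exp()
    surrogate = torch.min(ratio * advantages,
                          ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages)
    loss = -surrogate.mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A practical version would likely also regularize the learned distribution toward the standard normal prior the frozen sampler expects; that detail is omitted here for brevity.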
Related papers
- Noise Conditional Variational Score Distillation [60.38982038894823]
Noise Conditional Variational Score Distillation (NCVSD) is a novel method for distilling pretrained diffusion models into generative denoisers. By integrating this insight into the Variational Score Distillation framework, we enable scalable learning of generative denoisers.
arXiv Detail & Related papers (2025-06-11T06:01:39Z) - Optimizing for the Shortest Path in Denoising Diffusion Model [8.884907787678731]
Shortest Path Diffusion Model (ShortDF) treats the denoising process as a shortest-path problem aimed at minimizing reconstruction error. Experiments on multiple standard benchmarks demonstrate that ShortDF significantly reduces diffusion time (or steps). This work, we suppose, paves the way for interactive diffusion-based applications and establishes a foundation for rapid data generation.
arXiv Detail & Related papers (2025-03-05T08:47:36Z) - Beyond Fixed Horizons: A Theoretical Framework for Adaptive Denoising Diffusions [1.9116784879310031]
We introduce a new class of generative diffusion models that achieve a time-homogeneous structure for both the noising and denoising processes. A key feature of the model is its adaptability to the target data, enabling a variety of downstream tasks using a pre-trained unconditional generative model.
arXiv Detail & Related papers (2025-01-31T18:23:27Z) - Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
arXiv Detail & Related papers (2024-12-12T07:24:13Z) - Training-free Diffusion Model Alignment with Sampling Demons [15.400553977713914]
We propose an optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining. Our approach works by controlling the noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through optimization. Our experiments show that the proposed approach significantly improves the average aesthetics scores of text-to-image generation. (A gradient-free noise-selection sketch in this spirit appears after this list.)
arXiv Detail & Related papers (2024-10-08T07:33:49Z) - Improved Noise Schedule for Diffusion Training [51.849746576387375]
We propose a novel approach to design the noise schedule for enhancing the training of diffusion models. We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule.
arXiv Detail & Related papers (2024-07-03T17:34:55Z) - ROPO: Robust Preference Optimization for Large Language Models [59.10763211091664]
We propose an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models.
Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z) - Model-Agnostic Human Preference Inversion in Diffusion Models [31.992947353231564]
We propose a novel sampling design to achieve high-quality one-step image generation aligning with human preferences.
Our approach, Prompt Adaptive Human Preference Inversion (PAHI), optimizes the noise distributions for each prompt based on human preferences.
Our experiments showcase that the tailored noise distributions significantly improve image quality with only a marginal increase in computational cost.
arXiv Detail & Related papers (2024-04-01T03:18:12Z) - Blue noise for diffusion models [50.99852321110366]
We introduce a novel and general class of diffusion models taking correlated noise within and across images into account.
Our framework allows introducing correlation across images within a single mini-batch to improve gradient flow.
We perform both qualitative and quantitative evaluations on a variety of datasets using our method.
arXiv Detail & Related papers (2024-02-07T14:59:25Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
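The Demon entry above describes steering denoising by choosing the noise at each step to concentrate probability on high-reward outputs, without backpropagating through the reward. As a rough, hedged illustration of that general idea (not the paper's actual algorithm), the sketch below scores several candidate noise draws at each denoising step with a black-box reward on the predicted clean image and keeps the best one; `denoise_step`, `predict_x0`, and `reward_fn` are hypothetical placeholders.

```python
import torch


@torch.no_grad()
def best_of_n_denoising(x_t, timesteps, denoise_step, predict_x0, reward_fn,
                        num_candidates=8, noise_scale=1.0):
    """Gradient-free guidance sketch: at each step, try several candidate noises
    for the stochastic part of the update, score each candidate with a black-box
    reward on the predicted clean image, and keep the highest-scoring one."""
    for t in timesteps:
        best_x, best_score = None, -float("inf")
        for _ in range(num_candidates):
            noise = noise_scale * torch.randn_like(x_t)   # candidate stochastic noise
            x_next = denoise_step(x_t, t, noise)          # one sampler step using this noise
            score = reward_fn(predict_x0(x_next, t))      # scalar reward, no gradients needed
            if score > best_score:
                best_x, best_score = x_next, score
        x_t = best_x
    return x_t
```

Demon itself uses a more refined optimization scheme over noise; this best-of-n selection is only meant to convey the gradient-free, inference-time flavor of such methods.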