Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
- URL: http://arxiv.org/abs/2503.11720v3
- Date: Wed, 16 Apr 2025 15:28:55 GMT
- Title: Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
- Authors: Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang
- Abstract summary: We introduce Rich Preference Optimization (RPO), a novel pipeline to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. RPO generates detailed critiques of synthesized images to extract reliable and actionable image editing instructions. We demonstrate the effectiveness of our pipeline in fine-tuning state-of-the-art diffusion models.
- Score: 13.4078883626321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Traditional methods, like Diffusion-DPO, often rely solely on reward model labeling, which can be opaque, offers limited insight into the rationale behind preferences, and is prone to issues such as reward hacking or overfitting. In contrast, our approach begins by generating detailed critiques of synthesized images to extract reliable and actionable image editing instructions. By implementing these instructions, we create refined images, resulting in synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models.
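The RPO curation loop described in the abstract (critique → editing instructions → refined image → preference pair) can be sketched as follows. This is a minimal illustration only: the function names `critique_image`, `extract_edit_instructions`, and `apply_edits` are hypothetical stand-ins for the paper's vision-language-model components, not its actual API.

```python
def critique_image(image, prompt):
    # Stand-in: the real pipeline would query a VLM for a detailed critique
    # of how well the synthesized image matches the prompt.
    return f"critique of {image} for '{prompt}'"

def extract_edit_instructions(critique):
    # Stand-in: the real pipeline distills the critique into reliable,
    # actionable image-editing instructions.
    return [f"edit derived from: {critique}"]

def apply_edits(image, instructions):
    # Stand-in: the real pipeline runs an image-editing model; here we just
    # tag the image identifier to mark it as refined.
    return f"{image}+refined"

def build_preference_pair(image, prompt):
    """Curate one (preferred, dispreferred) pair in the RPO style:
    critique -> editing instructions -> refined image, with the refined
    image treated as preferred over the original."""
    critique = critique_image(image, prompt)
    instructions = extract_edit_instructions(critique)
    refined = apply_edits(image, instructions)
    return {"preferred": refined, "dispreferred": image, "prompt": prompt}
```

A dataset of such pairs would then be fed to a Diffusion-DPO-style fine-tuning objective.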
Related papers
- A Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising [43.44633086975204]
We propose an intuitive method for leveraging pretrained diffusion models.
We then introduce our proposed Linear Combination Diffusion Denoiser.
LCDD achieves state-of-the-art performance and offers controlled, well-behaved trade-offs.
arXiv Detail & Related papers (2025-03-18T19:02:19Z)
- Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking [50.325021634589596]
We propose a Tailored Optimization Preference (TailorPO) framework for aligning diffusion models with human preference. Our approach directly ranks intermediate noisy samples based on their step-wise reward and effectively resolves gradient direction issues. Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
arXiv Detail & Related papers (2025-02-01T16:08:43Z)
- PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference [62.72779589895124]
We make the first attempt to align diffusion models for image inpainting with human aesthetic standards via a reinforcement learning framework.
We train a reward model with a dataset we construct, consisting of nearly 51,000 images annotated with human preferences.
Experiments on inpainting comparison and downstream tasks, such as image extension and 3D reconstruction, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-10-29T11:49:39Z)
- Scalable Ranked Preference Optimization for Text-to-Image Generation [76.16285931871948]
We investigate a scalable approach for collecting large-scale and fully synthetic datasets for DPO training.
The preferences for paired images are generated using a pre-trained reward function, eliminating the need to involve humans in the annotation process.
We introduce RankDPO to enhance DPO-based methods using the ranking feedback.
arXiv Detail & Related papers (2024-10-23T16:42:56Z)
- Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization [97.35427957922714]
We present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current timestep-distilled model and increases the relative likelihood margin between the training images and reference images. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data.
arXiv Detail & Related papers (2024-10-04T07:05:16Z)
- Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization [20.698818784349015]
This paper introduces step-by-step preference optimization (SPO) to improve aesthetics economically.
SPO discards the propagation strategy and allows fine-grained image details to be assessed.
SPO converges much faster than DPO methods owing to the more accurate preference labels provided by the step-aware preference model.
arXiv Detail & Related papers (2024-06-06T17:57:09Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- ExposureDiffusion: Learning to Expose for Low-light Image Enhancement [87.08496758469835]
This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model.
Our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models.
The proposed framework works with real-paired datasets, SOTA noise models, and different backbone networks.
arXiv Detail & Related papers (2023-07-15T04:48:35Z)
- NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency [1.1470070927586016]
We introduce NaturalInversion, a novel model-inversion-based method to synthesize images that agree well with the original data distribution without using real data.
We show through visualization and additional analysis that our images are more consistent with the original data distribution than those of prior works.
arXiv Detail & Related papers (2023-06-29T03:43:29Z)
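Several of the methods listed above (Diffusion-DPO, RankDPO, PSO) share the same underlying pairwise preference objective: a logistic loss on the likelihood margin of the preferred image over the dispreferred one, each measured relative to a frozen reference model. A minimal sketch of this generic form, assuming scalar log-likelihoods and a hypothetical temperature `beta` (not any one paper's exact loss):

```python
import math

def dpo_pairwise_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO-style pairwise loss: -log(sigmoid(beta * margin)), where
    the margin is the preferred-minus-dispreferred log-likelihood gap,
    each term measured relative to a frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With a zero margin the loss equals log 2; as the fine-tuned model widens the preferred image's likelihood margin, the loss decreases toward zero.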
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.