Aligning Diffusion Models with Noise-Conditioned Perception
- URL: http://arxiv.org/abs/2406.17636v1
- Date: Tue, 25 Jun 2024 15:21:50 GMT
- Title: Aligning Diffusion Models with Noise-Conditioned Perception
- Authors: Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov,
- Abstract summary: Diffusion Models typically optimize in pixel or VAE space, which does not align well with human perception.
We propose using a perceptual objective in the U-Net embedding space of the diffusion model to address these issues.
- Score: 42.042822966928576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in human preference optimization, initially developed for Language Models (LMs), have shown promise for text-to-image Diffusion Models, enhancing prompt alignment, visual appeal, and user preference. Unlike LMs, Diffusion Models typically optimize in pixel or VAE space, which does not align well with human perception, leading to slower and less efficient training during the preference alignment stage. We propose using a perceptual objective in the U-Net embedding space of the diffusion model to address these issues. Our approach involves fine-tuning Stable Diffusion 1.5 and XL using Direct Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and supervised fine-tuning (SFT) within this embedding space. This method significantly outperforms standard latent-space implementations across various metrics, including quality and computational cost. For SDXL, our approach provides 60.8\% general preference, 62.2\% visual appeal, and 52.1\% prompt following against original open-sourced SDXL-DPO on the PartiPrompts dataset, while significantly reducing compute. Our approach not only improves the efficiency and quality of human preference alignment for diffusion models but is also easily integrable with other optimization techniques. The training code and LoRA weights will be available here: https://huggingface.co/alexgambashidze/SDXL\_NCP-DPO\_v0.1
Related papers
- Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization [46.888425016169144]
Preference optimization for diffusion models aims to align them with human preferences for images.
Previous methods typically leverage Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences.
In this work, we demonstrate that diffusion models are inherently well-suited for step-level reward modeling in the latent space.
We introduce Latent Preference Optimization (LPO), a method designed for step-level preference optimization directly in the latent space.
arXiv Detail & Related papers (2025-02-03T04:51:28Z) - Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.
With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.
Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z) - FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models [10.969811500333755]
We introduce a Fine-tuning Initial Noise Distribution (FIND) framework with policy optimization.
Our method achieves 10 times faster than the SOTA approach.
arXiv Detail & Related papers (2024-07-28T10:07:55Z) - Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization [68.69203905664524]
We introduce Diffusion-RPO, a new method designed to align diffusion-based T2I models with human preferences more effectively.
We have developed a new evaluation metric, style alignment, aimed at overcoming the challenges of high costs, low interpretability.
Our findings demonstrate that Diffusion-RPO outperforms established methods such as Supervised Fine-Tuning and Diffusion-DPO in tuning Stable Diffusion versions 1.5 and XL-1.0.
arXiv Detail & Related papers (2024-06-10T15:42:03Z) - Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO derived based on the optimal solution of problem leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z) - Diffusion Model Alignment Using Direct Preference Optimization [103.2238655827797]
Diffusion-DPO is a method to align diffusion models to human preferences by directly optimizing on human comparison data.
We fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO.
We also develop a variant that uses AI feedback and has comparable performance to training on human preferences.
arXiv Detail & Related papers (2023-11-21T15:24:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.