Towards Better Optimization For Listwise Preference in Diffusion Models
- URL: http://arxiv.org/abs/2510.01540v1
- Date: Thu, 02 Oct 2025 00:26:37 GMT
- Title: Towards Better Optimization For Listwise Preference in Diffusion Models
- Authors: Jiamu Bai, Xin Yu, Meilong Xu, Weitao Lu, Xin Pan, Kiwan Maeng, Daniel Kifer, Jian Wang, Yu Wang,
- Abstract summary: We propose Diffusion-LPO, a framework for Listwise Preference Optimization in diffusion models with listwise data.<n>Given a caption, we aggregate user feedback into a ranked list of images and derive a listwise extension of the DPO objective under the Plackett-Luce model.<n>We empirically demonstrate the effectiveness of Diffusion-LPO across various tasks, including text-to-image generation, image editing, and personalized preference alignment.
- Score: 19.40269067848114
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement learning from human feedback (RLHF) has proven effectiveness for aligning text-to-image (T2I) diffusion models with human preferences. Although Direct Preference Optimization (DPO) is widely adopted for its computational efficiency and avoidance of explicit reward modeling, its applications to diffusion models have primarily relied on pairwise preferences. The precise optimization of listwise preferences remains largely unaddressed. In practice, human feedback on image preferences often contains implicit ranked information, which conveys more precise human preferences than pairwise comparisons. In this work, we propose Diffusion-LPO, a simple and effective framework for Listwise Preference Optimization in diffusion models with listwise data. Given a caption, we aggregate user feedback into a ranked list of images and derive a listwise extension of the DPO objective under the Plackett-Luce model. Diffusion-LPO enforces consistency across the entire ranking by encouraging each sample to be preferred over all of its lower-ranked alternatives. We empirically demonstrate the effectiveness of Diffusion-LPO across various tasks, including text-to-image generation, image editing, and personalized preference alignment. Diffusion-LPO consistently outperforms pairwise DPO baselines on visual quality and preference alignment.
Related papers
- Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences [13.588231827053923]
Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data.<n>We propose SmPO-Diffusion, a novel method for modeling preference distributions to improve the DPO objective.<n>Our approach effectively mitigates issues of excessive optimization and objective misalignment present in existing methods.
arXiv Detail & Related papers (2025-06-03T09:47:22Z) - Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations [60.143658714894336]
Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation.<n> Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences.<n>We introduce Self-NPO, a Negative Preference Optimization approach that learns exclusively from the model itself.
arXiv Detail & Related papers (2025-05-17T01:03:46Z) - Dual Caption Preference Optimization for Diffusion Models [51.223275938663235]
We propose Dual Caption Preference Optimization (DCPO), a novel approach that utilizes two distinct captions to mitigate irrelevant prompts.<n>Our experiments show that DCPO significantly improves image quality and relevance to prompts, outperforming Stable Diffusion (SD) 2.1, SFT_Chosen, Diffusion-DPO, and MaPO across multiple metrics.
arXiv Detail & Related papers (2025-02-09T20:34:43Z) - Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking [50.325021634589596]
We propose a Tailored Optimization Preference (TailorPO) framework for aligning diffusion models with human preference.<n>Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues.<n> Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
arXiv Detail & Related papers (2025-02-01T16:08:43Z) - Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.<n>With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.<n>Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z) - Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization [68.69203905664524]
We introduce Diffusion-RPO, a new method designed to align diffusion-based T2I models with human preferences more effectively.
We have developed a new evaluation metric, style alignment, aimed at overcoming the challenges of high costs, low interpretability.
Our findings demonstrate that Diffusion-RPO outperforms established methods such as Supervised Fine-Tuning and Diffusion-DPO in tuning Stable Diffusion versions 1.5 and XL-1.0.
arXiv Detail & Related papers (2024-06-10T15:42:03Z) - Aligning Diffusion Models by Optimizing Human Utility [1.6166249658374658]
Diffusion-KTO is a novel approach for aligning text-to-image diffusion models with human preferences.
Our objective requires simple per-image binary feedback signals, e.g. likes or dislikes, which are abundantly available.
arXiv Detail & Related papers (2024-04-06T01:23:23Z) - Diffusion Model Alignment Using Direct Preference Optimization [103.2238655827797]
Diffusion-DPO is a method to align diffusion models to human preferences by directly optimizing on human comparison data.
We fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO.
We also develop a variant that uses AI feedback and has comparable performance to training on human preferences.
arXiv Detail & Related papers (2023-11-21T15:24:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.