Related papers: Dual Caption Preference Optimization for Diffusion Models

Dual Caption Preference Optimization for Diffusion Models

URL: http://arxiv.org/abs/2502.06023v2
Date: Sat, 18 Oct 2025 18:05:43 GMT
Title: Dual Caption Preference Optimization for Diffusion Models
Authors: Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral,
Abstract summary: We introduce Dual Caption Preference Optimization (DCPO) to improve text-to-image diffusion models.<n>DCPO assigns two distinct captions to each preference pair, which reinforces the learning signal.<n>Experiments show that DCPO significantly improves image quality and relevance to prompts.
Score: 53.218293277964165
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in human preference optimization, originally developed for Large Language Models (LLMs), have shown significant potential in improving text-to-image diffusion models. These methods aim to learn the distribution of preferred samples while distinguishing them from less preferred ones. However, within the existing preference datasets, the original caption often does not clearly favor the preferred image over the alternative, which weakens the supervision signal available during training. To address this issue, we introduce Dual Caption Preference Optimization (DCPO), a data augmentation and optimization framework that reinforces the learning signal by assigning two distinct captions to each preference pair. This encourages the model to better differentiate between preferred and less-preferred outcomes during training. We also construct Pick-Double Caption, a modified version of Pick-a-Pic v2 with separate captions for each image, and propose three different strategies for generating distinct captions: captioning, perturbation, and hybrid methods. Our experiments show that DCPO significantly improves image quality and relevance to prompts, outperforming Stable Diffusion (SD) 2.1, SFT_Chosen, Diffusion-DPO, and MaPO across multiple metrics, including Pickscore, HPSv2.1, GenEval, CLIPscore, and ImageReward, fine-tuned on SD 2.1 as the backbone.

Related papers

Towards Better Optimization For Listwise Preference in Diffusion Models [19.40269067848114]
We propose Diffusion-LPO, a framework for Listwise Preference Optimization in diffusion models with listwise data.<n>Given a caption, we aggregate user feedback into a ranked list of images and derive a listwise extension of the DPO objective under the Plackett-Luce model.<n>We empirically demonstrate the effectiveness of Diffusion-LPO across various tasks, including text-to-image generation, image editing, and personalized preference alignment.
arXiv Detail & Related papers (2025-10-02T00:26:37Z)
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization [68.64764778089229]
We propose MISP-DPO, the first framework to incorporate multiple, semantically diverse negative images in multimodal DPO.<n>Our method embeds prompts and candidate images in CLIP space and applies a sparse autoencoder to uncover semantic deviations into interpretable factors.<n>Experiments across five benchmarks demonstrate that MISP-DPO consistently improves multimodal alignment over prior methods.
arXiv Detail & Related papers (2025-09-30T03:24:09Z)
Calibrated Multi-Preference Optimization for Aligning Diffusion Models [92.90660301195396]
Calibrated Preference Optimization (CaPO) is a novel method to align text-to-image (T2I) diffusion models.<n>CaPO incorporates the general preference from multiple reward models without human annotated data.<n> Experimental results show that CaPO consistently outperforms prior methods.
arXiv Detail & Related papers (2025-02-04T18:59:23Z)
Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.<n>With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.<n>Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z)
Scalable Ranked Preference Optimization for Text-to-Image Generation [76.16285931871948]
We investigate a scalable approach for collecting large-scale and fully synthetic datasets for DPO training. The preferences for paired images are generated using a pre-trained reward function, eliminating the need for involving humans in the annotation process. We introduce RankDPO to enhance DPO-based methods using the ranking feedback.
arXiv Detail & Related papers (2024-10-23T16:42:56Z)
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models [85.30735602813093]
Multi-Image Augmented Direct Preference Optimization (MIA-DPO) is a visual preference alignment approach that effectively handles multi-image inputs. MIA-DPO mitigates the scarcity of diverse multi-image training data by extending single-image data with unrelated images arranged in grid collages or pic-in-pic formats.
arXiv Detail & Related papers (2024-10-23T07:56:48Z)
Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization [68.69203905664524]
We introduce Diffusion-RPO, a new method designed to align diffusion-based T2I models with human preferences more effectively. We have developed a new evaluation metric, style alignment, aimed at overcoming the challenges of high costs, low interpretability. Our findings demonstrate that Diffusion-RPO outperforms established methods such as Supervised Fine-Tuning and Diffusion-DPO in tuning Stable Diffusion versions 1.5 and XL-1.0.
arXiv Detail & Related papers (2024-06-10T15:42:03Z)
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization [20.698818784349015]
This paper introduces step-by-step preference optimization (SPO) to improve aesthetics economically.<n>SPO discards the propagation strategy and allows fine-grained image details to be assessed.<n>SPO converges much faster than DPO methods due to the step-by-step alignment of fine-grained visual details.
arXiv Detail & Related papers (2024-06-06T17:57:09Z)
Diffusion Model Alignment Using Direct Preference Optimization [103.2238655827797]
Diffusion-DPO is a method to align diffusion models to human preferences by directly optimizing on human comparison data. We fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO. We also develop a variant that uses AI feedback and has comparable performance to training on human preferences.
arXiv Detail & Related papers (2023-11-21T15:24:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.