Related papers: When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO

When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO

URL: http://arxiv.org/abs/2503.16921v1
Date: Fri, 21 Mar 2025 07:33:44 GMT
Title: When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
Authors: Lingfan Zhang, Chen Liu, Chengming Xu, Kai Hu, Donghao Luo, Chengjie Wang, Yanwei Fu, Yuan Yao,
Abstract summary: This paper explores the role of preference data in the training process of diffusion models.<n>We propose Adaptive-DPO -- a novel approach that incorporates a minority-instance-aware metric into the DPO objective.<n>Our experiments demonstrate that this method effectively handles both synthetic minority data and real-world preference data.
Score: 66.10041557056562
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, the field of image generation has witnessed significant advancements, particularly in fine-tuning methods that align models with universal human preferences. This paper explores the critical role of preference data in the training process of diffusion models, particularly in the context of Diffusion-DPO and its subsequent adaptations. We investigate the complexities surrounding universal human preferences in image generation, highlighting the subjective nature of these preferences and the challenges posed by minority samples in preference datasets. Through pilot experiments, we demonstrate the existence of minority samples and their detrimental effects on model performance. We propose Adaptive-DPO -- a novel approach that incorporates a minority-instance-aware metric into the DPO objective. This metric, which includes intra-annotator confidence and inter-annotator stability, distinguishes between majority and minority samples. We introduce an Adaptive-DPO loss function which improves the DPO loss in two ways: enhancing the model's learning of majority labels while mitigating the negative impact of minority samples. Our experiments demonstrate that this method effectively handles both synthetic minority data and real-world preference data, paving the way for more effective training methodologies in image generation tasks.

Related papers

DeDPO: Debiased Direct Preference Optimization for Diffusion Models [13.068043495097378]
We propose a semi-supervised framework augmenting limited human data with a large corpus of unlabeled pairs annotated via cost-effective synthetic AI feedback.<n>Our paper introduces Debiased DPO (DeDPO), which uniquely integrates a debiased estimation technique from causal inference into the DPO objective.<n> Experiments demonstrate that DeDPO is robust to the variations in synthetic labeling methods, achieving performance that matches and occasionally exceeds the theoretical upper bound of models trained on fully human-labeled data.
arXiv Detail & Related papers (2026-02-05T21:11:00Z)
Beyond Reward Margin: Rethinking and Resolving Likelihood Displacement in Diffusion Models via Video Generation [6.597818816347323]
Direct Preference Optimization aims to align generative outputs with human preferences by distinguishing between chosen and rejected samples.<n>A critical limitation of DPO is likelihood displacement, where the probabilities of chosen samples paradoxically decrease during training.<n>We introduce a novel solution named Policy-Guided DPO, combining Adaptive Rejection Scaling (ARS) and Implicit Preference Regularization (IPR)<n>Experiments show that PG-DPO outperforms existing methods in both quantitative metrics and qualitative evaluations.
arXiv Detail & Related papers (2025-11-24T12:37:49Z)
On the Role of Preference Variance in Preference Optimization [55.364953481473286]
We investigate the impact of emphpreference variance (PVar) on the effectiveness of Direct Preference Optimization (DPO) training.<n>We show that prompts with higher PVar outperform randomly selected prompts or those with lower PVar.
arXiv Detail & Related papers (2025-10-14T22:34:52Z)
Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations [60.143658714894336]
Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation.<n> Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences.<n>We introduce Self-NPO, a Negative Preference Optimization approach that learns exclusively from the model itself.
arXiv Detail & Related papers (2025-05-17T01:03:46Z)
Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences. With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way. Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z)
Direct Preference Optimization With Unobserved Preference Heterogeneity [16.91835461818937]
This paper presents a new method to align generative models with varied human preferences. We propose an Expectation-Maximization adaptation to DPO, generating a mixture of models based on latent preference types of the annotators. Our algorithms leverage the simplicity of DPO while accommodating diverse preferences.
arXiv Detail & Related papers (2024-05-23T21:25:20Z)
Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities [25.215178019059874]
Underrepresentation of minorities in training data is a well-recognized concern. We propose Chameleon, a system that augments a data set with a minimal addition of settings to enhance the coverage of under-represented groups. Our experiment, in addition to confirming the efficiency of our proposed algorithms, illustrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-02-02T00:16:45Z)
Adversarial Reweighting Guided by Wasserstein Distance for Bias Mitigation [24.160692009892088]
Under-representation of minorities in the data makes the disparate treatment of subpopulations difficult to deal with during learning. We propose a novel adversarial reweighting method to address such emphrepresentation bias.
arXiv Detail & Related papers (2023-11-21T15:46:11Z)
Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm. We use a simple but key insight: the divergence of trends between different populations, and, consecutively, between a learned model and minority populations, is analogous to data drift. We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
Generative Oversampling for Imbalanced Data via Majority-Guided VAE [15.93867386081279]
We propose a novel over-sampling model, called Majority-Guided VAE(MGVAE), which generates new minority samples under the guidance of a majority-based prior. In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks.
arXiv Detail & Related papers (2023-02-14T06:35:23Z)
Don't Play Favorites: Minority Guidance for Diffusion Models [59.75996752040651]
We present a novel framework that can make the generation process of the diffusion models focus on the minority samples. We develop minority guidance, a sampling technique that can guide the generation process toward regions with desired likelihood levels.
arXiv Detail & Related papers (2023-01-29T03:08:47Z)
FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF. It minimizes the loss over the reweighted data set where the sample weights are computed. We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.