MixDPO: Modeling Preference Strength for Pluralistic Alignment
- URL: http://arxiv.org/abs/2601.06180v1
- Date: Wed, 07 Jan 2026 16:57:43 GMT
- Title: MixDPO: Modeling Preference Strength for Pluralistic Alignment
- Authors: Saki Imai, Pedram Heydari, Anthony Sicilia, Asteria Kaeberlein, Katherine Atwell, Malihe Alikhani
- Abstract summary: We introduce Mixed Logit Direct Preference Optimization (MixDPO), a generalization of Direct Preference Optimization that models variation in preference strength. We evaluate MixDPO on three preference datasets using two open-weight language models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preference-based alignment objectives implicitly assume that all human preferences are expressed with equal strength. In practice, however, preference strength varies across individuals and contexts -- a phenomenon established in behavioral economics and discrete choice theory. This mismatch limits the ability of existing objectives to faithfully capture heterogeneous human judgments. Inspired by this literature, we introduce Mixed Logit Direct Preference Optimization (MixDPO), a generalization of Direct Preference Optimization that models variation in preference strength. MixDPO enables alignment objectives to capture heterogeneity in how strongly preferences are expressed across training examples. We evaluate MixDPO on three preference datasets using two open-weight language models. Across datasets, MixDPO improves aggregate alignment performance (+11.2 points on Pythia-2.8B) while preserving subgroup-level preferences, with the largest gains appearing in settings with higher inferred preference heterogeneity. MixDPO makes preference heterogeneity explicit through learned strength distributions. We release our code for reproducibility.
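The abstract does not give the objective in closed form; as a minimal sketch, a mixed-logit variant of DPO can be illustrated by drawing the preference-strength parameter beta from a learned log-normal distribution and marginalizing the pairwise logistic likelihood over those draws via Monte Carlo. The function and parameter names below (mixdpo_loss, mu, log_sigma) are hypothetical and are not taken from the paper or its released code.

```python
import torch

def mixdpo_loss(pi_logps_w, pi_logps_l, ref_logps_w, ref_logps_l,
                mu, log_sigma, num_samples=16):
    """Minimal mixed-logit DPO sketch (hypothetical; not the paper's code).

    Standard DPO uses a single fixed preference-strength beta. Here beta is
    drawn from a learned log-normal distribution parameterized by (mu,
    log_sigma), and the Bradley-Terry likelihood is averaged over those draws
    before taking the negative log, as in a mixed logit model.
    """
    # Implicit reward margin between chosen (w) and rejected (l) responses.
    margin = (pi_logps_w - ref_logps_w) - (pi_logps_l - ref_logps_l)  # [B]

    # Reparameterized samples of beta > 0 from the strength distribution.
    eps = torch.randn(num_samples, *margin.shape, device=margin.device)
    beta = torch.exp(mu + torch.exp(log_sigma) * eps)                 # [S, B]

    # Mixed-logit likelihood: average the logistic choice probability over beta.
    pref_prob = torch.sigmoid(beta * margin)                          # [S, B]
    marginal_prob = pref_prob.mean(dim=0).clamp_min(1e-8)             # [B]

    return -torch.log(marginal_prob).mean()
```

In this reading, mu and log_sigma would be learned scalars (or predicted per example, e.g., from annotator or subgroup features), so that the fitted strength distribution makes preference heterogeneity explicit as the abstract describes; whether MixDPO uses exactly this parameterization is an assumption here.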
Related papers
- Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization [2.1487222438373674]
We propose Mix- and MoE-DPO, a framework that extends DPO with both soft mixture models and mixture-of-experts. Our framework supports both shared base architectures with expert-specific policy heads and fully independent expert models. We validate our approach on a variety of model sizes and multi-preference datasets.
arXiv Detail & Related papers (2025-10-09T14:15:14Z)
- Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences [13.588231827053923]
Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data. We propose SmPO-Diffusion, a novel method for modeling preference distributions to improve the DPO objective. Our approach effectively mitigates issues of excessive optimization and objective misalignment present in existing methods.
arXiv Detail & Related papers (2025-06-03T09:47:22Z)
- Calibrated Multi-Preference Optimization for Aligning Diffusion Models [90.15024547673785]
Calibrated Preference Optimization (CaPO) is a novel method to align text-to-image (T2I) diffusion models. CaPO incorporates the general preference from multiple reward models without human-annotated data. Experimental results show that CaPO consistently outperforms prior methods.
arXiv Detail & Related papers (2025-02-04T18:59:23Z)
- Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences. With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way. Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z)
- No Preference Left Behind: Group Distributional Preference Optimization [46.98320272443297]
Group Distributional Preference Optimization (GDPO) is a novel framework that aligns language models with the distribution of preferences within a group. GDPO calibrates a language model using statistical estimation of the group's belief distribution. GDPO consistently reduces this alignment gap during training.
arXiv Detail & Related papers (2024-12-28T23:30:47Z)
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z)
- Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. We increase the consistency and informativeness of the pairwise preference signals through targeted modifications. We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z)
- Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization [105.3612692153615]
We propose a new axis based on eliciting preferences jointly over instruction-response pairs. Joint preferences over instruction and response pairs can significantly enhance the alignment of large language models.
arXiv Detail & Related papers (2024-03-31T02:05:40Z)
- Direct Preference Optimization with an Offset [58.7977683502207]
Direct preference optimization (DPO) is a successful strategy for aligning large language models with human preferences.
We propose a generalization of DPO, termed DPO with an offset (ODPO), that does not treat every preference pair equally during fine-tuning.
arXiv Detail & Related papers (2024-02-16T10:55:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.