Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
- URL: http://arxiv.org/abs/2310.03708v4
- Date: Sat, 17 Aug 2024 13:39:13 GMT
- Title: Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
- Authors: Zhanhui Zhou, Jie Liu, Jing Shao, Xiangyu Yue, Chao Yang, Wanli Ouyang, Yu Qiao
- Abstract summary: We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives.
MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models.
It theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient.
- Score: 76.09576643028362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A single language model, even when aligned with labelers through reinforcement learning from human feedback (RLHF), may not suit all human preferences. Recent approaches therefore opt for customization, collecting multi-dimensional feedback and creating a distinct reward model for each dimension. Different language models are then optimized for various preferences using multi-objective RLHF (MORLHF) with varying reward weights. However, RL fine-tuning is unstable and resource-heavy, especially with diverse and often conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free extension of Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models that combine all objectives with specific weights. MODPO theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient. Empirical results in safety alignment and long-form question answering show that MODPO matches or outperforms existing methods, producing a Pareto front of language models catering to diverse preferences with a third of the computational resources of MORLHF. Code is available at https://github.com/ZHZisZZ/modpo.
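The phrase "folds language modeling directly into reward modeling" can be read as a DPO-style loss on one objective's preference data, rescaled by that objective's weight and offset by a margin from the remaining objectives' frozen reward models. Below is a minimal PyTorch sketch under that reading; the tensor names, batching, and default hyperparameters are illustrative assumptions, not taken from the released code.

```python
import torch.nn.functional as F

def modpo_loss(policy_logps_chosen, policy_logps_rejected,  # log pi_theta(y|x)
               ref_logps_chosen, ref_logps_rejected,        # log pi_ref(y|x)
               margin_chosen, margin_rejected,  # weighted frozen rewards of the other objectives
               beta=0.1, w_k=0.5):
    """DPO-style loss on objective k's preference data, with the remaining
    objectives folded in as a reward margin (illustrative sketch only)."""
    # Implicit rewards of the trained policy, rescaled by objective k's weight.
    chosen = (beta / w_k) * (policy_logps_chosen - ref_logps_chosen)
    rejected = (beta / w_k) * (policy_logps_rejected - ref_logps_rejected)
    # Margin contributed by the frozen reward models of the other objectives.
    margin = (margin_chosen - margin_rejected) / w_k
    return -F.logsigmoid(chosen - rejected - margin).mean()
```

Sweeping the objective weights across training runs then traces the Pareto front without any RL rollouts, which is where the claimed stability and compute savings would come from.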
Related papers
- MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from large text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z) - Decoding-Time Language Model Alignment with Multiple Objectives [116.42095026960598]
Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives.
Here, we propose multi-objective decoding (MOD), a decoding-time algorithm that outputs the next token from a linear combination of predictions of all base models.
We show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method.
arXiv Detail & Related papers (2024-06-27T02:46:30Z) - mDPO: Conditional Preference Optimization for Multimodal Large Language Models [52.607764280030196]
- mDPO: Conditional Preference Optimization for Multimodal Large Language Models [52.607764280030196]
Direct preference optimization (DPO) has been shown to be an effective method for large language model (LLM) alignment.
Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement.
We propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference.
arXiv Detail & Related papers (2024-06-17T17:59:58Z) - Multi-objective Reinforcement learning from AI Feedback [0.0]
This paper presents Multi-Objective Reinforcement Learning from AI Feedback (MORLAIF), a novel approach to improving the alignment and performance of language models trained using reinforcement learning from AI feedback (RLAIF).
In contrast to standard approaches that train a single preference model to represent all human preferences, MORLAIF decomposes this task into simpler principles, such as toxicity, factuality, and sycophancy.
Our experiments indicate that MORLAIF outperforms the standard RLAIF baselines and that MORLAIF can be used to align larger language models using smaller ones.
arXiv Detail & Related papers (2024-06-11T14:24:00Z) - Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives [0.5120567378386615]
- Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives [0.5120567378386615]
We propose a hybrid approach to aligning large language models (LLMs).
With a simple augmentation to the implicit reward decomposition of DPO, we allow for tuning LLMs to maximize a set of arbitrary auxiliary rewards.
The proposed method, Hybrid Preference Optimization (HPO), shows the ability to effectively generalize to both user preferences and auxiliary designer objectives.
arXiv Detail & Related papers (2024-05-28T08:35:48Z) - Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs fine-tuned with MRPO generalize better across various preference data, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z) - Personalized Soups: Personalized Large Language Model Alignment via
Post-hoc Parameter Merging [148.77027765872006]
We study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem.
LLMs are aligned to multiple preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
We show that we can achieve personalized alignment by decomposing preferences into multiple dimensions.
arXiv Detail & Related papers (2023-10-17T20:22:13Z)