Controllable Preference Optimization: Toward Controllable
Multi-Objective Alignment
- URL: http://arxiv.org/abs/2402.19085v1
- Date: Thu, 29 Feb 2024 12:12:30 GMT
- Title: Controllable Preference Optimization: Toward Controllable
Multi-Objective Alignment
- Authors: Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen,
Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun
- Abstract summary: Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
- Score: 107.63756895544842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Alignment in artificial intelligence pursues the consistency between model
responses and human preferences as well as values. In practice, the
multifaceted nature of human preferences inadvertently introduces what is known
as the "alignment tax": a compromise where improving alignment on one
objective (e.g., harmlessness) can degrade performance on others
(e.g., helpfulness). However, existing alignment techniques are mostly
unidirectional, leading to suboptimal trade-offs and poor flexibility across
objectives. To navigate this challenge, we argue for the importance of
grounding LLMs in explicit preferences. We introduce controllable preference
optimization (CPO), which explicitly specifies preference scores for different
objectives, thereby guiding the model to generate responses that meet the
requirements. Our experimental analysis reveals that the aligned models can
provide responses that match various preferences among the "3H" (helpfulness,
honesty, harmlessness) desiderata. Furthermore, by introducing diverse data and
alignment goals, we surpass baseline methods in aligning with single
objectives, hence mitigating the impact of the alignment tax and achieving
Pareto improvements in multi-objective alignment.
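The abstract does not give CPO's exact conditioning format, so the following is only a minimal sketch of the general idea: explicit preference scores for the 3H objectives are rendered as control text and prepended to the prompt so that generation can condition on them. The score scale, control-token format, and the placeholder model are illustrative assumptions, not the paper's implementation.
```python
# Minimal sketch (not the paper's implementation): condition generation on
# explicit "3H" preference scores by prepending them as control text.
# Score scale (1-5), token format, and model name are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

def build_controlled_prompt(user_prompt: str,
                            helpfulness: int,
                            honesty: int,
                            harmlessness: int) -> str:
    """Prepend explicit preference scores so the model can condition on them."""
    control = (f"<helpfulness:{helpfulness}> "
               f"<honesty:{honesty}> "
               f"<harmlessness:{harmlessness}>")
    return f"{control}\n{user_prompt}"

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated continuation, not the prompt tokens.
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder base model
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    prompt = build_controlled_prompt(
        "Explain how to pick a strong password.",
        helpfulness=5, honesty=5, harmlessness=5)
    print(generate(model, tokenizer, prompt))
```
In a CPO-style setup the model would be trained on data annotated with such scores so that changing them at inference time shifts the trade-off among the objectives; here the scores only illustrate the conditioning interface.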
Related papers
- Pareto-Optimal Learning from Preferences with Hidden Context [18.340302968130683]
We propose POPL, which frames discrepant group preferences as objectives with potential trade-offs.
Our empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions.
POPL can serve as a foundation for techniques optimizing specific notions of group fairness.
arXiv Detail & Related papers (2024-06-21T18:57:38Z) - Hybrid Alignment Training for Large Language Models [60.46220684809339]
Alignment training is crucial for enabling large language models to cater to human intentions and preferences.
We propose a Hybrid Alignment Training (Hbat) approach, based on alternating alignment and modified elastic weight consolidation methods.
Experimental results show that the proposed Hbat can significantly outperform all baselines.
arXiv Detail & Related papers (2024-06-21T14:23:57Z) - Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models.
There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
arXiv Detail & Related papers (2024-04-22T17:20:18Z) - Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment [46.44464839353993]
We introduce Rewards-in-Context (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context.
RiC only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment for user preferences during inference time.
arXiv Detail & Related papers (2024-02-15T18:58:31Z) - Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in a single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z) - Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct
Preference Optimization [78.50294936259026]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives with minimal overheads.
MODPO folds language modeling directly into reward modeling, training LMs as implicit collective reward models (cRMs) that combine all objectives with specific weightings.
While theoretically guaranteed to produce the same optimal solutions as MORLHF, MODPO is practically more stable and computationally efficient.
arXiv Detail & Related papers (2023-10-05T17:35:26Z) - From Instructions to Intrinsic Human Values -- A Survey of Alignment
Goals for Big Models [48.326660953180145]
We conduct a survey of different alignment goals in existing work and trace their evolution paths to help identify the most essential goal.
Our analysis reveals a goal transformation from fundamental abilities to value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs.
arXiv Detail & Related papers (2023-08-23T09:11:13Z) - An Approach to Ordering Objectives and Pareto Efficient Solutions [0.0]
Solutions to multi-objective optimization problems can generally not be compared or ordered.
Decision-makers are often made to believe that scaled objectives can be compared.
We present a method that uses the probability integral transform in order to map the objectives of a problem into scores that all share the same range.
arXiv Detail & Related papers (2022-05-30T17:55:53Z)
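The last entry's probability-integral-transform idea can be made concrete: replace each raw objective value with its empirical-CDF score so that every objective lands in the same (0, 1] range and solutions become comparable. A minimal sketch under the assumption of a finite set of candidate solutions (the rank-based CDF estimate below is an illustrative choice, not the paper's exact procedure):
```python
# Minimal sketch of the probability-integral-transform idea from the last
# entry: map each objective's raw values through its empirical CDF so that
# all objectives share the same (0, 1] range and become comparable.
import numpy as np

def pit_scores(objective_values: np.ndarray) -> np.ndarray:
    """objective_values: shape (n_solutions, n_objectives), larger = better.

    Returns empirical-CDF scores in (0, 1] with the same shape.
    """
    n, _ = objective_values.shape
    # Rank each column independently (ties broken by position), then convert
    # ranks to empirical-CDF values in (0, 1].
    ranks = objective_values.argsort(axis=0).argsort(axis=0) + 1
    return ranks / n

if __name__ == "__main__":
    # Two objectives on very different scales (e.g., a score vs. a latency gain).
    vals = np.array([[0.91, 12.0],
                     [0.88, 45.0],
                     [0.95,  3.0]])
    print(pit_scores(vals))
```
After this transform, all objectives are expressed on the same scale, which is what allows the kind of ordering and comparison the entry describes.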