Panacea: Pareto Alignment via Preference Adaptation for LLMs
- URL: http://arxiv.org/abs/2402.02030v2
- Date: Thu, 23 May 2024 13:49:25 GMT
- Title: Panacea: Pareto Alignment via Preference Adaptation for LLMs
- Authors: Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Haojun Chen, Qingfu Zhang, Siyuan Qi, Yaodong Yang
- Abstract summary: Panacea is an innovative approach that reframes alignment as a multi-dimensional preference optimization problem.
A major challenge is using a low-dimensional preference vector to guide the model's behavior, which is governed by an overwhelmingly large number of parameters.
To address this, Panacea is designed to use singular value decomposition (SVD)-based low-rank adaptation, which allows the preference vector to be injected online as singular values.
- Score: 18.278537948335092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current methods for large language model alignment typically use scalar human preference labels. However, this convention tends to oversimplify the multi-dimensional and heterogeneous nature of human preferences, leading to reduced expressivity and even misalignment. This paper presents Panacea, an innovative approach that reframes alignment as a multi-dimensional preference optimization problem. Panacea trains a single model capable of adapting online and Pareto-optimally to diverse sets of preferences without the need for further tuning. A major challenge here is using a low-dimensional preference vector to guide the model's behavior, despite it being governed by an overwhelmingly large number of parameters. To address this, Panacea is designed to use singular value decomposition (SVD)-based low-rank adaptation, which allows the preference vector to be simply injected online as singular values. Theoretically, we prove that Panacea recovers the entire Pareto front with common loss aggregation methods under mild conditions. Moreover, our experiments demonstrate, for the first time, the feasibility of aligning a single LLM to represent an exponentially vast spectrum of human preferences through various optimization methods. Our work marks a step forward in effectively and efficiently aligning models to diverse and intricate human preferences in a controllable and Pareto-optimal manner.
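The abstract's central mechanism, injecting the preference vector online as singular values of an SVD-based low-rank adaptation, can be pictured with a minimal PyTorch sketch. This is an illustration based only on the abstract, not the authors' implementation; the module name, the split between learned and injected singular values, and the scaling parameter are all assumptions.

```python
# Hypothetical sketch of SVD-based low-rank adaptation with preference
# injection, based only on the abstract's description of Panacea.
# Names and the exact placement of the preference vector are assumptions.
import torch
import torch.nn as nn


class PreferenceSVDLoRALinear(nn.Module):
    """Frozen base weight W plus a low-rank update U @ diag(s) @ V^T,
    where the last `pref_dim` entries of s are set online from a
    preference vector over alignment dimensions."""

    def __init__(self, base: nn.Linear, rank: int = 8, pref_dim: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pretrained weight frozen
        out_f, in_f = base.weight.shape
        total = rank + pref_dim
        # Left/right factors and the learnable part of the singular values.
        self.U = nn.Parameter(torch.randn(out_f, total) * 0.01)
        self.V = nn.Parameter(torch.randn(in_f, total) * 0.01)
        self.learned_s = nn.Parameter(torch.zeros(rank))
        # Scale for the injected preference entries (assumed hyperparameter).
        self.pref_scale = nn.Parameter(torch.ones(1))
        self.register_buffer("pref", torch.full((pref_dim,), 1.0 / pref_dim))

    def set_preference(self, pref: torch.Tensor) -> None:
        """Inject a preference vector (nonnegative, summing to 1) online."""
        self.pref = pref.to(self.U.device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = torch.cat([self.learned_s, self.pref_scale * self.pref])
        delta = self.U @ torch.diag(s) @ self.V.T  # low-rank weight update
        return self.base(x) + x @ delta.T
```

At inference time, calling `set_preference(torch.tensor([0.7, 0.3]))` on every adapted layer would steer the same model toward a 70/30 trade-off between two alignment dimensions (say, helpfulness and harmlessness) without further tuning, which is the online, Pareto-style adaptation the abstract describes.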
Related papers
- Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors [13.630818884973127]
We propose Preference Vector, a novel framework inspired by task arithmetic.
Instead of optimizing multiple preferences within a single objective, we train separate models on individual preferences, extract behavior shifts as preference vectors, and dynamically merge them at test time.
Experiments show that our proposed Preference Vector framework improves helpfulness without excessive conservatism, allows smooth control over preference trade-offs, and supports scalable multi-preference alignment.
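To make the task-arithmetic idea in the Preference Vector summary above concrete, here is a minimal sketch of extracting behavior shifts as parameter deltas and merging them at test time. It assumes per-preference fine-tuned checkpoints sharing the base model's architecture; the function names and the simple weighted-sum merging rule are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of test-time merging of preference vectors
# (task-arithmetic style) from per-preference fine-tuned checkpoints.
from typing import Dict, List
import torch


def extract_preference_vector(base: Dict[str, torch.Tensor],
                              tuned: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    """Behavior shift = fine-tuned weights minus base weights."""
    return {k: tuned[k] - base[k] for k in base}


def merge_at_test_time(base: Dict[str, torch.Tensor],
                       vectors: List[Dict[str, torch.Tensor]],
                       weights: List[float]) -> Dict[str, torch.Tensor]:
    """Weighted sum of preference vectors added back onto the base model."""
    merged = {k: v.clone() for k, v in base.items()}
    for vec, w in zip(vectors, weights):
        for k in merged:
            merged[k] += w * vec[k]
    return merged
```

For example, `merge_at_test_time(base, [helpful_vec, harmless_vec], [0.8, 0.2])` would emphasize the helpfulness shift over the harmlessness shift at deployment time, giving the smooth test-time control over trade-offs the summary describes.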
arXiv Detail & Related papers (2025-04-27T12:16:51Z)
- Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners [16.863492060519157]
This work addresses the challenge of personalizing trajectories generated in automated decision-making systems.
We propose a resource-efficient approach that enables rapid adaptation to individual users' preferences.
arXiv Detail & Related papers (2025-03-24T05:11:58Z)
- Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball.
We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
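For background on what the oracle in the summary above computes, the sketch below shows standard norm-ball LMOs of the Frank-Wolfe kind, i.e. solutions of a linear minimization over the ball; this is generic material, not the specific algorithm family proposed in that paper.

```python
# Generic linear minimization oracles over norm balls, as used in
# Frank-Wolfe-style methods: argmin_{||s|| <= radius} <grad, s>.
# Illustrative background only, not the paper's algorithm.
import torch


def lmo_l2(grad: torch.Tensor, radius: float = 1.0) -> torch.Tensor:
    """LMO over an L2 ball: a step against the gradient direction."""
    return -radius * grad / (grad.norm() + 1e-12)


def lmo_linf(grad: torch.Tensor, radius: float = 1.0) -> torch.Tensor:
    """LMO over an L-infinity ball: a signed, per-coordinate step."""
    return -radius * torch.sign(grad)
```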
arXiv Detail & Related papers (2025-02-11T13:10:34Z)
- Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking [50.325021634589596]
We propose a Tailored Preference Optimization (TailorPO) framework for aligning diffusion models with human preference.
Our approach directly ranks intermediate noisy samples based on their step-wise reward, and effectively resolves the gradient direction issues.
Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
arXiv Detail & Related papers (2025-02-01T16:08:43Z)
- Personalized Preference Fine-tuning of Diffusion Models [75.22218338096316]
We introduce PPD, a multi-reward optimization objective that aligns diffusion models with personalized preferences.
With PPD, a diffusion model learns the individual preferences of a population of users in a few-shot way.
Our approach achieves an average win rate of 76% over Stable Cascade, generating images that more accurately reflect specific user preferences.
arXiv Detail & Related papers (2025-01-11T22:38:41Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- Geometric-Averaged Preference Optimization for Soft Preference Labels [78.2746007085333]
Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic.
In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function.
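For reference, the standard DPO objective and one plausible way a soft label p-hat in [0, 1] could enter it via geometric averaging are sketched below; the soft-label variant is a hedged reading of the summary above, not necessarily the paper's exact loss.

```latex
% Standard DPO loss over preference pairs (x, y_w, y_l), with y_w preferred:
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}\!\left[\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)\right]
% One plausible soft-label variant (assumption): replace the policy likelihoods by
% weighted geometric averages,
%   \pi_\theta(y_w \mid x) \to \pi_\theta(y_w \mid x)^{\hat{p}} \, \pi_\theta(y_l \mid x)^{1-\hat{p}},
%   \pi_\theta(y_l \mid x) \to \pi_\theta(y_l \mid x)^{\hat{p}} \, \pi_\theta(y_w \mid x)^{1-\hat{p}},
% which scales the policy log-likelihood margin by (2\hat{p} - 1): unanimous labels
% (\hat{p} = 1) recover DPO, while ambiguous labels (\hat{p} = 0.5) contribute no policy gradient.
```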
arXiv Detail & Related papers (2024-09-10T17:54:28Z)
- Preference-Optimized Pareto Set Learning for Blackbox Optimization [1.9628841617148691]
In a multi-objective optimization (MOO) problem, no single solution can optimize all objectives simultaneously.
Instead, the goal is to find a set of optimal solutions (the Pareto set) that trades off the objectives according to preferences.
Our formulation leads to a bilevel optimization problem that can be solved by, e.g., differentiable cross-entropy methods.
arXiv Detail & Related papers (2024-08-19T13:23:07Z)
- mDPO: Conditional Preference Optimization for Multimodal Large Language Models [52.607764280030196]
Direct preference optimization (DPO) has been shown to be an effective method for large language model (LLM) alignment.
Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement.
We propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference.
arXiv Detail & Related papers (2024-06-17T17:59:58Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs fine-tuned with MRPO generalize better across various preference data, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z)
- SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling [34.32744849352087]
We propose a method that sequentially fine-tunes large language models to align with human preferences.
We theoretically derive the closed-form optimal SPO policy and loss function.
Empirical results on LLMs of different sizes and multiple evaluation datasets demonstrate that SPO successfully aligns LLMs across multiple dimensions of human preferences.
arXiv Detail & Related papers (2024-05-21T12:47:17Z)
- Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence seeks consistency between model responses and human preferences and values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z)
- Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment [46.44464839353993]
We introduce Rewards-in-Context (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context.
RiC only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment for user preferences during inference time.
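The in-context conditioning described above can be pictured with a small, purely hypothetical prompt template; the tag names and score format are illustrative assumptions, not RiC's actual formatting. Reward scores are written into prompts during supervised fine-tuning and then chosen by the user at inference.

```python
# Hypothetical prompt-conditioning sketch for multi-reward alignment in the
# spirit of RiC; tag names and formatting are illustrative assumptions.
def build_reward_conditioned_prompt(user_prompt: str, rewards: dict) -> str:
    """Prefix the prompt with desired reward scores, e.g.
    {"helpfulness": 0.9, "harmlessness": 0.8}."""
    tags = " ".join(f"<{name}: {score:.2f}>" for name, score in rewards.items())
    return f"{tags}\n{user_prompt}"


# At inference, a user dials the trade-off without any further training:
print(build_reward_conditioned_prompt(
    "Explain how vaccines work.",
    {"helpfulness": 0.9, "harmlessness": 0.8},
))
```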
arXiv Detail & Related papers (2024-02-15T18:58:31Z)
- Human Preference-Based Learning for High-dimensional Optimization of Exoskeleton Walking Gaits [55.59198568303196]
This work presents LineCoSpar, a human-in-the-loop preference-based framework to learn user preferences in high dimensions.
In simulations and human trials, we empirically verify that LineCoSpar is a sample-efficient approach for high-dimensional preference optimization.
This result has implications for exoskeleton gait synthesis, an active field with applications to clinical use and patient rehabilitation.
arXiv Detail & Related papers (2020-03-13T22:02:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.