Decoding-Time Language Model Alignment with Multiple Objectives
- URL: http://arxiv.org/abs/2406.18853v3
- Date: Mon, 28 Oct 2024 03:23:14 GMT
- Title: Decoding-Time Language Model Alignment with Multiple Objectives
- Authors: Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du,
- Abstract summary: Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives.
Here, we propose $textbfmulti-objective decoding (MOD)$, a decoding-time algorithm that outputs the next token from a linear combination of predictions.
We show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method.
- Score: 116.42095026960598
- License:
- Abstract: Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives. Here, we propose $\textbf{multi-objective decoding (MOD)}$, a decoding-time algorithm that outputs the next token from a linear combination of predictions of all base models, for any given weightings over different objectives. We exploit a common form among a family of $f$-divergence regularized alignment approaches (such as PPO, DPO, and their variants) to identify a closed-form solution by Legendre transform, and derive an efficient decoding strategy. Theoretically, we show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method. Empirical results demonstrate the effectiveness of the algorithm. For example, compared to a parameter-merging baseline, MOD achieves 12.8% overall reward improvement when equally optimizing towards $3$ objectives. Moreover, we experiment with MOD on combining three fully-finetuned LLMs of different model sizes, each aimed at different objectives such as safety, coding, and general user preference. Unlike traditional methods that require careful curation of a mixture of datasets to achieve comprehensive improvement, we can quickly experiment with preference weightings using MOD to find the best combination of models. Our best combination reduces toxicity on Toxigen to nearly 0% and achieves 7.9--33.3% improvement across other three metrics ($\textit{i.e.}$, Codex@1, GSM-COT, BBH-COT).
Related papers
- $f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization [91.43730624072226]
$f$-PO is a novel framework that generalizes and extends existing approaches.
We conduct experiments on state-of-the-art language models using benchmark datasets.
arXiv Detail & Related papers (2024-10-29T02:11:45Z) - It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization [16.54335356612006]
The goal of model merging is to combine multiple models, each excelling in different tasks, into a single model that outperforms any of the individual source models.
Existing methods rely heavily on human intuition and customized strategies to tackle multiple tasks.
We propose a multi-objective optimization based model merging method named MM-MO.
arXiv Detail & Related papers (2024-06-29T16:34:23Z) - Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z) - Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO derived based on the optimal solution of problem leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z) - Adaptive Neural Ranking Framework: Toward Maximized Business Goal for
Cascade Ranking Systems [33.46891569350896]
Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems.
Previous works on learning-to-rank usually focus on letting the model learn the complete order or top-k order.
We name this method as Adaptive Neural Ranking Framework (abbreviated as ARF)
arXiv Detail & Related papers (2023-10-16T14:43:02Z) - Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization [76.09576643028362]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives.
MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models.
It theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient.
arXiv Detail & Related papers (2023-10-05T17:35:26Z) - Practical Bayesian Optimization of Objectives with Conditioning
Variables [1.0497128347190048]
We consider the more general case where a user is faced with multiple problems that each need to be optimized conditional on a state variable.
Similarity across objectives boosts optimization of each objective in two ways.
We propose a framework for conditional optimization: ConBO.
arXiv Detail & Related papers (2020-02-23T22:06:26Z) - Expected Information Maximization: Using the I-Projection for Mixture
Density Estimation [22.096148237257644]
Modelling highly multi-modal data is a challenging problem in machine learning.
We present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection.
We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches.
arXiv Detail & Related papers (2020-01-23T17:24:50Z) - AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.