Personalized Soups: Personalized Large Language Model Alignment via
Post-hoc Parameter Merging
- URL: http://arxiv.org/abs/2310.11564v1
- Date: Tue, 17 Oct 2023 20:22:13 GMT
- Title: Personalized Soups: Personalized Large Language Model Alignment via
Post-hoc Parameter Merging
- Authors: Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel,
Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu
- Abstract summary: We study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem.
LLMs are aligned to multiple preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
We show that we can achieve personalized alignment by decomposing preferences into multiple dimensions.
- Score: 148.77027765872006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for
learning diverse, individual perspectives. In this work, we study the Reinforcement
Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are
aligned to multiple (sometimes conflicting) preferences by modeling alignment
as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong
single-objective baselines, we show that we can achieve personalized alignment
by decomposing preferences into multiple dimensions. These dimensions are
defined based on personalizations that are declared as desirable by the user.
In this work, we show that they can be efficiently trained independently in a
distributed manner and combined effectively post-hoc through parameter merging.
The code is available at https://github.com/joeljang/RLPHF.
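For intuition, the post-hoc merging step amounts to a weighted average of the experts' parameters, in the spirit of model soups. The following is a minimal PyTorch sketch under that reading; the function and variable names are illustrative and do not mirror the released RLPHF implementation.

```python
import torch

def merge_expert_models(expert_state_dicts, weights):
    """Weighted average of expert parameters (a post-hoc "parameter soup").

    expert_state_dicts: state dicts of models fine-tuned independently,
    one per preference dimension (e.g. conciseness, friendliness).
    weights: mixing coefficients chosen from the preferences the user
    declared as desirable; assumed here to sum to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-6, "mixing weights should sum to 1"
    merged = {}
    for name in expert_state_dicts[0]:
        merged[name] = sum(
            w * sd[name].float() for w, sd in zip(weights, expert_state_dicts)
        )
    return merged

# Example: a user who declares two of the trained preference dimensions as desirable.
# personalized = merge_expert_models(
#     [concise_model.state_dict(), friendly_model.state_dict()], weights=[0.5, 0.5]
# )
# base_model.load_state_dict(personalized)
```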
Related papers
- MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from vast text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z) - Aligning LLMs with Individual Preferences via Interaction [51.72200436159636]
We train large language models (LLMs) that can "interact to align" with individual users.
We develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures.
For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations.
arXiv Detail & Related papers (2024-10-04T17:48:29Z) - Personality Alignment of Large Language Models [26.071445846818914]
- Personality Alignment of Large Language Models [26.071445846818914]
Current methods for aligning large language models (LLMs) typically aim to reflect general human values and behaviors.
We introduce the concept of Personality Alignment.
This approach tailors LLMs' responses and decisions to match the specific preferences of individual users or closely related groups.
arXiv Detail & Related papers (2024-08-21T17:09:00Z) - Orchestrating LLMs with Different Personalizations [28.344891363780576]
This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences.
Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to create, without re-training, an LLM that best adheres to this specification.
Starting from specialized expert LLMs, each trained for one particular preference dimension, we propose a black-box method that merges their outputs on a per-token level.
arXiv Detail & Related papers (2024-07-04T22:55:02Z) - Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
- Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences.
In this work, we aim to address this one-size-fits-all limitation by developing methods for building personalized language models.
arXiv Detail & Related papers (2024-02-06T04:18:58Z) - Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in a single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z) - Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization [76.09576643028362]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives.
MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models.
It theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient.
arXiv Detail & Related papers (2023-10-05T17:35:26Z)