Personalized Soups: Personalized Large Language Model Alignment via
Post-hoc Parameter Merging
- URL: http://arxiv.org/abs/2310.11564v1
- Date: Tue, 17 Oct 2023 20:22:13 GMT
- Title: Personalized Soups: Personalized Large Language Model Alignment via
Post-hoc Parameter Merging
- Authors: Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel,
Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu
- Abstract summary: We study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem.
LLMs are aligned to multiple preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
We show that we can achieve personalized alignment by decomposing preferences into multiple dimensions.
- Score: 148.77027765872006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with general, aggregate human preferences, it is suboptimal for
learning diverse, individual perspectives. In this work, we study the
Reinforcement Learning from Personalized Human Feedback (RLPHF) problem,
wherein LLMs are
aligned to multiple (sometimes conflicting) preferences by modeling alignment
as a Multi-Objective Reinforcement Learning (MORL) problem. Compared to strong
single-objective baselines, we show that we can achieve personalized alignment
by decomposing preferences into multiple dimensions. These dimensions are
defined based on personalizations that are declared as desirable by the user.
In this work, we show that they can be efficiently trained independently in a
distributed manner and combined effectively post-hoc through parameter merging.
The code is available at https://github.com/joeljang/RLPHF.
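To make the merging step concrete, here is a minimal sketch of post-hoc parameter merging ("personalized soup"), assuming one independently fine-tuned expert model per declared preference dimension; the weighted average and the `merge_experts` helper are illustrative, not the paper's exact recipe.
```python
# Minimal sketch of post-hoc parameter merging ("personalized soup").
# Assumes one expert checkpoint per preference dimension; the weighted
# average below is an illustration, not the paper's exact recipe.
from typing import Dict, List
import torch

def merge_experts(
    expert_state_dicts: List[Dict[str, torch.Tensor]],
    weights: List[float],
) -> Dict[str, torch.Tensor]:
    """Return a weighted average of expert parameters."""
    assert expert_state_dicts and len(expert_state_dicts) == len(weights)
    total = sum(weights)
    return {
        name: sum(
            (w / total) * sd[name].float()
            for sd, w in zip(expert_state_dicts, weights)
        )
        for name in expert_state_dicts[0]
    }

# Hypothetical usage: a user who values conciseness twice as much as humor.
# experts = [torch.load(p) for p in ("concise.pt", "humorous.pt")]
# model.load_state_dict(merge_experts(experts, weights=[2.0, 1.0]))
```
Because the experts are trained independently, a new preference combination costs only an average over checkpoints at load time, not another round of RLHF.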
Related papers
- Orchestrating LLMs with Different Personalizations [28.344891363780576]
This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences.
Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to create, without re-training, an LLM that best adheres to this specification.
Starting from specialized expert LLMs, each trained for one particular preference dimension, we propose a black-box method that merges their outputs on a per-token level.
arXiv Detail & Related papers (2024-07-04T22:55:02Z)
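A hedged sketch of what per-token output merging could look like, assuming HuggingFace-style causal LMs; the mixture weighting and greedy decoding are assumptions, not the paper's published algorithm.
```python
# Illustrative per-token merge of black-box experts: mix each expert's
# next-token distribution and decode from the mixture. Assumes
# HuggingFace-style causal LMs; weighting and greedy decoding are
# assumptions, not the paper's published algorithm.
import torch

@torch.no_grad()
def merged_decode(experts, tokenizer, prompt, weights, max_new_tokens=64):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    w = torch.tensor(weights, dtype=torch.float) / sum(weights)
    for _ in range(max_new_tokens):
        # Stack each expert's next-token distribution: (n_experts, 1, vocab).
        probs = torch.stack(
            [m(ids).logits[:, -1].softmax(dim=-1) for m in experts]
        )
        mixture = (w.view(-1, 1, 1) * probs).sum(dim=0)  # (1, vocab)
        next_id = mixture.argmax(dim=-1, keepdim=True)   # greedy on mixture
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```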
- Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is to leverage the human prior knowledge contained in a small amount of seed data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low-quality pairs within the generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z)
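The summary does not specify the algorithm, but a common noise-aware pattern is label smoothing on a DPO-style objective, shown below as a generic illustration rather than this paper's exact method.
```python
# Generic noise-robust preference loss: label-smoothed DPO. The smoothing
# term hedges against flipped labels in self-generated pairs. This is a
# standard pattern, not necessarily this paper's exact algorithm.
import torch.nn.functional as F

def noise_aware_dpo_loss(
    logp_chosen, logp_rejected,          # log-probs under the policy
    ref_logp_chosen, ref_logp_rejected,  # log-probs under the reference
    beta: float = 0.1,
    label_noise: float = 0.1,            # estimated fraction of flipped labels
):
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # With probability `label_noise`, treat the pair as mislabeled.
    return (
        -(1 - label_noise) * F.logsigmoid(margin)
        - label_noise * F.logsigmoid(-margin)
    ).mean()
```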
- SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling [34.32744849352087]
We propose a method that sequentially fine-tunes large language models to align with human preferences.
We theoretically derive the closed-form optimal SPO policy and loss function.
Empirical results on LLMs of different sizes and multiple evaluation datasets demonstrate that SPO successfully aligns LLMs across multiple dimensions of human preferences.
arXiv Detail & Related papers (2024-05-21T12:47:17Z)
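A schematic of the sequential pattern, under the assumption that each stage performs a DPO-style implicit-reward update against the previous stage's policy as reference; `dpo_update` is a placeholder, and the paper's closed-form loss differs in detail.
```python
# Schematic of sequential multi-dimensional alignment: each stage starts
# from the previous stage's policy and uses it as the frozen reference,
# so the implicit reward of earlier dimensions is preserved. `dpo_update`
# stands in for one round of preference optimization; it is a
# placeholder, not SPO's exact closed-form loss.
import copy

def sequential_align(base_policy, preference_datasets, dpo_update):
    policy = base_policy
    for dim_data in preference_datasets:   # e.g. [helpfulness, humor, ...]
        reference = copy.deepcopy(policy)  # freeze the previous stage
        policy = dpo_update(policy, reference, dim_data)
    return policy
```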
- PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning [20.39414674098941]
Large Language Models (LLMs) are trained on massive text corpora, which encode diverse personality traits.
We formalize the persona elicitation task, aiming to customize LLM behaviors to align with a target persona.
We present Persona In-Context Learning (PICLe), a novel persona elicitation framework grounded in Bayesian inference.
arXiv Detail & Related papers (2024-05-03T22:17:22Z)
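A sketch of how Bayesian persona elicitation via in-context learning might select demonstrations; the likelihood-ratio criterion and the `log_prob` helper are assumptions read off the abstract, not PICLe's exact procedure.
```python
# Sketch of persona in-context example selection: rank candidate persona
# statements by how much more likely a persona-conditioned model finds
# them than the base model, then prompt with the top-k as demonstrations.
# The likelihood-ratio criterion and `log_prob` helper are assumptions.
def select_persona_examples(base_lm, persona_lm, candidates, log_prob, k=5):
    """candidates: list[str]; log_prob(model, text) -> float total log-prob."""
    scored = sorted(
        candidates,
        key=lambda c: log_prob(persona_lm, c) - log_prob(base_lm, c),
        reverse=True,
    )
    return scored[:k]  # use as in-context demonstrations for the persona
```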
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during training.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
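A sketch of the cross-prompt contrast idea: each chosen response is contrasted against rejected responses from its own and related prompts, with cross-prompt pairs weighted by prompt similarity. The cosine-similarity softmax weighting is an assumption, not RPO's published contrast matrix.
```python
# Sketch of relative preference optimization: contrast a chosen response
# not only with its own rejected response but with rejected responses
# from semantically related prompts, weighted by prompt similarity.
# The cosine-similarity softmax weighting is an assumption.
import torch
import torch.nn.functional as F

def rpo_loss(margins, prompt_embs, tau: float = 1.0):
    """
    margins: (B, B) tensor; margins[i, j] = implicit-reward gap between
        chosen response i and rejected response j (DPO-style, incl. beta).
    prompt_embs: (B, D) prompt embeddings used to weight cross-prompt pairs.
    """
    sim = F.cosine_similarity(
        prompt_embs.unsqueeze(1), prompt_embs.unsqueeze(0), dim=-1
    )                                            # (B, B) prompt similarity
    weights = torch.softmax(sim / tau, dim=-1)   # related prompts count more
    return -(weights * F.logsigmoid(margins)).sum(dim=1).mean()
```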
- Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences, but it aggregates feedback across users.
In this work, we aim to address this problem by developing methods for building personalized language models.
arXiv Detail & Related papers (2024-02-06T04:18:58Z)
- Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in a single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z)
- Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization [78.50294936259026]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives with minimal overheads.
MODPO folds language modeling directly into reward modeling, training LMs as implicit collective reward models (cRMs) that combine all objectives with specific weightings.
While theoretically guaranteed to produce the same optimal solutions as MORLHF, MODPO is practically more stable and computationally efficient.
arXiv Detail & Related papers (2023-10-05T17:35:26Z)
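A simplified sketch of a MODPO-style objective: a DPO update on one objective's preference data, offset by a margin from the remaining objectives' reward models under user-chosen weightings; scaling details are simplified relative to the paper.
```python
# Simplified sketch of a MODPO-style objective: a DPO update on one
# objective's preference data, offset by a margin from the other
# objectives' reward models with user-chosen weights. Scaling details
# are simplified relative to the paper.
import torch
import torch.nn.functional as F

def modpo_loss(
    logp_chosen, logp_rejected,                    # policy log-probs
    ref_logp_chosen, ref_logp_rejected,            # frozen reference log-probs
    other_rewards_chosen, other_rewards_rejected,  # (B, K-1) from other RMs
    w_main: float, w_other: torch.Tensor,          # objective weightings
    beta: float = 0.1,
):
    implicit = (logp_chosen - ref_logp_chosen) - (
        logp_rejected - ref_logp_rejected
    )
    margin = (w_other * (other_rewards_chosen - other_rewards_rejected)).sum(-1)
    return -F.logsigmoid((beta * implicit - margin) / w_main).mean()
```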
This list is automatically generated from the titles and abstracts of the papers on this site.