Aligning LLMs with Individual Preferences via Interaction
- URL: http://arxiv.org/abs/2410.03642v1
- Date: Fri, 4 Oct 2024 17:48:29 GMT
- Title: Aligning LLMs with Individual Preferences via Interaction
- Authors: Shujin Wu, May Fung, Cheng Qian, Jeonghwan Kim, Dilek Hakkani-Tur, Heng Ji,
- Abstract summary: We train large language models (LLMs) that can ''interact to align''
We develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures.
For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations.
- Score: 51.72200436159636
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As large language models (LLMs) demonstrate increasingly advanced capabilities, aligning their behaviors with human values and preferences becomes crucial for their wide adoption. While previous research focuses on general alignment to principles such as helpfulness, harmlessness, and honesty, the need to account for individual and diverse preferences has been largely overlooked, potentially undermining customized human experiences. To address this gap, we train LLMs that can ''interact to align'', essentially cultivating the meta-skill of LLMs to implicitly infer the unspoken personalized preferences of the current user through multi-turn conversations, and then dynamically align their following behaviors and responses to these inferred preferences. Our approach involves establishing a diverse pool of 3,310 distinct user personas by initially creating seed examples, which are then expanded through iterative self-generation and filtering. Guided by distinct user personas, we leverage multi-LLM collaboration to develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures. Finally, we apply supervised fine-tuning and reinforcement learning to enhance LLMs using this dataset. For evaluation, we establish the ALOE (ALign With CustOmized PrEferences) benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations. Experimental results demonstrate the effectiveness of our method in enabling dynamic, personalized alignment via interaction.
Related papers
- LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation [50.98046887582194]
We propose an explicit and implicit multi-interest learning framework to model user interests on two levels: behavior and semantics.
The proposed EIMF framework effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling.
arXiv Detail & Related papers (2024-11-14T13:00:23Z) - Active Preference-based Learning for Multi-dimensional Personalization [7.349038301460469]
Large language models (LLMs) have shown remarkable versatility across tasks, but aligning them with individual human preferences remains challenging.
We propose an active preference learning framework that uses binary feedback to estimate user preferences across multiple objectives.
We validate our approach through theoretical analysis and experiments on language generation tasks, demonstrating its feedback efficiency and effectiveness in personalizing model responses.
arXiv Detail & Related papers (2024-11-01T11:49:33Z) - MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, textbf MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z) - PersonalLLM: Tailoring LLMs to Individual Preferences [11.717169516971856]
We present a public benchmark, PersonalLLM, focusing on adapting LLMs to provide maximal benefits for a particular user.
We curate open-ended prompts paired with many high-quality answers over which users would be expected to display heterogeneous latent preferences.
Our dataset and generated personalities offer an innovative testbed for developing personalization algorithms.
arXiv Detail & Related papers (2024-09-30T13:55:42Z) - Personality Alignment of Large Language Models [26.071445846818914]
Current methods for aligning large language models (LLMs) typically aim to reflect general human values and behaviors.
We introduce the concept of Personality Alignment.
This approach tailors LLMs' responses and decisions to match the specific preferences of individual users or closely related groups.
arXiv Detail & Related papers (2024-08-21T17:09:00Z) - Few-shot Personalization of LLMs with Mis-aligned Responses [40.0349773257245]
This paper proposes a new approach for a few-shot personalization of large language models (LLMs)
Our key idea is to learn a set of personalized prompts for each user by progressively improving the prompts using LLMs.
During an iterative process of prompt improvement, we incorporate the contexts of mis-aligned responses by LLMs.
arXiv Detail & Related papers (2024-06-26T18:29:12Z) - Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search [9.243535345193711]
Our method uses large language models to guide a single human worker in generating personalized dialogues.
LAPS can collect large-scale, human-written, multi-session, and multi-domain conversations.
Our results show that responses generated explicitly using extracted preferences better match user's actual preferences.
arXiv Detail & Related papers (2024-05-06T13:53:03Z) - Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment [103.12563033438715]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z) - Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z) - Personalized Soups: Personalized Large Language Model Alignment via
Post-hoc Parameter Merging [148.77027765872006]
We study Reinforcement Learning from Personalized Human Feedback (RLPHF) problem.
LLMs are aligned to multiple preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
We show that we can achieve personalized alignment by decomposing preferences into multiple dimensions.
arXiv Detail & Related papers (2023-10-17T20:22:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.