Personalized Adaptation via In-Context Preference Learning
- URL: http://arxiv.org/abs/2410.14001v1
- Date: Thu, 17 Oct 2024 20:06:02 GMT
- Title: Personalized Adaptation via In-Context Preference Learning
- Authors: Allison Lau, Younwoo Choi, Vahid Balazadeh, Keertana Chidambaram, Vasilis Syrgkanis, Rahul G. Krishnan,
- Abstract summary: Preference Pretrained Transformer (PPT) is a novel approach for adaptive personalization using online user feedback.
Our results suggest the potential of in-context learning for scalable and efficient personalization in large language models.
- Score: 20.042909385219716
- License:
- Abstract: Reinforcement Learning from Human Feedback (RLHF) is widely used to align Language Models (LMs) with human preferences. However, existing approaches often neglect individual user preferences, leading to suboptimal personalization. We present the Preference Pretrained Transformer (PPT), a novel approach for adaptive personalization using online user feedback. PPT leverages the in-context learning capabilities of transformers to dynamically adapt to individual preferences. Our approach consists of two phases: (1) an offline phase where we train a single policy model using a history-dependent loss function, and (2) an online phase where the model adapts to user preferences through in-context learning. We demonstrate PPT's effectiveness in a contextual bandit setting, showing that it achieves personalized adaptation superior to existing methods while significantly reducing the computational costs. Our results suggest the potential of in-context learning for scalable and efficient personalization in large language models.
Related papers
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z) - MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, textbf MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z) - Unsupervised Human Preference Learning [7.959043497459107]
Large language models demonstrate impressive reasoning abilities but struggle to provide personalized content.
Existing methods, such as in-context learning and parameter-efficient fine-tuning, fall short in capturing the complexity of human preferences.
We propose a novel approach utilizing small parameter models as preference agents to generate natural language rules that guide a larger, pre-trained model.
arXiv Detail & Related papers (2024-09-30T17:51:01Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences.
In this work, we aim to address this problem by developing methods for building personalized language models.
arXiv Detail & Related papers (2024-02-06T04:18:58Z) - Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in one single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z) - ULMA: Unified Language Model Alignment with Human Demonstration and
Point-wise Preference [16.73260713938154]
A typical alignment procedure consists of supervised fine-tuning and preference learning.
We introduce Point-wise Direct Preference Optimization, a novel preference learning method designed to harness point-wise feedback effectively.
Our work also uncovers a novel connection between supervised fine-tuning and point-wise preference learning, culminating in Unified Language Model Alignment.
arXiv Detail & Related papers (2023-12-05T07:52:12Z) - ZooPFL: Exploring Black-box Foundation Models for Personalized Federated
Learning [95.64041188351393]
This paper endeavors to solve both the challenges of limited resources and personalization.
We propose a method named ZOOPFL that uses Zeroth-Order Optimization for Personalized Federated Learning.
To reduce the computation costs and enhance personalization, we propose input surgery to incorporate an auto-encoder with low-dimensional and client-specific embeddings.
arXiv Detail & Related papers (2023-10-08T12:26:13Z) - Aligning Language Models with Offline Learning from Human Feedback [5.539080592071948]
We propose an offline learning from human feedback framework to align language models without interacting with environments.
Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align language models to human preferences.
arXiv Detail & Related papers (2023-08-23T10:41:07Z) - Privacy Regularization: Joint Privacy-Utility Optimization in Language
Models [27.389684148671858]
We introduce two privacy-preserving regularization methods for training language models.
We show the advantages of our regularizers with favorable utility-privacy trade-off.
arXiv Detail & Related papers (2021-03-12T23:17:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.