ComPO: Community Preferences for Language Model Personalization
- URL: http://arxiv.org/abs/2410.16027v1
- Date: Mon, 21 Oct 2024 14:02:40 GMT
- Title: ComPO: Community Preferences for Language Model Personalization
- Authors: Sachin Kumar, Chan Young Park, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi,
- Abstract summary: ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
- Score: 122.54846260663922
- License:
- Abstract: Conventional algorithms for training language models (LMs) with human feedback rely on preferences that are assumed to account for an "average" user, disregarding subjectivity and finer-grained variations. Recent studies have raised concerns that aggregating such diverse and often contradictory human feedback to finetune models results in generic models that generate outputs not preferred by many user groups, as they tend to average out styles and norms. To address this issue, we draw inspiration from recommendation systems and propose ComPO, a method to personalize preference optimization in LMs by contextualizing the probability distribution of model outputs with the preference provider. Focusing on group-level preferences rather than individuals, we collect and release ComPRed, a question answering dataset with community-level preferences from Reddit. This dataset facilitates studying diversity in preferences without incurring privacy concerns associated with individual feedback. Our experiments reveal that conditioning language models on a community identifier (i.e., subreddit name) during preference tuning substantially enhances model performance. Conversely, replacing this context with random subreddit identifiers significantly diminishes performance, highlighting the effectiveness of our approach in tailoring responses to communities' preferences.
Related papers
- VPO: Leveraging the Number of Votes in Preference Optimization [5.200545764106177]
We introduce a technique that leverages user voting data to better align with diverse subjective preferences.
We develop the Vote-based Preference Optimization framework, which incorporates the number of votes on both sides to distinguish between controversial and obvious generation pairs.
arXiv Detail & Related papers (2024-10-30T10:39:34Z) - Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback [87.37721254914476]
We introduce a routing framework that combines inputs from humans and LMs to achieve better annotation quality.
We train a performance prediction model to predict a reward model's performance on an arbitrary combination of human and LM annotations.
We show that the selected hybrid mixture achieves better reward model performance compared to using either one exclusively.
arXiv Detail & Related papers (2024-10-24T20:04:15Z) - General Preference Modeling with Preference Representations for Aligning Language Models [51.14207112118503]
We introduce preference representation learning, an approach that embeds responses into a latent space to capture intricate preference structures efficiently.
We also propose preference score-based General Preference Optimization (GPO), which generalizes reward-based reinforcement learning from human feedback.
Our method may enhance the alignment of foundation models with nuanced human values.
arXiv Detail & Related papers (2024-10-03T04:22:55Z) - PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences [6.398937923320069]
We propose PAL, a framework to model human preference complementary to existing pretraining strategies.
We show that PAL achieves competitive reward model accuracy compared to strong baselines.
arXiv Detail & Related papers (2024-06-12T17:54:54Z) - Direct Preference Optimization With Unobserved Preference Heterogeneity [16.91835461818937]
This paper presents a new method to align generative models with varied human preferences.
We propose an Expectation-Maximization adaptation to DPO, generating a mixture of models based on latent preference types of the annotators.
Our algorithms leverage the simplicity of DPO while accommodating diverse preferences.
arXiv Detail & Related papers (2024-05-23T21:25:20Z) - Aligning Crowd Feedback via Distributional Preference Reward Modeling [28.754532173765686]
We propose the Distributional Preference Reward Model (DPRM) to align large language models with diverse human preferences.
Our experiments show that DPRM significantly enhances the alignment of LLMs with population preference, yielding more accurate, unbiased, and contextually appropriate responses.
arXiv Detail & Related papers (2024-02-15T07:29:43Z) - MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with
Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences.
In this work, we aim to address this problem by developing methods for building personalized language models.
arXiv Detail & Related papers (2024-02-06T04:18:58Z) - Models of human preference for learning reward functions [80.39289349661364]
We learn the reward function from human-generated preferences between pairs of trajectory segments.
We find this assumption to be flawed and propose modeling human preferences as informed by each segment's regret.
Our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned.
arXiv Detail & Related papers (2022-06-05T17:58:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.