Active Preference-based Learning for Multi-dimensional Personalization
- URL: http://arxiv.org/abs/2411.00524v1
- Date: Fri, 01 Nov 2024 11:49:33 GMT
- Title: Active Preference-based Learning for Multi-dimensional Personalization
- Authors: Minhyeon Oh, Seungjoon Lee, Jungseul Ok
- Abstract summary: Large language models (LLMs) have shown remarkable versatility across tasks, but aligning them with individual human preferences remains challenging.
We propose an active preference learning framework that uses binary feedback to estimate user preferences across multiple objectives.
We validate our approach through theoretical analysis and experiments on language generation tasks, demonstrating its feedback efficiency and effectiveness in personalizing model responses.
- Score: 7.349038301460469
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown remarkable versatility across tasks, but aligning them with individual human preferences remains challenging due to the complexity and diversity of these preferences. Existing methods often overlook the fact that preferences are multi-objective, diverse, and hard to articulate, making full alignment difficult. In response, we propose an active preference learning framework that uses binary feedback to estimate user preferences across multiple objectives. Our approach leverages Bayesian inference to update preferences efficiently and reduces user feedback through an acquisition function that optimally selects queries. Additionally, we introduce a parameter to handle feedback noise and improve robustness. We validate our approach through theoretical analysis and experiments on language generation tasks, demonstrating its feedback efficiency and effectiveness in personalizing model responses.
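The abstract describes a concrete loop: maintain a Bayesian posterior over multi-objective preference weights, pick binary queries with an acquisition function, and handle feedback noise with a dedicated parameter. Below is a minimal sketch of such a loop, assuming a grid posterior over the preference simplex, a logistic feedback likelihood whose sharpness BETA stands in for the paper's noise parameter, and an expected-entropy acquisition rule; the constants, query pool, and simulated user are illustrative, not the authors' exact formulation.

```python
# Sketch of active preference learning from binary feedback (assumed details).
import numpy as np

rng = np.random.default_rng(0)
D, N = 3, 2000        # objectives; grid size over the preference simplex (assumed)
BETA = 8.0            # feedback-model sharpness; lower means noisier users (assumed)
EPS = 1e-12

W = rng.dirichlet(np.ones(D), size=N)   # candidate preference weight vectors w
log_post = np.full(N, -np.log(N))       # uniform prior over the candidates

def lognorm(lp):
    """Normalize log-probabilities via a stable log-sum-exp."""
    m = lp.max()
    return lp - (m + np.log(np.exp(lp - m).sum()))

def p_prefers_a(delta):
    """P(user picks response A over B) given utility gaps delta = w . (s_A - s_B)."""
    return 1.0 / (1.0 + np.exp(-BETA * delta))

def entropy(lp):
    p = np.exp(lognorm(lp))
    return float(-(p * np.log(p + EPS)).sum())

def expected_entropy(lp, s_a, s_b):
    """Acquisition score: expected posterior entropy after one binary query."""
    p = np.exp(lognorm(lp))
    pa = p_prefers_a(W @ (s_a - s_b))          # per-candidate P(answer = A)
    m = float((p * pa).sum())                  # marginal P(answer = A)
    return (m * entropy(lp + np.log(pa + EPS))
            + (1 - m) * entropy(lp + np.log(1 - pa + EPS)))

# Simulated setting: a pool of response pairs scored on each objective,
# and a hidden user whose true weights we try to recover.
pool = [(rng.normal(size=D), rng.normal(size=D)) for _ in range(100)]
w_true = np.array([0.6, 0.3, 0.1])

for _ in range(15):
    # Ask the query that most reduces uncertainty about w, in expectation.
    s_a, s_b = min(pool, key=lambda q: expected_entropy(log_post, *q))
    ans_a = rng.random() < p_prefers_a(w_true @ (s_a - s_b))  # noisy binary answer
    pa = p_prefers_a(W @ (s_a - s_b))
    log_post = lognorm(log_post + np.log((pa if ans_a else 1 - pa) + EPS))

print("MAP preference weights:", np.round(W[np.argmax(log_post)], 2))
```

In this sketch the acquisition rule is equivalent to maximizing expected information gain about w; each binary answer multiplies the posterior by the corresponding likelihood, so the estimate concentrates as feedback accumulates.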
Related papers
- Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User [117.82681846559909]
Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations.
We propose a generative reward model based simulated user, named GRSU, for automatic interaction with CRSs.
arXiv Detail & Related papers (2025-04-29T06:37:30Z)
- LoRe: Personalizing LLMs via Low-Rank Reward Modeling [47.12507639759984]
We introduce a novel framework that leverages low-rank preference modeling to efficiently learn and generalize user-specific reward functions.
We validate our method on multiple preference datasets, demonstrating superior generalization to unseen users and improved accuracy in preference prediction tasks.
arXiv Detail & Related papers (2025-04-20T01:16:24Z)
- Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization [24.533662423325943]
We study a preference-aware MO-MAB framework in the presence of explicit user preferences.
This is the first theoretical study of customized MO-MAB optimization with explicit user preferences.
arXiv Detail & Related papers (2025-02-19T06:06:13Z)
- Uncertain Multi-Objective Recommendation via Orthogonal Meta-Learning Enhanced Bayesian Optimization [30.031396809114625]
We introduce a novel framework that categorizes RS autonomy into five distinct levels, ranging from basic rule-based accuracy-driven systems to behavior-aware, uncertain multi-objective RSs.
We propose an approach that dynamically identifies and optimizes multiple objectives based on individual user preferences, fostering more ethical and intelligent user-centric recommendations.
arXiv Detail & Related papers (2025-02-18T08:10:09Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- Beyond the Binary: Capturing Diverse Preferences With Reward Regularization [15.518838657050173]
We argue that this reliance on binary choices does not capture the broader, aggregate preferences of the target user in real-world tasks.
We introduce a simple yet effective method that augments existing binary preference datasets with synthetic preference judgments to estimate potential user disagreement.
arXiv Detail & Related papers (2024-12-05T02:35:46Z)
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z)
- MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from large text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z)
- MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning [99.09906827676748]
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks.
Our novel approach uses reinforcement learning to fine-tune the motion generator based on human preferences and prior knowledge of the human perception model.
In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate optimality between text adherence, motion quality, and human preferences.
arXiv Detail & Related papers (2024-10-09T03:27:14Z)
- Aligning LLMs with Individual Preferences via Interaction [51.72200436159636]
We train large language models (LLMs) that can "interact to align".
We develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures.
For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations.
arXiv Detail & Related papers (2024-10-04T17:48:29Z)
- Contrastive Learning Method for Sequential Recommendation based on Multi-Intention Disentanglement [5.734747179463411]
We propose a Contrastive Learning sequential recommendation method based on Multi-Intention Disentanglement (MIDCL).
In our work, intentions are recognized as dynamic and diverse, and user behaviors are often driven by current multi-intentions.
We propose two types of contrastive learning paradigms for finding the user's most relevant interaction intention and maximizing the mutual information of positive sample pairs.
arXiv Detail & Related papers (2024-04-28T15:13:36Z)
- Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models [41.38520841504846]
Large language models (LLMs) can provide automated preference feedback solely from image inputs to guide decision-making.
In this study, we train a multimodal LLM, termed CriticGPT, capable of understanding trajectory videos in robot manipulation tasks.
Experimental evaluation of the algorithm's preference accuracy demonstrates its effective generalization ability to new tasks.
Performance on Meta-World tasks reveals that CriticGPT's reward model efficiently guides policy learning, surpassing rewards based on state-of-the-art pre-trained representation models.
arXiv Detail & Related papers (2024-02-22T03:14:03Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy.
We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a worst-case regret bound.
Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z)
- Multi-Objective Bayesian Optimization with Active Preference Learning [18.066263838953223]
We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in a multi-objective optimization (MOO) problem.
To minimize the interaction cost with the decision maker (DM), we also propose an active learning strategy for the preference estimation.
arXiv Detail & Related papers (2023-11-22T15:24:36Z)
- Visualizing the Relationship Between Encoded Linguistic Information and Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality.
We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performances.
Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z)