Pareto-Optimal Learning from Preferences with Hidden Context
- URL: http://arxiv.org/abs/2406.15599v1
- Date: Fri, 21 Jun 2024 18:57:38 GMT
- Title: Pareto-Optimal Learning from Preferences with Hidden Context
- Authors: Ryan Boldi, Li Ding, Lee Spector, Scott Niekum,
- Abstract summary: We propose POPL, which frames discrepant group preferences as objectives with potential trade-offs.
Our empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions.
POPL can serve as a foundation for techniques optimizing specific notions of group fairness.
- Score: 18.340302968130683
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensuring AI models align with human values is essential for their safety and functionality. Reinforcement learning from human feedback (RLHF) uses human preferences to achieve this alignment. However, preferences sourced from diverse populations can result in point estimates of human values that may be sub-optimal or unfair to specific groups. We propose Pareto Optimal Preference Learning (POPL), which frames discrepant group preferences as objectives with potential trade-offs, aiming for policies that are Pareto-optimal on the preference dataset. POPL utilizes Lexicase selection, an iterative process to select diverse and Pareto-optimal solutions. Our empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions, effectively catering to distinct groups without access to group numbers or membership labels. Furthermore, we illustrate that POPL can serve as a foundation for techniques optimizing specific notions of group fairness, ensuring inclusive and equitable AI model alignment.
Related papers
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - Group Robust Preference Optimization in Reward-free RLHF [23.622835830345725]
We propose a novel Group Robust Preference Optimization (GRPO) method to align large language models to individual groups' preferences robustly.
To achieve this, GRPO adaptively and sequentially weights the importance of different groups, prioritizing groups with worse cumulative loss.
We significantly improved performance for the worst-performing groups, reduced loss imbalances across groups, and improved probability accuracies.
arXiv Detail & Related papers (2024-05-30T17:50:04Z) - Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models.
There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
arXiv Detail & Related papers (2024-04-22T17:20:18Z) - Controllable Preference Optimization: Toward Controllable
Multi-Objective Alignment [107.63756895544842]
Alignment in artificial intelligence pursues consistency between model responses and human preferences as well as values.
Existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives.
We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives.
arXiv Detail & Related papers (2024-02-29T12:12:30Z) - MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with
Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z) - Group Preference Optimization: Few-Shot Alignment of Large Language
Models [31.991620847943036]
Group Preference Optimization steers language models to preferences of individual groups in a few-shot manner.
We empirically validate the efficacy of GPO through rigorous evaluations using large language models with varied sizes.
Our results demonstrate that GPO not only aligns models more accurately but also requires fewer group-specific preferences, and less training and inference computing resources.
arXiv Detail & Related papers (2023-10-17T18:41:57Z) - Interactive Hyperparameter Optimization in Multi-Objective Problems via
Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML)
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.