Optimal Design for Human Feedback
- URL: http://arxiv.org/abs/2404.13895v2
- Date: Fri, 31 May 2024 02:04:44 GMT
- Title: Optimal Design for Human Feedback
- Authors: Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton,
- Abstract summary: We study the problem of data collection for learning preference models.
To show the generality of our ideas, we study both absolute and relative feedback on the lists.
We design efficient algorithms for both settings and analyze them.
- Score: 17.520528548509944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study the problem of data collection for learning preference models. The key idea in our work is to generalize the optimal design, a method for computing information gathering policies, to ranked lists. To show the generality of our ideas, we study both absolute and relative feedback on the lists. We design efficient algorithms for both settings and analyze them. We prove that our preference model estimators improve with more data and so does the ranking error under the estimators. Finally, we experiment with several synthetic and real-world datasets to show the statistical efficiency of our algorithms.
Related papers
- Active Learning for Direct Preference Optimization [59.84525302418018]
Direct preference optimization (DPO) is a form of reinforcement learning from human feedback.
We propose an active learning framework for DPO, which can be applied to collect human feedback online or to choose the most informative subset of already collected feedback offline.
arXiv Detail & Related papers (2025-03-03T00:36:31Z) - Preference Optimization as Probabilistic Inference [21.95277469346728]
We propose a method that can leverage unpaired preferred or dis-preferred examples, and works even when only one type of feedback is available.
This flexibility allows us to apply it in scenarios with varying forms of feedback and models, including training generative language models.
arXiv Detail & Related papers (2024-10-05T14:04:03Z) - Data-Centric Human Preference Optimization with Rationales [23.243583332894737]
Reinforcement learning from human feedback plays a crucial role in aligning language models towards human preferences.
This work shifts focus to improving preference learning through a data-centric approach.
We propose enriching existing preference datasets with machine-generated rationales that explain the reasons behind choices.
arXiv Detail & Related papers (2024-07-19T17:27:52Z) - Aligning Large Language Models from Self-Reference AI Feedback with one General Principle [61.105703857868775]
We propose a self-reference-based AI feedback framework that enables a 13B Llama2-Chat to provide high-quality feedback.
Specifically, we allow the AI to first respond to the user's instructions, then generate criticism of other answers based on its own response as a reference.
Finally, we determine which answer better fits human preferences according to the criticism.
arXiv Detail & Related papers (2024-06-17T03:51:46Z) - Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback [110.16220825629749]
Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models.
In this work, we identify four core aspects of preference-based learning: preference data, learning algorithm, reward model, and policy training prompts.
Our findings indicate that all aspects are important for performance, with better preference data leading to the largest improvements.
arXiv Detail & Related papers (2024-06-13T16:17:21Z) - Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input [17.131441665935128]
We study how to extract fine-grained data regarding why an example is preferred that is useful for learning more accurate reward models.
Our findings suggest that incorporating pragmatic feature preferences is a promising approach for more efficient user-aligned reward learning.
arXiv Detail & Related papers (2024-05-23T16:36:16Z) - Personalized Language Modeling from Personalized Human Feedback [49.344833339240566]
Reinforcement Learning from Human Feedback (RLHF) is commonly used to fine-tune large language models to better align with human preferences.
In this work, we aim to address this problem by developing methods for building personalized language models.
arXiv Detail & Related papers (2024-02-06T04:18:58Z) - Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy.
We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a worst-case regret bound.
Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z) - Contrastive Preference Learning: Learning from Human Feedback without RL [71.77024922527642]
We introduce Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions.
CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs.
arXiv Detail & Related papers (2023-10-20T16:37:56Z) - Provable Benefits of Policy Learning from Human Preferences in
Contextual Bandit Problems [82.92678837778358]
preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modelings can affect the theoretical guarantees of these approaches.
arXiv Detail & Related papers (2023-07-24T17:50:24Z) - Direct Preference-based Policy Optimization without Reward Modeling [25.230992130108767]
Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference.
We propose a PbRL algorithm that directly learns from preference without requiring any reward modeling.
We show that our algorithm surpasses offline RL methods that learn with ground-truth reward information.
arXiv Detail & Related papers (2023-01-30T12:51:13Z) - Models of human preference for learning reward functions [80.39289349661364]
We learn the reward function from human-generated preferences between pairs of trajectory segments.
We find this assumption to be flawed and propose modeling human preferences as informed by each segment's regret.
Our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned.
arXiv Detail & Related papers (2022-06-05T17:58:02Z) - A General Language Assistant as a Laboratory for Alignment [3.3598752405752106]
We study simple baseline techniques and evaluations, such as prompting.
We find that the benefits from modest interventions increase with model size, generalize to a variety of alignment evaluations, and do not compromise the performance of large models.
We study a preference model pre-training' stage of training, with the goal of improving sample efficiency when finetuning on human preferences.
arXiv Detail & Related papers (2021-12-01T22:24:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.