Reward Learning from Multiple Feedback Types
- URL: http://arxiv.org/abs/2502.21038v1
- Date: Fri, 28 Feb 2025 13:29:54 GMT
- Title: Reward Learning from Multiple Feedback Types
- Authors: Yannick Metz, András Geiszl, Raphaël Baur, Mennatallah El-Assady
- Abstract summary: We show that diverse types of feedback can be utilized and lead to strong reward modeling performance. This work is the first strong indicator of the potential of multi-type feedback for RLHF.
- Score: 7.910064218813772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning rewards from preference feedback has become an important tool in the alignment of agentic models. Preference-based feedback, often implemented as a binary comparison between multiple completions, is an established method to acquire large-scale human feedback. However, human feedback in other contexts is often much more diverse. Such diverse feedback can better support the goals of a human annotator, and the simultaneous use of multiple sources might be mutually informative for learning or might carry type-dependent biases for the reward learning process. Despite these potential benefits, learning from different feedback types has yet to be explored extensively. In this paper, we bridge this gap by enabling experimentation with, and evaluation of, multi-type feedback in a broad set of environments. We present a process to generate high-quality simulated feedback of six different types. Then, we implement reward models and downstream RL training for all six feedback types. Based on the simulated feedback, we investigate the use of these feedback types across ten RL environments and compare them to pure preference-based baselines. We show empirically that diverse types of feedback can be utilized and lead to strong reward modeling performance. This work is the first strong indicator of the potential of multi-type feedback for RLHF.
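The pure preference-based baselines mentioned in the abstract are typically trained with a Bradley-Terry pairwise objective. As a minimal sketch (not the authors' code; the network shape, tensor layout, and all names here are assumptions), a reward model for binary comparisons could be trained like this:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Tiny MLP mapping per-step features to a scalar reward (illustrative only)."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    prefer_a: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss for binary comparisons.

    seg_a, seg_b: (batch, steps, obs_dim) trajectory segments.
    prefer_a:     (batch,) 1.0 where segment A was preferred, else 0.0.
    """
    # A segment's score is the sum of its per-step predicted rewards.
    r_a = model(seg_a).sum(dim=1)
    r_b = model(seg_b).sum(dim=1)
    # Bradley-Terry: P(A preferred over B) = sigmoid(r_a - r_b).
    return nn.functional.binary_cross_entropy_with_logits(r_a - r_b, prefer_a)
```

Each additional feedback type would contribute its own likelihood term (for example, a regression loss for scalar ratings), which is part of what makes training reward models from six types nontrivial.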
Related papers
- Reinforcement Learning from Multi-level and Episodic Human Feedback [1.9686770963118378]
We propose an algorithm to efficiently learn both the reward function and the optimal policy from multi-level human feedback.
We show that the proposed algorithm achieves sublinear regret and demonstrate its empirical effectiveness through extensive simulations (a sketch of one way to model multi-level feedback follows below).
arXiv Detail & Related papers (2025-04-20T20:09:19Z)
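As a sketch of what learning from "multi-level" (ordered, rating-style) feedback can look like, under my own assumptions rather than this paper's algorithm: a common choice is an ordered-logit likelihood whose cutpoints are learned alongside the reward model.

```python
import torch

def ordinal_nll(reward: torch.Tensor,
                level: torch.Tensor,
                cutpoints: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of multi-level feedback (ordered logit).

    reward:    (batch,) predicted scalar rewards.
    level:     (batch,) integer feedback level in {0, ..., K-1} (int64).
    cutpoints: (K-1,) increasing learned thresholds.
    """
    # P(level <= k) = sigmoid(c_k - reward), padded with 0 and 1 at the ends.
    cdf = torch.sigmoid(cutpoints.unsqueeze(0) - reward.unsqueeze(1))
    zeros = torch.zeros(reward.shape[0], 1)
    ones = torch.ones(reward.shape[0], 1)
    cdf = torch.cat([zeros, cdf, ones], dim=1)          # (batch, K+1)
    probs = cdf[:, 1:] - cdf[:, :-1]                    # P(level == k)
    picked = probs.gather(1, level.unsqueeze(1)).clamp_min(1e-8)
    return -torch.log(picked).mean()
```

Higher predicted reward shifts probability mass toward higher feedback levels, which is the property a rating-style signal needs.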
- Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation [67.88747330066049]
Fine-grained feedback captures nuanced distinctions in image quality and prompt-alignment.
We show that demonstrating its superiority over coarse-grained feedback is not automatic.
We identify key challenges in eliciting and utilizing fine-grained feedback.
arXiv Detail & Related papers (2024-06-24T17:19:34Z)
- Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback [43.51441849047147]
We introduce Uni-RLHF, a comprehensive system implementation tailored for RLHF.
Uni-RLHF contains three packages: 1) a universal multi-feedback annotation platform, 2) large-scale crowdsourced feedback datasets, and 3) modular offline RLHF baseline implementations.
arXiv Detail & Related papers (2024-02-04T09:40:22Z)
- Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration [29.935758027209292]
Preference-based feedback is important for many applications in reinforcement learning.
In this work, we take advantage of the fact that one can often choose contexts to obtain human feedback.
We show that our method reaches better performance with fewer samples of human preferences than multiple baselines (a query-selection sketch follows this entry).
arXiv Detail & Related papers (2023-12-01T00:54:02Z)
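One common way to realize this kind of active context/query selection (a hedged sketch, not necessarily this paper's method) is to show the human the comparison on which an ensemble of reward models disagrees most:

```python
import torch

def select_query(ensemble, candidate_pairs):
    """Pick the comparison with maximal ensemble disagreement.

    ensemble:        list of reward models, each mapping a segment
                     tensor (steps, obs_dim) to per-step rewards (steps,).
    candidate_pairs: list of (seg_a, seg_b) candidate comparisons.
    Returns the index of the most informative pair to label.
    """
    scores = []
    for seg_a, seg_b in candidate_pairs:
        # Each member's preference logit for A over B.
        logits = torch.stack([m(seg_a).sum() - m(seg_b).sum() for m in ensemble])
        # Disagreement = variance of P(A > B) across the ensemble.
        scores.append(torch.sigmoid(logits).var())
    return int(torch.stack(scores).argmax())
```

Variance of the predicted preference probability is only one plausible acquisition score; information-gain criteria (see the IDRL entry further down) are another.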
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback [9.407901608317895]
We propose RLHF-Blender, an interactive interface for learning from human feedback.
RLHF-Blender provides a modular experimentation framework that enables researchers to investigate the properties and qualities of human feedback.
We discuss a set of concrete research opportunities enabled by RLHF-Blender.
arXiv Detail & Related papers (2023-08-08T15:21:30Z)
- Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modeling can affect the theoretical guarantees of these approaches.
arXiv Detail & Related papers (2023-07-24T17:50:24Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types [38.37216644899506]
We argue that grounding the rationality coefficient in real data for each feedback type has a significant positive effect on reward learning.
We find that when learning from a single feedback type, overestimating human rationality can have dire effects on reward accuracy and regret (a sketch of the rationality model follows below).
arXiv Detail & Related papers (2022-08-23T02:19:10Z)
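The rationality level is usually formalized as the inverse temperature (often written beta) of a Boltzmann-rational choice model: large beta means the simulated human almost always picks the truly better option, while beta near zero yields random feedback. A minimal sketch (names are mine):

```python
import torch

def boltzmann_choice_prob(r_a: torch.Tensor,
                          r_b: torch.Tensor,
                          beta: float) -> torch.Tensor:
    """P(human prefers A over B) under a Boltzmann-rational model.

    r_a, r_b: (batch,) true rewards of the two options.
    beta:     rationality coefficient; 0 = uniformly random choices,
              large values = near-perfectly rational choices.
    """
    return torch.sigmoid(beta * (r_a - r_b))
```

Grounding beta in real data per feedback type, instead of assuming one shared value, is the adjustment this entry argues for.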
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback-efficiency and sample-efficiency of preference-based RL algorithms (a minimal version of such a bonus follows below).
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
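A standard instantiation of this idea (a sketch under my assumptions, not the paper's exact bonus) adds the disagreement of a reward-model ensemble to the task reward as an intrinsic exploration bonus:

```python
import torch

def intrinsic_bonus(ensemble, features: torch.Tensor,
                    scale: float = 0.1) -> torch.Tensor:
    """Exploration bonus = standard deviation of ensemble reward predictions.

    ensemble: list of reward models mapping (batch, obs_dim) -> (batch,).
    features: (batch, obs_dim) features of visited transitions.
    """
    preds = torch.stack([m(features) for m in ensemble])  # (E, batch)
    return scale * preds.std(dim=0)

# During policy optimization the agent maximizes
#   r_total = r_learned + intrinsic_bonus(...)
# so it is drawn to regions where the learned reward is still uncertain.
```

The bonus shrinks as feedback accumulates and the ensemble converges, so exploration fades automatically.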
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries (a toy version of information-directed query selection follows below).
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
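As a toy, linear-Gaussian rendering of the information-directed idea (the actual IDRL algorithm is more general; the Gaussian posterior and all names here are my assumptions): choose the query that most reduces uncertainty about the return gap between the current best candidate policies.

```python
import numpy as np

def idrl_style_query(Sigma: np.ndarray,
                     candidates: np.ndarray,
                     policy_gap: np.ndarray,
                     noise_var: float = 1.0) -> int:
    """Toy information-directed query selection for a linear reward model.

    Sigma:      (d, d) posterior covariance over reward weights.
    candidates: (n, d) feature vectors of the possible queries.
    policy_gap: (d,) feature difference between the two leading policies;
                only information about this direction is valuable.
    Returns the index of the query to send to the expert.
    """
    scores = []
    for x in candidates:
        # Rank-1 Bayesian update of the covariance after observing x.
        Sx = Sigma @ x
        Sigma_post = Sigma - np.outer(Sx, Sx) / (x @ Sx + noise_var)
        # Remaining variance of the return gap we actually care about.
        scores.append(policy_gap @ Sigma_post @ policy_gap)
    return int(np.argmin(scores))
```

Unlike the pure-disagreement heuristic sketched earlier, this scores a query by how much it reduces the uncertainty that matters for the final policy decision, which is the essence of being "information directed".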
- Dialogue Response Ranking Training with Large-Scale Human Feedback Data [52.12342165926226]
We leverage social media feedback data to build a large-scale training dataset for feedback prediction.
We train DialogRPT, a set of GPT-2-based models, on 133M pairs of human feedback data.
Our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback.
arXiv Detail & Related papers (2020-09-15T10:50:05Z)