Related papers: RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback

URL: http://arxiv.org/abs/2308.04332v1
Date: Tue, 8 Aug 2023 15:21:30 GMT
Title: RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
Authors: Yannick Metz, David Lindner, Raphaël Baur, Daniel Keim, Mennatallah El-Assady
Abstract summary: We propose RLHF-Blender, an interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework that enables researchers to investigate the properties and qualities of human feedback. We discuss a set of concrete research opportunities enabled by RLHF-Blender.
Score: 9.407901608317895
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To use reinforcement learning from human feedback (RLHF) in practical applications, it is crucial to learn reward models from diverse sources of human feedback and to consider human factors involved in providing feedback of different types. However, the systematic study of learning from diverse types of feedback is held back by limited standardized tooling available to researchers. To bridge this gap, we propose RLHF-Blender, a configurable, interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework and implementation that enables researchers to systematically investigate the properties and qualities of human feedback for reward learning. The system facilitates the exploration of various feedback types, including demonstrations, rankings, comparisons, and natural language instructions, as well as studies considering the impact of human factors on their effectiveness. We discuss a set of concrete research opportunities enabled by RLHF-Blender. More information is available at https://rlhfblender.info/.

Related papers

Reward Learning from Multiple Feedback Types [7.910064218813772]
We show that diverse types of feedback can be utilized and lead to strong reward modeling performance. This work is the first strong indicator of the potential of multi-type feedback for RLHF.
arXiv Detail & Related papers (2025-02-28T13:29:54Z)
Curiosity-Driven Reinforcement Learning from Human Feedback [56.45486828254951]
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models with human preferences, but often at the cost of reduced output diversity. We introduce curiosity-driven RLHF (CD-RLHF), a framework that incorporates intrinsic rewards for novel states, alongside traditional sparse extrinsic rewards. We demonstrate the effectiveness of CD-RLHF through extensive experiments on a range of tasks, including text summarization and instruction following.
arXiv Detail & Related papers (2025-01-20T12:51:40Z)
Understanding Impact of Human Feedback via Influence Functions [25.467337374024197]
In Reinforcement Learning from Human Feedback (RLHF), it is crucial to learn suitable reward models from human feedback. Human feedback can often be noisy, inconsistent, or biased, especially when evaluating complex responses. We propose a compute-efficient approximation method to measure the impact of human feedback on the performance of reward models.
arXiv Detail & Related papers (2025-01-10T08:50:38Z)
Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework [13.949126295663328]
We bridge the gap between machine learning and human-computer interaction efforts by developing a shared understanding of human feedback in interactive learning scenarios. We introduce a taxonomy of feedback types for reward-based learning from human feedback based on nine key dimensions. We identify seven quality metrics of human feedback influencing both the human ability to express feedback and the agent's ability to learn from the feedback.
arXiv Detail & Related papers (2024-11-18T17:40:42Z)
Self-Evolved Reward Learning for LLMs [45.6910747154447]
Reinforcement Learning from Human Feedback (RLHF) is a crucial technique for aligning language models with human preferences. We propose Self-Evolved Reward Learning (SER), a novel approach where the RM generates additional training data to iteratively improve itself. Our results demonstrate that even with limited human-annotated data, learning from self-feedback can robustly enhance RM performance.
arXiv Detail & Related papers (2024-11-01T07:29:03Z)
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning [12.742158403867002]
Reinforcement Learning from Human Feedback is a powerful paradigm for aligning foundation models to human values and preferences. Current RLHF techniques cannot account for the naturally occurring differences in individual human preferences across a diverse population. We develop a class of multimodal RLHF methods to address the need for pluralistic alignment.
arXiv Detail & Related papers (2024-08-19T15:18:30Z)
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs [49.386699863989335]
Training large language models (LLMs) to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences. In this paper, we analyze RLHF through the lens of reinforcement learning principles to develop an understanding of its fundamentals.
arXiv Detail & Related papers (2024-04-12T15:54:15Z)
Provable Multi-Party Reinforcement Learning with Diverse Human Feedback [63.830731470186855]
Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. We show how traditional RLHF approaches can fail since learning a single reward function cannot capture and balance the preferences of multiple individuals. We incorporate meta-learning to learn multiple preferences and adopt different social welfare functions to aggregate the preferences across multiple parties.
arXiv Detail & Related papers (2024-03-08T03:05:11Z)
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback [43.51441849047147]
We introduce Uni-RLHF, a comprehensive system implementation tailored for RLHF. Uni-RLHF contains three packages: 1) a universal multi-feedback annotation platform, 2) large-scale crowdsourced feedback datasets, and 3) modular offline RLHF baseline implementations.
arXiv Detail & Related papers (2024-02-04T09:40:22Z)
Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration [29.935758027209292]
Preference-based feedback is important for many applications in reinforcement learning. In this work, we take advantage of the fact that one can often choose contexts to obtain human feedback. We show that our method is able to reach better performance with fewer samples of human preferences than multiple baselines.
arXiv Detail & Related papers (2023-12-01T00:54:02Z)
UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
RRHF: Rank Responses to Align Language Models with Human Feedback without tears [69.68672043223249]
InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). We propose a novel learning paradigm called RRHF, which scores sampled responses from different sources via a logarithm of conditional probabilities. We evaluate RRHF on the Helpful and Harmless dataset, demonstrating comparable alignment performance with PPO by reward model score and human labeling.
arXiv Detail & Related papers (2023-04-11T15:53:40Z)
Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.