Related papers: Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback

URL: http://arxiv.org/abs/2402.02423v2
Date: Mon, 25 Mar 2024 13:20:46 GMT
Title: Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
Authors: Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai Zhao, Yan Zheng,
Abstract summary: We introduce Uni-RLHF, a comprehensive system implementation tailored for RLHF. Uni-RLHF contains three packages: 1) a universal multi-feedback annotation platform, 2) large-scale crowdsourced feedback datasets, and 3) modular offline RLHF baseline implementations.
Score: 43.51441849047147
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement Learning with Human Feedback (RLHF) has received significant attention for performing tasks without the need for costly manual reward design by aligning human preferences. It is crucial to consider diverse human feedback types and various learning methods in different environments. However, quantifying progress in RLHF with diverse feedback is challenging due to the lack of standardized annotation platforms and widely used unified benchmarks. To bridge this gap, we introduce Uni-RLHF, a comprehensive system implementation tailored for RLHF. It aims to provide a complete workflow from real human feedback, fostering progress in the development of practical problems. Uni-RLHF contains three packages: 1) a universal multi-feedback annotation platform, 2) large-scale crowdsourced feedback datasets, and 3) modular offline RLHF baseline implementations. Uni-RLHF develops a user-friendly annotation interface tailored to various feedback types, compatible with a wide range of mainstream RL environments. We then establish a systematic pipeline of crowdsourced annotations, resulting in large-scale annotated datasets comprising more than 15 million steps across 30+ popular tasks. Through extensive experiments, the results in the collected datasets demonstrate competitive performance compared to those from well-designed manual rewards. We evaluate various design choices and offer insights into their strengths and potential areas of improvement. We wish to build valuable open-source platforms, datasets, and baselines to facilitate the development of more robust and reliable RLHF solutions based on realistic human feedback. The website is available at https://uni-rlhf.github.io/.

Related papers

Reward Learning from Multiple Feedback Types [7.910064218813772]
We show that diverse types of feedback can be utilized and lead to strong reward modeling performance. This work is the first strong indicator of the potential of multi-type feedback for RLHF.
arXiv Detail & Related papers (2025-02-28T13:29:54Z)
Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits [59.30310692855397]
We propose a unified framework for the RLHF pipeline from the view of contextual bandits. We decompose the RLHF process into two distinct stages: (post-)training and deployment. We then develop novel algorithms for each stage, demonstrating significant improvements in both statistical and computational efficiency.
arXiv Detail & Related papers (2025-02-11T02:36:01Z)
Curiosity-Driven Reinforcement Learning from Human Feedback [56.45486828254951]
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models with human preferences, but often at the cost of reduced output diversity. We introduce curiosity-driven RLHF (CD-RLHF), a framework that incorporates intrinsic rewards for novel states, alongside traditional sparse extrinsic rewards. We demonstrate the effectiveness of CD-RLHF through extensive experiments on a range of tasks, including text summarization and instruction following.
arXiv Detail & Related papers (2025-01-20T12:51:40Z)
RLHF Workflow: From Reward Modeling to Online RLHF [79.83927049253924]
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report. Online iterative RLHF is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. We show that supervised fine-tuning (SFT) and iterative RLHF can obtain state-of-the-art performance with fully open-source datasets.
arXiv Detail & Related papers (2024-05-13T15:50:39Z)
Provable Multi-Party Reinforcement Learning with Diverse Human Feedback [63.830731470186855]
Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. We show how traditional RLHF approaches can fail since learning a single reward function cannot capture and balance the preferences of multiple individuals. We incorporate meta-learning to learn multiple preferences and adopt different social welfare functions to aggregate the preferences across multiple parties.
arXiv Detail & Related papers (2024-03-08T03:05:11Z)
Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration [29.935758027209292]
Preference-based feedback is important for many applications in reinforcement learning. In this work, we take advantage of the fact that one can often choose contexts to obtain human feedback. We show that our method is able to reach better performance with fewer samples of human preferences than multiple baselines.
arXiv Detail & Related papers (2023-12-01T00:54:02Z)
A Long Way to Go: Investigating Length Correlations in RLHF [59.49656695716066]
This paper demonstrates, on three diverse settings, that optimizing for response length is a significant factor behind RLHF's reported improvements. We find improvements in reward to be driven largely by increasing response length, instead of other features. Even a purely length-based reward reproduces most downstream RLHF improvements over supervised fine-tuned models.
arXiv Detail & Related papers (2023-10-05T17:38:28Z)
UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback [9.407901608317895]
We propose RLHF-Blender, an interactive interface for learning from human feedback. RLHF-Blender provides a modular experimentation framework that enables researchers to investigate the properties and qualities of human feedback. We discuss a set of concrete research opportunities enabled by RLHF-Blender.
arXiv Detail & Related papers (2023-08-08T15:21:30Z)
RRHF: Rank Responses to Align Language Models with Human Feedback without tears [69.68672043223249]
InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). We propose a novel learning paradigm called RRHF, which scores sampled responses from different sources via a logarithm of conditional probabilities. We evaluate RRHF on the Helpful and Harmless dataset, demonstrating comparable alignment performance with PPO by reward model score and human labeling.
arXiv Detail & Related papers (2023-04-11T15:53:40Z)
