A Survey of Reinforcement Learning from Human Feedback
- URL: http://arxiv.org/abs/2312.14925v2
- Date: Tue, 30 Apr 2024 17:59:01 GMT
- Title: A Survey of Reinforcement Learning from Human Feedback
- Authors: Timo Kaufmann, Paul Weng, Viktor Bengs, Eyke Hüllermeier
- Abstract summary: Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function.
This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input.
- Score: 28.92654784501927
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in directing the model's capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.
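Concretely, the abstract's core distinction, learning a reward signal from human feedback rather than engineering one, is most commonly realized by fitting a reward model to pairwise preference comparisons. Below is a minimal sketch of that step using the Bradley-Terry (logistic) preference loss in PyTorch; the names RewardModel and preference_loss and the toy embedding inputs are illustrative assumptions, not code from the surveyed paper.

```python
# Minimal, illustrative sketch (not from the surveyed paper): fit a reward model
# to pairwise human preferences with the Bradley-Terry / logistic loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps a fixed-size trajectory or response embedding to a scalar reward."""

    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # one scalar reward per input


def preference_loss(reward_model: RewardModel,
                    preferred: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: the human-preferred item should score higher."""
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    # -log sigmoid(r_pref - r_rej), averaged over the batch of comparisons
    return -F.logsigmoid(r_pref - r_rej).mean()


if __name__ == "__main__":
    model = RewardModel(input_dim=16)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Toy batch of 32 comparisons; random embeddings stand in for trajectories
    preferred, rejected = torch.randn(32, 16), torch.randn(32, 16)
    loss = preference_loss(model, preferred, rejected)
    loss.backward()
    opt.step()
```

The trained reward model then stands in for the engineered reward function during policy optimization, which is the second stage of the RLHF pipeline the abstract describes.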
Related papers
- Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF [82.7679132059169]
Reinforcement learning from human feedback has emerged as a central tool for language model alignment.
We propose a new algorithm for online exploration in RLHF, Exploratory Preference Optimization (XPO).
XPO enjoys the strongest known provable guarantees and promising empirical performance.
arXiv Detail & Related papers (2024-05-31T17:39:06Z)
- Multi-turn Reinforcement Learning from Preference Human Feedback [41.327438095745315]
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models with human preferences.
Existing methods work by emulating the preferences at the single decision (turn) level.
We develop novel methods for Reinforcement Learning from preference feedback between two full multi-turn conversations.
arXiv Detail & Related papers (2024-05-23T14:53:54Z)
- RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs [49.386699863989335]
Training large language models (LLMs) to serve as effective assistants for humans requires careful consideration.
A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences.
In this paper, we analyze RLHF through the lens of reinforcement learning principles to develop an understanding of its fundamentals.
arXiv Detail & Related papers (2024-04-12T15:54:15Z)
- Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization [56.54271464134885]
We consider an RLHF algorithm based on policy optimization (PO-RLHF).
We provide performance bounds for PO-RLHF with low query complexity.
A key novelty is a trajectory-level elliptical potential analysis.
arXiv Detail & Related papers (2024-02-15T22:11:18Z)
- The History and Risks of Reinforcement Learning and Human Feedback [0.16843915833103415]
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models easier to use and more effective.
A core piece of the RLHF process is the training and utilization of a model of human preferences that acts as a reward function for optimization.
RLHF reward models are often cited as being central to achieving performance, yet very few descriptors of capabilities, evaluations, training methods, or open-source models exist.
arXiv Detail & Related papers (2023-10-20T15:45:16Z)
- A Long Way to Go: Investigating Length Correlations in RLHF [59.49656695716066]
This paper demonstrates, in three diverse settings, that optimizing for response length is a significant factor behind RLHF's reported improvements.
We find that improvements in reward are largely driven by increasing response length rather than by other features.
Even a purely length-based reward reproduces most downstream RLHF improvements over supervised fine-tuned models.
arXiv Detail & Related papers (2023-10-05T17:38:28Z)
- Secrets of RLHF in Large Language Models Part I: PPO [81.01936993929127]
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence.
Reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit.
In this report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training (a minimal illustrative sketch of the clipped PPO objective appears after this list).
arXiv Detail & Related papers (2023-07-11T01:55:24Z)
- Evolutionary Reinforcement Learning: A Survey [31.112066295496003]
Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments.
This article presents a comprehensive survey of state-of-the-art methods for integrating evolutionary computation (EC) into RL, referred to as evolutionary reinforcement learning (EvoRL).
arXiv Detail & Related papers (2023-03-07T01:38:42Z)
- Towards Interactive Reinforcement Learning with Intrinsic Feedback [1.7117805951258132]
Reinforcement learning (RL) and brain-computer interfaces (BCI) have experienced significant growth over the past decade.
With rising interest in human-in-the-loop (HITL), incorporating human input with RL algorithms has given rise to the sub-field of interactive RL.
We denote this new and emerging medium of feedback as intrinsic feedback.
arXiv Detail & Related papers (2021-12-02T19:29:26Z)
- Towards Continual Reinforcement Learning: A Review and Perspectives [69.48324517535549]
We aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL).
While still in its early days, the study of continual RL has the promise to develop better incremental reinforcement learners.
These include applications such as those in the fields of healthcare, education, logistics, and robotics.
arXiv Detail & Related papers (2020-12-25T02:35:27Z)
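To complement the "Secrets of RLHF in Large Language Models Part I: PPO" entry above, the following is a minimal sketch of the clipped PPO surrogate objective typically used to optimize a policy against a learned reward model, with a KL penalty toward a frozen reference policy. It is a generic illustration over plain tensors; the function name ppo_rlhf_loss and the default clip_eps and kl_coef values are assumptions, not settings from that paper.

```python
# Minimal, illustrative sketch of the clipped PPO surrogate used to optimize a
# policy against a learned reward model, with a KL penalty toward a frozen
# reference policy. Names and default hyperparameters are assumptions.
import torch


def ppo_rlhf_loss(logp_new: torch.Tensor,    # log pi_theta(a|s) for sampled actions
                  logp_old: torch.Tensor,    # log-probs under the behaviour policy
                  logp_ref: torch.Tensor,    # log-probs under the frozen reference policy
                  advantages: torch.Tensor,  # advantages computed from reward-model scores
                  clip_eps: float = 0.2,
                  kl_coef: float = 0.1) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)
    # Pessimistic min of the unclipped and clipped surrogate terms
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Simple per-sample KL estimate toward the reference policy
    kl_penalty = (logp_new - logp_ref).mean()
    return policy_loss + kl_coef * kl_penalty


if __name__ == "__main__":
    # Toy example with random log-probabilities and advantages
    n = 64
    logp_new = torch.randn(n, requires_grad=True)
    loss = ppo_rlhf_loss(logp_new, torch.randn(n), torch.randn(n), torch.randn(n))
    loss.backward()
```

The pessimistic min of the clipped and unclipped terms limits how far a single update can move the policy, while the KL term keeps the fine-tuned model close to its pre-trained reference behaviour.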
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.