Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework
- URL: http://arxiv.org/abs/2411.11761v1
- Date: Mon, 18 Nov 2024 17:40:42 GMT
- Title: Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework
- Authors: Yannick Metz, David Lindner, Raphaƫl Baur, Mennatallah El-Assady,
- Abstract summary: We bridge the gap between machine learning and human-computer interaction efforts by developing a shared understanding of human feedback in interactive learning scenarios.
We introduce a taxonomy of feedback types for reward-based learning from human feedback based on nine key dimensions.
We identify seven quality metrics of human feedback influencing both the human ability to express feedback and the agent's ability to learn from the feedback.
- Score: 13.949126295663328
- License:
- Abstract: Reinforcement Learning from Human feedback (RLHF) has become a powerful tool to fine-tune or train agentic machine learning models. Similar to how humans interact in social contexts, we can use many types of feedback to communicate our preferences, intentions, and knowledge to an RL agent. However, applications of human feedback in RL are often limited in scope and disregard human factors. In this work, we bridge the gap between machine learning and human-computer interaction efforts by developing a shared understanding of human feedback in interactive learning scenarios. We first introduce a taxonomy of feedback types for reward-based learning from human feedback based on nine key dimensions. Our taxonomy allows for unifying human-centered, interface-centered, and model-centered aspects. In addition, we identify seven quality metrics of human feedback influencing both the human ability to express feedback and the agent's ability to learn from the feedback. Based on the feedback taxonomy and quality criteria, we derive requirements and design choices for systems learning from human feedback. We relate these requirements and design choices to existing work in interactive machine learning. In the process, we identify gaps in existing work and future research opportunities. We call for interdisciplinary collaboration to harness the full potential of reinforcement learning with data-driven co-adaptive modeling and varied interaction mechanics.
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - Real-time Addressee Estimation: Deployment of a Deep-Learning Model on
the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z) - UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present textscUltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z) - RLHF-Blender: A Configurable Interactive Interface for Learning from
Diverse Human Feedback [9.407901608317895]
We propose RLHF-Blender, an interactive interface for learning from human feedback.
RLHF-Blender provides a modular experimentation framework that enables researchers to investigate the properties and qualities of human feedback.
We discuss a set of concrete research opportunities enabled by RLHF-Blender.
arXiv Detail & Related papers (2023-08-08T15:21:30Z) - Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural
Language Generation [68.9440575276396]
This survey aims to provide an overview of the recent research that has leveraged human feedback to improve natural language generation.
First, we introduce an encompassing formalization of feedback, and identify and organize existing research into a taxonomy following this formalization.
Second, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using the feedback or training feedback models.
Third, we provide an overview of the nascent field of AI feedback, which exploits large language models to make judgments based on a set of principles and minimize the need for
arXiv Detail & Related papers (2023-05-01T17:36:06Z) - Humans are not Boltzmann Distributions: Challenges and Opportunities for
Modelling Human Feedback and Interaction in Reinforcement Learning [13.64577704565643]
We argue that these models are too simplistic and that RL researchers need to develop more realistic human models to design and evaluate their algorithms.
This paper calls for research from different disciplines to address key questions about how humans provide feedback to AIs and how we can build more robust human-in-the-loop RL systems.
arXiv Detail & Related papers (2022-06-27T13:58:51Z) - Towards Interactive Reinforcement Learning with Intrinsic Feedback [1.7117805951258132]
Reinforcement learning (RL) and brain-computer interfaces (BCI) have experienced significant growth over the past decade.
With rising interest in human-in-the-loop (HITL), incorporating human input with RL algorithms has given rise to the sub-field of interactive RL.
We denote this new and emerging medium of feedback as intrinsic feedback.
arXiv Detail & Related papers (2021-12-02T19:29:26Z) - Accelerating the Convergence of Human-in-the-Loop Reinforcement Learning
with Counterfactual Explanations [1.8275108630751844]
Human-in-the-loop Reinforcement Learning (HRL) addresses this issue by combining human feedback and reinforcement learning techniques.
We extend the existing TAMER Framework with the possibility to enhance human feedback with two different types of counterfactual explanations.
arXiv Detail & Related papers (2021-08-03T08:27:28Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z) - Widening the Pipeline in Human-Guided Reinforcement Learning with
Explanation and Context-Aware Data Augmentation [20.837228359591663]
We present the first study of using human visual explanations in human-in-the-loop reinforcement learning.
We propose EXPAND to encourage the model to encode task-relevant features through a context-aware data augmentation.
arXiv Detail & Related papers (2020-06-26T05:40:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.