Active Reward Learning from Multiple Teachers
- URL: http://arxiv.org/abs/2303.00894v1
- Date: Thu, 2 Mar 2023 01:26:53 GMT
- Title: Active Reward Learning from Multiple Teachers
- Authors: Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart Russell
- Abstract summary: Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system.
This human feedback is often a preference comparison, in which the human teacher compares several samples of AI behavior and chooses which they believe best accomplishes the objective.
While reward learning typically assumes that all feedback comes from a single teacher, in practice these systems often query multiple teachers to gather sufficient training data.
- Score: 17.10187575303075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward learning algorithms utilize human feedback to infer a reward function,
which is then used to train an AI system. This human feedback is often a
preference comparison, in which the human teacher compares several samples of
AI behavior and chooses which they believe best accomplishes the objective.
While reward learning typically assumes that all feedback comes from a single
teacher, in practice these systems often query multiple teachers to gather
sufficient training data. In this paper, we investigate this disparity, and
find that algorithmic evaluation of these different sources of feedback
facilitates more accurate and efficient reward learning. We formally analyze
the value of information (VOI) when reward learning from teachers with varying
levels of rationality, and define and evaluate an algorithm that utilizes this
VOI to actively select teachers to query for feedback. Surprisingly, we find
that it is often more informative to query comparatively irrational teachers.
By formalizing this problem and deriving an analytical solution, we hope to
facilitate improvement in reward learning approaches to aligning AI behavior
with human values.
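To make the idea concrete, below is a minimal, hypothetical sketch of expected-information-gain-based teacher selection with Boltzmann-rational teachers. It is not the paper's algorithm or its analytical VOI solution; the reward hypotheses, rationality values, and function names are all made up for illustration.

```python
import numpy as np

# Two hypothetical reward hypotheses over the pair of behavior segments being
# compared. Both say segment 1 is better, but by different margins.
reward_hypotheses = np.array([
    [2.0, 0.0],   # hypothesis A: segment 1 much better
    [0.2, 0.0],   # hypothesis B: segment 1 slightly better
])
prior = np.array([0.5, 0.5])  # belief over which hypothesis is the true reward

def p_prefers_first(returns, beta):
    """Boltzmann-rational teacher: probability of preferring segment 1.
    beta = 0 is a random teacher; large beta approaches a perfectly rational one."""
    return 1.0 / (1.0 + np.exp(-beta * (returns[0] - returns[1])))

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(prior, hypotheses, beta):
    """Expected reduction in posterior entropy from one preference query to a
    teacher with rationality beta (a rough stand-in for the paper's VOI)."""
    likelihood_first = np.array([p_prefers_first(r, beta) for r in hypotheses])
    gain = 0.0
    for likelihood in (likelihood_first, 1.0 - likelihood_first):
        p_answer = prior @ likelihood                  # marginal prob. of this answer
        if p_answer > 0:
            posterior = prior * likelihood / p_answer  # Bayes update
            gain += p_answer * (entropy(prior) - entropy(posterior))
    return gain

# Actively choose which teacher to query for this comparison.
teacher_betas = {"careful": 10.0, "hasty": 1.0, "random": 0.0}
gains = {name: expected_info_gain(prior, reward_hypotheses, b)
         for name, b in teacher_betas.items()}
print(gains)
print("query teacher:", max(gains, key=gains.get))
```

Because both hypotheses in this toy setup agree on which segment is better, the nearly rational teacher's answer is almost fully predictable and reveals little about the margin, so the noisier teacher has the higher expected gain, loosely illustrating the paper's observation that comparatively irrational teachers can be more informative to query.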
Related papers
- Dual Active Learning for Reinforcement Learning from Human Feedback [13.732678966515781]
Reinforcement learning from human feedback (RLHF) is widely applied to align large language models with human preferences.
Human feedback is costly and time-consuming, making it essential to collect high-quality conversation data for human teachers to label.
In this paper, we use offline reinforcement learning (RL) to formulate the alignment problem.
arXiv Detail & Related papers (2024-10-03T14:09:58Z)
- CANDERE-COACH: Reinforcement Learning from Noisy Feedback [12.232688822099325]
The CANDERE-COACH algorithm is capable of learning from noisy feedback provided by a non-optimal teacher.
We propose a noise-filtering mechanism to de-noise online feedback data, thereby enabling the RL agent to successfully learn with up to 40% of the teacher feedback being incorrect.
arXiv Detail & Related papers (2024-09-23T20:14:12Z)
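The noise-filtering mechanism itself is not spelled out in the summary above, so the following is only a generic, hypothetical illustration of de-noising binary teacher feedback by discarding labels that contradict a classifier fit to the feedback collected so far; it is not the CANDERE-COACH procedure, and all data and names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: feature vectors for (state, action) pairs with binary
# good/bad teacher feedback, of which 40% has been flipped (incorrect).
X = rng.normal(size=(500, 4))
true_w = np.array([1.0, -0.5, 0.3, 0.0])
true_labels = (X @ true_w > 0).astype(float)
flipped = rng.random(500) < 0.4
observed = np.where(flipped, 1.0 - true_labels, true_labels)

def fit_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression (illustrative only)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Fit a feedback classifier on the raw, noisy labels, then keep only the
# feedback that agrees with its prediction before doing any RL update.
w = fit_logistic(X, observed)
pred = (X @ w) > 0
keep = pred == (observed > 0.5)

print("raw label accuracy: ", ((observed > 0.5) == (true_labels > 0.5)).mean())
print("kept label accuracy:", ((observed[keep] > 0.5) == (true_labels[keep] > 0.5)).mean())
print(f"kept {keep.sum()} of {len(keep)} feedback samples")
```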
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
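The entry above says training uses direct preference optimization over GPT-4-derived feedback pairs. As a point of reference, here is a small sketch of the standard DPO loss applied to hypothetical log-probabilities; the numbers and names are illustrative and not taken from the paper.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct preference optimization loss for a batch of feedback pairs.

    logp_* are the policy's total log-probabilities of the chosen (preferred)
    and rejected feedback texts; ref_logp_* are the same quantities under the
    frozen reference model. All inputs here are illustrative numpy arrays.
    """
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(logits), computed stably
    return np.mean(np.logaddexp(0.0, -logits))

# Hypothetical numbers standing in for GPT-4-annotated feedback pairs.
loss = dpo_loss(
    logp_chosen=np.array([-12.3, -8.1]),
    logp_rejected=np.array([-11.9, -9.4]),
    ref_logp_chosen=np.array([-13.0, -8.5]),
    ref_logp_rejected=np.array([-12.0, -9.0]),
)
print(f"DPO loss: {loss:.4f}")
```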
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves on SFT with a significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- Active teacher selection for reinforcement learning from human feedback [14.009227941725783]
Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback.
We propose the Hidden Utility Bandit framework to model differences in teacher rationality, expertise, and costliness.
We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing.
arXiv Detail & Related papers (2023-10-23T18:54:43Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that the exploration bonus from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
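As a rough illustration of an uncertainty-based exploration bonus (not the paper's exact formulation or schedule), disagreement across an ensemble of learned reward models can be added to the predicted reward. The linear "reward models" and coefficients below are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "ensemble" of learned reward models: each is a random linear function of
# the state-action features, standing in for independently trained reward nets.
feature_dim, ensemble_size = 6, 5
ensemble = rng.normal(size=(ensemble_size, feature_dim))

def reward_with_exploration_bonus(features, beta=0.5):
    """Shaped reward = mean ensemble prediction + beta * ensemble disagreement.

    The disagreement (std across ensemble members) acts as an intrinsic reward:
    it is high where the learned reward is uncertain, steering exploration there.
    """
    preds = ensemble @ features            # one prediction per ensemble member
    return preds.mean() + beta * preds.std(), preds.std()

sa_features = rng.normal(size=feature_dim)
shaped, bonus = reward_with_exploration_bonus(sa_features)
print(f"shaped reward {shaped:.3f} (uncertainty bonus {bonus:.3f})")
```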
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
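One ingredient the entry's title mentions is relabeling experience; the sketch below shows the general idea of recomputing replay-buffer rewards under the newest learned reward model so that old off-policy data stays consistent with the latest feedback. The linear reward model and buffer here are stand-ins, not PEBBLE's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy replay buffer: state-action feature vectors with stored rewards. In
# preference-based training, stored rewards come from a learned model that
# keeps changing as more human feedback arrives.
buffer_features = rng.normal(size=(1000, 8))
buffer_rewards = np.zeros(1000)

def learned_reward(features, w):
    """Stand-in for the current learned reward model (here just linear)."""
    return features @ w

def relabel_buffer(buffer_features, w):
    """Recompute every stored reward under the newest reward model."""
    return learned_reward(buffer_features, w)

# After a round of preference feedback updates the reward model's parameters,
# relabel the whole buffer before continuing off-policy RL updates.
w_new = rng.normal(size=8)
buffer_rewards = relabel_buffer(buffer_features, w_new)
print(f"relabeled mean reward: {buffer_rewards.mean():.3f}")
```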
- Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics [24.863665993509997]
A key challenge in Imitation Learning (IL) is that optimal state-action demonstrations are difficult for the teacher to provide.
As an alternative to state-action demonstrations, the teacher can provide corrective feedback such as their preferences or rewards.
We show that our approach can learn quickly from a variety of noisy feedback.
arXiv Detail & Related papers (2021-04-02T12:42:12Z)
- Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners [26.006964607579004]
We focus on a common reinforcement learning method, Q-learning, and use a behavioral experiment to examine what assumptions people make when teaching it.
We use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states.
Our results reveal how people teach using evaluative feedback and provide guidance for how engineers should design machine agents in a manner that is intuitive for people.
arXiv Detail & Related papers (2020-09-05T06:32:38Z)
- Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience [76.9910678786031]
We propose a novel paradigm of explainable active learning (XAL), by introducing techniques from the recently surging field of explainable AI (XAI) into an Active Learning setting.
Our study shows the benefits of AI explanations as interfaces for machine teaching, namely supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks, namely an anchoring effect on model judgments and increased cognitive workload.
arXiv Detail & Related papers (2020-01-24T22:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all of the above) and is not responsible for any consequences of its use.