Active teacher selection for reinforcement learning from human feedback
- URL: http://arxiv.org/abs/2310.15288v1
- Date: Mon, 23 Oct 2023 18:54:43 GMT
- Title: Active teacher selection for reinforcement learning from human feedback
- Authors: Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell
- Abstract summary: Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback.
We propose the Hidden Utility Bandit framework to model differences in teacher rationality, expertise, and costliness.
We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing.
- Score: 14.009227941725783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning from human feedback (RLHF) enables machine learning
systems to learn objectives from human feedback. A core limitation of these
systems is their assumption that all feedback comes from a single human
teacher, despite querying a range of distinct teachers. We propose the Hidden
Utility Bandit (HUB) framework to model differences in teacher rationality,
expertise, and costliness, formalizing the problem of learning from multiple
teachers. We develop a variety of solution algorithms and apply them to two
real-world domains: paper recommendation systems and COVID-19 vaccine testing.
We find that the Active Teacher Selection (ATS) algorithm outperforms baseline
algorithms by actively selecting when and which teacher to query. The HUB
framework and ATS algorithm demonstrate the importance of leveraging
differences between teachers to learn accurate reward models, facilitating
future research on active teacher selection for robust reward modeling.
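To make the setting concrete, here is a minimal runnable sketch of a multi-teacher query loop, assuming Boltzmann-rational teachers with differing rationality (beta) and per-query cost, and a naive value-per-cost selection rule; the teacher model and the selection heuristic are illustrative assumptions, not the paper's exact HUB or ATS formulation.
```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden utilities of two alternatives the learner must rank.
true_utility = {"A": 1.0, "B": 0.4}

# Teachers differ in rationality (beta) and in per-query cost.
teachers = [
    {"name": "expert", "beta": 5.0, "cost": 1.0},
    {"name": "novice", "beta": 0.5, "cost": 0.2},
]

def query(teacher):
    """Boltzmann-rational preference: P(prefer A) = sigmoid(beta * (u_A - u_B))."""
    diff = true_utility["A"] - true_utility["B"]
    p_a = 1.0 / (1.0 + np.exp(-teacher["beta"] * diff))
    return "A" if rng.random() < p_a else "B"

# Naive active selection: rank teachers by a fixed information-per-cost
# proxy (beta / cost) and spend the query budget on the best trade-off.
budget, spent, votes = 10.0, 0.0, {"A": 0, "B": 0}
while spent < budget:
    teacher = max(teachers, key=lambda t: t["beta"] / t["cost"])
    votes[query(teacher)] += 1
    spent += teacher["cost"]

print(votes)  # the more informative teacher's answers dominate the tally
```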
Related papers
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data generated by YODA improves SFT with a significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- TGRL: An Algorithm for Teacher Guided Reinforcement Learning [45.38447023752256]
It is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives.
We present a principled approach, along with an approximate implementation, for dynamically and automatically balancing when to follow the teacher and when to use rewards; a toy sketch of such a mixed objective appears below.
arXiv Detail & Related papers (2023-07-06T17:58:40Z)
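As a toy illustration of the kind of mixed objective described above, the sketch below blends a task reward with a teacher-agreement term; the decaying mixing coefficient is a placeholder assumption, not TGRL's actual balancing rule.
```python
import numpy as np

def mixed_objective(task_reward, teacher_agreement, alpha):
    """Weighted blend of environment reward and agreement with the teacher.
    TGRL's contribution is choosing alpha dynamically; the decay schedule
    below is only a placeholder for illustration."""
    return alpha * teacher_agreement + (1.0 - alpha) * task_reward

# Hypothetical schedule: trust the teacher early, lean on rewards later.
for step in (0, 500, 1000):
    alpha = float(np.exp(-step / 400.0))
    print(step, round(mixed_objective(0.6, 0.9, alpha), 3))
```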
- Active Reward Learning from Multiple Teachers [17.10187575303075]
Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system.
This human feedback is often a preference comparison, in which the human teacher compares several samples of AI behavior and chooses which they believe best accomplishes the objective.
While reward learning typically assumes that all feedback comes from a single teacher, in practice these systems often query multiple teachers to gather sufficient training data; a minimal preference-fitting sketch appears below.
arXiv Detail & Related papers (2023-03-02T01:26:53Z)
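Preference comparisons of this kind are commonly modeled with a Bradley-Terry likelihood; the sketch below fits a scalar reward weight to simulated pairwise choices by gradient ascent. This is a generic construction under stated assumptions, not this paper's specific algorithm.
```python
import numpy as np

rng = np.random.default_rng(1)

# Two behavior samples summarized by one feature each; the teacher's
# choices are noisy reports about the hidden reward weight true_w.
feat_a, feat_b, true_w = 0.9, 0.2, 2.0

# Simulate 200 teacher choices under a Bradley-Terry preference model.
p_a = 1.0 / (1.0 + np.exp(-true_w * (feat_a - feat_b)))
prefers_a = rng.random(200) < p_a  # True means "teacher preferred A"

# Fit the reward weight by gradient ascent on the log-likelihood.
w, lr = 0.0, 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-w * (feat_a - feat_b)))
    w += lr * float(np.mean(prefers_a - p)) * (feat_a - feat_b)

print(round(w, 2))  # recovers a weight close to true_w
```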
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, meaning that the student's accuracy can be further boosted by applying larger teachers; an illustrative sketch appears below.
arXiv Detail & Related papers (2022-06-13T11:52:13Z)
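One way to read "integrating part of the teacher's features as prior knowledge" is to splice teacher activations into the student's feature map before computing a distillation loss; the random spatial mask and the 50% ratio below are illustrative assumptions, not DPK's actual scheme.
```python
import numpy as np

rng = np.random.default_rng(2)

# Toy feature maps from teacher and student (channels x height x width).
teacher_feat = rng.normal(size=(8, 4, 4))
student_feat = rng.normal(size=(8, 4, 4))

# Splice teacher features into a random subset of spatial positions of the
# student's map, so the teacher acts as a prior; the 50% ratio is arbitrary.
mask = rng.random((1, 4, 4)) < 0.5  # broadcast over channels
mixed = np.where(mask, teacher_feat, student_feat)

# Standard feature-distillation loss, computed on the mixed features.
distill_loss = float(np.mean((mixed - teacher_feat) ** 2))
print(round(distill_loss, 3))
```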
- Unsupervised Domain Adaptive Person Re-Identification via Human Learning Imitation [67.52229938775294]
In recent years, researchers have proposed using the teacher-student framework to decrease the domain gap between different person re-identification datasets.
Inspired by recent teacher-student framework based methods, we explore imitating the human learning process from several aspects.
arXiv Detail & Related papers (2021-11-28T01:14:29Z)
- A teacher-student framework for online correctional learning [12.980296933051509]
We show that the variance of the student's estimate is reduced with the help of the teacher.
We formulate the online problem, in which the teacher has to decide at each time instant whether or not to alter the observations.
We validate the framework in numerical experiments and compare the optimal online policy with the one from the batch setting; a toy variance illustration appears below.
arXiv Detail & Related papers (2021-11-15T15:01:00Z)
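A toy numeric illustration of the variance-reduction claim, assuming the student estimates a mean from a noisy stream and the teacher's correction amounts to clipping extreme observations; the correction rule is a made-up stand-in for the paper's policy.
```python
import numpy as np

rng = np.random.default_rng(3)
TRUE_MEAN = 1.0

def student_estimate(teacher_corrects: bool) -> float:
    """Student averages a noisy stream; the teacher may clip outliers."""
    obs = rng.normal(TRUE_MEAN, 1.0, size=50)
    if teacher_corrects:
        # Illustrative correction: pull extreme observations back toward
        # a plausible range before the student sees them.
        obs = np.clip(obs, TRUE_MEAN - 1.5, TRUE_MEAN + 1.5)
    return float(np.mean(obs))

for teacher_corrects in (False, True):
    estimates = [student_estimate(teacher_corrects) for _ in range(2000)]
    print(teacher_corrects, round(float(np.var(estimates)), 4))
# The corrected stream yields a smaller variance for the student's estimate.
```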
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that can incorporate the teacher's cooperative intention into its likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Point Adversarial Self Mining: A Simple Method for Facial Expression Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive generation of learning materials and the teacher/student updates can be conducted multiple times, improving the network's capability iteratively.
arXiv Detail & Related papers (2020-08-26T06:39:24Z)
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.