Active Imitation Learning from Multiple Non-Deterministic Teachers:
Formulation, Challenges, and Algorithms
- URL: http://arxiv.org/abs/2006.07777v1
- Date: Sun, 14 Jun 2020 03:06:27 GMT
- Title: Active Imitation Learning from Multiple Non-Deterministic Teachers:
Formulation, Challenges, and Algorithms
- Authors: Khanh Nguyen and Hal Daumé III
- Abstract summary: We formulate the problem of learning to imitate multiple, non-deterministic teachers with minimal interaction cost.
We first present a general framework that efficiently models and estimates such a distribution by learning continuous representations of the teacher policies.
Next, we develop Active Performance-Based Imitation Learning (APIL), an active learning algorithm for reducing the learner-teacher interaction cost.
- Score: 3.6702509833426613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We formulate the problem of learning to imitate multiple, non-deterministic
teachers with minimal interaction cost. Rather than learning a specific policy
as in standard imitation learning, the goal in this problem is to learn a
distribution over a policy space. We first present a general framework that
efficiently models and estimates such a distribution by learning continuous
representations of the teacher policies. Next, we develop Active
Performance-Based Imitation Learning (APIL), an active learning algorithm for
reducing the learner-teacher interaction cost in this framework. By making
query decisions based on predictions of future progress, our algorithm avoids
the pitfalls of traditional uncertainty-based approaches in the face of teacher
behavioral uncertainty. Results on both toy and photo-realistic navigation
tasks show that APIL significantly reduces the number of interactions with
teachers without compromising performance. Moreover, it is robust to various
degrees of teacher behavioral uncertainty.
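The querying idea, asking the teacher only when the learner predicts meaningful future progress rather than when model uncertainty is high, can be sketched as follows. All names here (`predict_progress`, `act`, `update`, the threshold value) are hypothetical placeholders, not the paper's actual estimator:

```python
def should_query_teacher(predicted_progress, threshold=0.1):
    """Query the teacher only when the learner predicts that one more
    demonstration would yield meaningful performance gain.

    predicted_progress: the learner's estimate of the gain from one
    more teacher interaction (a hypothetical scalar in [0, 1]).
    """
    return predicted_progress > threshold

def training_step(learner, teacher, state, threshold=0.1):
    # Progress-based gating instead of uncertainty-based gating: under
    # teacher behavioral uncertainty, high predictive uncertainty does
    # not imply a useful query, so we gate on predicted progress.
    gain = learner.predict_progress(state)
    if should_query_teacher(gain, threshold):
        action = teacher.act(state)   # costly teacher interaction
        learner.update(state, action)
    else:
        action = learner.act(state)   # act autonomously, no query
    return action
```

The point of the gate is that queries are spent where they are predicted to help, which is what lets the interaction count drop without a performance loss.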
Related papers
- Active teacher selection for reinforcement learning from human feedback [14.009227941725783]
Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback.
We propose the Hidden Utility Bandit framework to model differences in teacher rationality, expertise, and costliness.
We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing.
arXiv Detail & Related papers (2023-10-23T18:54:43Z)
- Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback [4.174296652683762]
We study the problem of teaching via demonstrations in sequential decision-making tasks.
In this work, we formalize the teaching process with limited feedback and propose an algorithm that solves this problem.
arXiv Detail & Related papers (2023-09-16T21:12:04Z)
- TGRL: An Algorithm for Teacher Guided Reinforcement Learning [45.38447023752256]
It is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives.
We present a principled approach, along with an approximate implementation, for dynamically and automatically balancing when to follow the teacher and when to use rewards.
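The balancing idea, mixing a reinforcement objective with a teacher-imitation objective and shifting the weight dynamically, might be sketched as below. The weight-update rule here is purely illustrative (not TGRL's actual principled derivation), and all names are hypothetical:

```python
def combined_objective(reward_term, teacher_term, alpha):
    """Weighted mix of the reinforcement objective and the
    teacher-imitation objective (both hypothetical scalars)."""
    return alpha * reward_term + (1.0 - alpha) * teacher_term

def update_alpha(alpha, student_return, guided_return, step=0.05):
    # Shift weight toward pure reward maximization when the student
    # already matches the teacher-guided baseline, and back toward
    # the teacher otherwise. An illustrative rule, not TGRL's.
    if student_return >= guided_return:
        return min(1.0, alpha + step)
    return max(0.0, alpha - step)
```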
arXiv Detail & Related papers (2023-07-06T17:58:40Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
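The fragment-wise temporal moving average described above is the familiar exponential-moving-average (EMA) teacher update, applied per fragment. A minimal sketch, with plain lists of floats standing in for parameter fragments:

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """Update each fragment of the teacher as an exponential moving
    average of the corresponding student fragment. A higher decay
    keeps the teacher smoother and more robust to label noise."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]
```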
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose the dynamic prior knowledge (DPK), which integrates part of teacher's features as the prior knowledge before the feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z)
- Contrastive Continual Learning with Feature Propagation [32.70482982044965]
Continual learners are designed to learn a stream of tasks with domain and class shifts across tasks.
We propose a general feature-propagation based contrastive continual learning method which is capable of handling multiple continual learning scenarios.
arXiv Detail & Related papers (2021-12-03T04:55:28Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that incorporates the teacher's cooperative intention into its likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment and explore it from different perspectives.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.