Active Imitation Learning from Multiple Non-Deterministic Teachers:
Formulation, Challenges, and Algorithms
- URL: http://arxiv.org/abs/2006.07777v1
- Date: Sun, 14 Jun 2020 03:06:27 GMT
- Title: Active Imitation Learning from Multiple Non-Deterministic Teachers:
Formulation, Challenges, and Algorithms
- Authors: Khanh Nguyen and Hal Daumé III
- Abstract summary: We formulate the problem of learning to imitate multiple, non-deterministic teachers with minimal interaction cost.
We first present a general framework that efficiently models and estimates such a distribution by learning continuous representations of the teacher policies.
Next, we develop Active Performance-Based Imitation Learning (APIL), an active learning algorithm for reducing the learner-teacher interaction cost.
- Score: 3.6702509833426613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We formulate the problem of learning to imitate multiple, non-deterministic
teachers with minimal interaction cost. Rather than learning a specific policy
as in standard imitation learning, the goal in this problem is to learn a
distribution over a policy space. We first present a general framework that
efficiently models and estimates such a distribution by learning continuous
representations of the teacher policies. Next, we develop Active
Performance-Based Imitation Learning (APIL), an active learning algorithm for
reducing the learner-teacher interaction cost in this framework. By making
query decisions based on predictions of future progress, our algorithm avoids
the pitfalls of traditional uncertainty-based approaches in the face of teacher
behavioral uncertainty. Results on both toy and photo-realistic navigation
tasks show that APIL significantly reduces the number of interactions with
teachers without compromising performance. Moreover, it is robust to various
degrees of teacher behavioral uncertainty.
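The querying idea, asking the teacher only when the learner predicts meaningful future progress rather than when model uncertainty is high, can be sketched as follows. All names here (`predict_progress`, `act`, `update`, the threshold value) are hypothetical placeholders, not the paper's actual estimator:

```python
def should_query_teacher(predicted_progress, threshold=0.1):
    """Query the teacher only when the learner predicts that one more
    demonstration would yield meaningful performance gain.

    predicted_progress: the learner's estimate of the gain from one
    more teacher interaction (a hypothetical scalar in [0, 1]).
    """
    return predicted_progress > threshold

def training_step(learner, teacher, state, threshold=0.1):
    # Progress-based gating instead of uncertainty-based gating: under
    # teacher behavioral uncertainty, high predictive uncertainty does
    # not imply a useful query, so we gate on predicted progress.
    gain = learner.predict_progress(state)
    if should_query_teacher(gain, threshold):
        action = teacher.act(state)   # costly teacher interaction
        learner.update(state, action)
    else:
        action = learner.act(state)   # act autonomously, no query
    return action
```

The point of the gate is that queries are spent where they are predicted to help, which is what lets the interaction count drop without a performance loss.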
Related papers
- Active teacher selection for reinforcement learning from human feedback [14.009227941725783]
Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback.
We propose the Hidden Utility Bandit framework to model differences in teacher rationality, expertise, and costliness.
We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing.
arXiv Detail & Related papers (2023-10-23T18:54:43Z)
- Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback [4.174296652683762]
We study the problem of teaching via demonstrations in sequential decision-making tasks.
In this work, we formalize the teaching process with limited feedback and propose an algorithm that solves this problem.
arXiv Detail & Related papers (2023-09-16T21:12:04Z)
- TGRL: An Algorithm for Teacher Guided Reinforcement Learning [45.38447023752256]
It is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives.
We present a principled approach, along with an approximate implementation, for dynamically and automatically balancing when to follow the teacher and when to use rewards.
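The balancing idea, mixing a reinforcement objective with a teacher-imitation objective and shifting the weight dynamically, might be sketched as below. The weight-update rule here is purely illustrative (not TGRL's actual principled derivation), and all names are hypothetical:

```python
def combined_objective(reward_term, teacher_term, alpha):
    """Weighted mix of the reinforcement objective and the
    teacher-imitation objective (both hypothetical scalars)."""
    return alpha * reward_term + (1.0 - alpha) * teacher_term

def update_alpha(alpha, student_return, guided_return, step=0.05):
    # Shift weight toward pure reward maximization when the student
    # already matches the teacher-guided baseline, and back toward
    # the teacher otherwise. An illustrative rule, not TGRL's.
    if student_return >= guided_return:
        return min(1.0, alpha + step)
    return max(0.0, alpha - step)
```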
arXiv Detail & Related papers (2023-07-06T17:58:40Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
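The fragment-wise temporal moving average described above is the familiar exponential-moving-average (EMA) teacher update, applied per fragment. A minimal sketch, with plain lists of floats standing in for parameter fragments:

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """Update each fragment of the teacher as an exponential moving
    average of the corresponding student fragment. A higher decay
    keeps the teacher smoother and more robust to label noise."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]
```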
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose the dynamic prior knowledge (DPK), which integrates part of teacher's features as the prior knowledge before the feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z)
- Contrastive Continual Learning with Feature Propagation [32.70482982044965]
Continual learners are designed to learn a stream of tasks with domain and class shifts across tasks.
We propose a general feature-propagation based contrastive continual learning method which is capable of handling multiple continual learning scenarios.
arXiv Detail & Related papers (2021-12-03T04:55:28Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that incorporates the teacher's cooperative intention into its likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
- Dual Policy Distillation [58.43610940026261]
Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment and explore it from different perspectives.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-06-07T06:49:47Z)
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.