Student-Initiated Action Advising via Advice Novelty
- URL: http://arxiv.org/abs/2010.00381v2
- Date: Sat, 27 Feb 2021 08:49:43 GMT
- Title: Student-Initiated Action Advising via Advice Novelty
- Authors: Ercument Ilhan, Jeremy Gow and Diego Perez-Liebana
- Abstract summary: Student-initiated techniques that utilise state novelty and uncertainty estimations have obtained promising results.
We propose a student-initiated algorithm that alleviates these issues by employing Random Network Distillation (RND) to measure the novelty of a piece of advice.
- Score: 0.14323566945483493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action advising is a budget-constrained knowledge exchange mechanism between
teacher-student peers that can help tackle exploration and sample inefficiency
problems in deep reinforcement learning (RL). Most recently, student-initiated
techniques that utilise state novelty and uncertainty estimations have obtained
promising results. However, the approaches built on these estimations have some
potential weaknesses. First, they assume that the convergence of the student's
RL model implies less need for advice. This can be misleading when the teacher
is absent early on: the student is likely to learn suboptimally by itself, yet
will also ignore the teacher's assistance later. Secondly, under experience
replay, the delay between encountering states and having them take effect in
the RL model updates causes a feedback lag in what the student actually needs
advice for. We propose a student-initiated algorithm that alleviates these
weaknesses by employing Random Network Distillation (RND)
to measure the novelty of a piece of advice. Furthermore, we perform RND
updates only for the advised states to ensure that the student's own learning
does not impair its ability to leverage the teacher. Experiments in GridWorld
and MinAtar show that our approach performs on par with the state-of-the-art
and demonstrates significant advantages in the scenarios where the existing
methods are prone to fail.
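The core mechanism described in the abstract can be sketched in code. The following is a hypothetical, minimal pure-Python illustration of RND-based advice novelty, not the paper's implementation: a frozen random "target" map and a trainable "predictor" map share the same state input, the predictor's squared error serves as the novelty score, and the predictor is trained only on advised states, so novelty stays high for states the teacher has not covered. Class and parameter names are assumptions.

```python
import random

class AdviceNoveltyRND:
    """Minimal Random Network Distillation sketch for advice novelty.

    novelty(state) is the squared error between a trainable linear
    predictor and a frozen random linear target. update(state) is called
    only for advised states, so the student's own learning does not
    erode the novelty signal for unadvised regions.
    """

    def __init__(self, dim, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.target = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # frozen
        self.pred = [0.0] * dim                                     # trainable
        self.lr = lr

    def novelty(self, state):
        # squared error between predictor and frozen target outputs
        t = sum(w * x for w, x in zip(self.target, state))
        p = sum(w * x for w, x in zip(self.pred, state))
        return (t - p) ** 2

    def update(self, state):
        # one SGD step on the distillation loss, for advised states only
        t = sum(w * x for w, x in zip(self.target, state))
        p = sum(w * x for w, x in zip(self.pred, state))
        grad = 2.0 * (p - t)
        self.pred = [w - self.lr * grad * x
                     for w, x in zip(self.pred, state)]
```

After repeated updates on an advised state, its novelty shrinks toward zero, so the student stops requesting advice there while still asking in novel regions.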
Related papers
- CANDERE-COACH: Reinforcement Learning from Noisy Feedback [12.232688822099325]
The CANDERE-COACH algorithm is capable of learning from noisy feedback provided by a nonoptimal teacher.
We propose a noise-filtering mechanism to de-noise online feedback data, thereby enabling the RL agent to successfully learn with up to 40% of the teacher feedback being incorrect.
arXiv Detail & Related papers (2024-09-23T20:14:12Z)
- Faithful Knowledge Distillation [75.59907631395849]
We focus on two crucial questions with regard to a teacher-student pair: (i) do the teacher and student disagree at points close to correctly classified dataset examples, and (ii) is the distilled student as confident as the teacher around dataset examples?
These are critical questions when considering the deployment of a smaller student network trained from a robust teacher within a safety-critical setting.
arXiv Detail & Related papers (2023-06-07T13:41:55Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Explainable Action Advising for Multi-Agent Reinforcement Learning [32.49380192781649]
Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm.
We introduce Explainable Action Advising, in which the teacher provides action advice as well as associated explanations indicating why the action was chosen.
This allows the student to self-reflect on what it has learned, enabling the advice to generalise and leading to improved sample efficiency and learning performance.
arXiv Detail & Related papers (2022-11-15T04:15:03Z)
- Methodical Advice Collection and Reuse in Deep Reinforcement Learning [12.840744403432547]
This work considers how to better leverage uncertainties about when a student should ask for advice and if the student can model the teacher to ask for less advice.
Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.
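As a rough illustration of such a dual-uncertainty scheme, the gate below decides between acting alone, reusing an imitated teacher model, and spending budget on fresh advice. The thresholds, names, and three-way outcome are hypothetical, not the paper's actual mechanism:

```python
def advice_gate(student_uncertainty, teacher_model_uncertainty,
                budget_left, ask_threshold=0.6, reuse_threshold=0.3):
    """Hypothetical dual-uncertainty gate for advice collection and reuse.

    student_uncertainty: how unsure the student's RL policy is in this state.
    teacher_model_uncertainty: how unsure the student's learned model of the
        teacher is (high means collected advice does not cover this state).
    """
    if student_uncertainty < ask_threshold:
        return "self"    # student is confident: act on its own policy
    if teacher_model_uncertainty < reuse_threshold:
        return "reuse"   # imitated teacher model covers this state well
    if budget_left > 0:
        return "ask"     # spend budget on fresh teacher advice
    return "self"        # uncertain, but the advice budget is exhausted
```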
arXiv Detail & Related papers (2022-04-14T22:24:55Z)
- Distribution Matching for Machine Teaching [64.39292542263286]
Machine teaching is an inverse problem of machine learning that aims at steering the student learner towards its target hypothesis.
Previous studies on machine teaching focused on balancing the teaching risk and cost to find those best teaching examples.
This paper presents a distribution matching-based machine teaching strategy.
arXiv Detail & Related papers (2021-05-06T09:32:57Z)
- Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums [58.221459787471254]
Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility.
Due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support.
With the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention.
This paper explores for the first time Bayesian deep learning on learner-based text posts with two methods: Monte Carlo Dropout and Variational Inference.
arXiv Detail & Related papers (2021-04-26T15:12:13Z)
- Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning [2.2344764434954256]
We propose a decision support method to the selection of aid actions for students using offline reinforcement learning.
Our experiments using logged data of real students show, through off-policy evaluation, that the method should achieve roughly 1.0 to 1.5 times as much cumulative reward as the logged policy.
arXiv Detail & Related papers (2021-04-20T21:45:19Z)
- Action Advising with Advice Imitation in Deep Reinforcement Learning [0.5185131234265025]
Action advising is a peer-to-peer knowledge exchange technique built on the teacher-student paradigm.
We present an approach that enables the student agent to imitate previously acquired advice and reuse it directly in its exploration policy.
arXiv Detail & Related papers (2021-04-17T04:24:04Z)
- Reducing the Teacher-Student Gap via Spherical Knowledge Distillation [67.75526580926149]
Knowledge distillation aims at obtaining a compact and effective model by learning the mapping function from a much larger one.
We investigate the capacity gap problem by studying the confidence gap between teacher and student.
We find that the magnitude of confidence is not necessary for knowledge distillation and can harm student performance if the student is forced to learn it.
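A minimal sketch of this idea, assuming the remedy is to rescale logits to a fixed norm before distillation so that only their direction (the relative class preferences) is matched; the function name is illustrative, not the paper's code:

```python
import math

def spherical_logits(logits, eps=1e-12):
    """Project a logit vector onto the unit sphere, discarding magnitude.

    Distilling from such normalized teacher logits transfers relative
    class preferences without forcing the student to match the teacher's
    confidence scale.
    """
    norm = math.sqrt(sum(z * z for z in logits))
    return [z / (norm + eps) for z in logits]
```

Two logit vectors that differ only in overall confidence, such as `[3, 4]` and `[30, 40]`, map to the same point on the sphere, so the distillation target no longer encodes confidence magnitude.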
arXiv Detail & Related papers (2020-10-15T03:03:36Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.