Dual Policy Distillation
- URL: http://arxiv.org/abs/2006.04061v1
- Date: Sun, 7 Jun 2020 06:49:47 GMT
- Title: Dual Policy Distillation
- Authors: Kwei-Herng Lai, Daochen Zha, Yuening Li, Xia Hu
- Abstract summary: Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning.
In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment.
The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms.
- Score: 58.43610940026261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy distillation, which transfers a teacher policy to a student policy, has
achieved great success in challenging tasks of deep reinforcement learning.
This teacher-student framework requires a well-trained teacher model which is
computationally expensive. Moreover, the performance of the student model could
be limited by the teacher model if the teacher model is not optimal. In
light of collaborative learning, we study the feasibility of involving joint
intellectual efforts from diverse perspectives of student models. In this work,
we introduce dual policy distillation (DPD), a student-student framework in
which two learners operate on the same environment to explore different
perspectives of the environment and extract knowledge from each other to
enhance their learning. The key challenge in developing this dual learning
framework is to identify the beneficial knowledge from the peer learner for
contemporary learning-based reinforcement learning algorithms, since it is
unclear whether the knowledge distilled from an imperfect and noisy peer
learner would be helpful. To address the challenge, we theoretically justify
that distilling knowledge from a peer learner will lead to policy improvement
and propose a disadvantageous distillation strategy based on the theoretical
results. The conducted experiments on several continuous control tasks show
that the proposed framework achieves superior performance with a learning-based
agent and function approximation without the use of expensive teacher models.
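The disadvantageous distillation idea can be illustrated with a toy tabular sketch: two Q-learners train on the same random MDP and each periodically pulls its estimates toward the peer, but only on states where the peer's value estimate is higher. This is an illustrative simplification, not the paper's actor-critic implementation; the environment, step sizes, and the `beta` mixing weight are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy random MDP: deterministic next-state and reward tables.
N_S, N_A = 10, 3
P = rng.integers(0, N_S, size=(N_S, N_A))   # next-state table
R = rng.random((N_S, N_A))                  # reward table
GAMMA = 0.9

def q_step(Q, s, eps):
    """One epsilon-greedy tabular Q-learning step; returns the next state."""
    a = rng.integers(N_A) if rng.random() < eps else int(np.argmax(Q[s]))
    s2 = P[s, a]
    Q[s, a] += 0.1 * (R[s, a] + GAMMA * Q[s2].max() - Q[s, a])
    return s2

def disadvantageous_distill(Q_self, Q_peer, beta=0.5):
    """Pull Q_self toward Q_peer only on states where the peer's value
    estimate is higher -- the states 'disadvantageous' to this learner."""
    worse = Q_self.max(axis=1) < Q_peer.max(axis=1)
    Q_self[worse] += beta * (Q_peer[worse] - Q_self[worse])

Q1, Q2 = np.zeros((N_S, N_A)), np.zeros((N_S, N_A))
s1 = s2 = 0
for t in range(2000):
    s1 = q_step(Q1, s1, eps=0.3)   # learner 1 explores more
    s2 = q_step(Q2, s2, eps=0.1)   # learner 2 exploits more
    if t % 50 == 0:                # periodic mutual distillation
        disadvantageous_distill(Q1, Q2)
        disadvantageous_distill(Q2, Q1)
```

Because each learner only copies the peer where the peer appears better, neither is dragged down by the other's weaknesses, which mirrors the policy-improvement argument in the abstract.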
Related papers
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
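The fragment-wise temporal moving average described above can be sketched as a simple exponential moving average (EMA) over named parameter groups. The "fragments" here are hypothetical per-layer arrays and the momentum value is illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "fragments": disjoint parameter groups (e.g. per-layer weights).
student = {"layer1": rng.random((4, 4)), "layer2": rng.random((4, 2))}
# The teacher starts as a copy of the student.
teacher = {k: v.copy() for k, v in student.items()}

def ema_update(teacher, student, momentum=0.99):
    """Update each teacher fragment as a temporal moving average of the
    corresponding student fragment (the fine-grained ensemble idea)."""
    for name, s_param in student.items():
        teacher[name] = momentum * teacher[name] + (1 - momentum) * s_param

# Simulate training steps where the student's parameters drift noisily;
# the EMA teacher smooths out that noise fragment by fragment.
for step in range(10):
    for name in student:
        student[name] += 0.01 * rng.standard_normal(student[name].shape)
    ema_update(teacher, student)
```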
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Automated Graph Self-supervised Learning via Multi-teacher Knowledge Distillation [43.903582264697974]
This paper studies the problem of how to automatically, adaptively, and dynamically learn instance-level self-supervised learning strategies for each node.
We propose a novel multi-teacher knowledge distillation framework for Automated Graph Self-Supervised Learning (AGSSL)
Experiments on eight datasets show that AGSSL can benefit from multiple pretext tasks, outperforming the corresponding individual tasks.
arXiv Detail & Related papers (2022-10-05T08:39:13Z)
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose the dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before the feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z)
- Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of enforcing the teacher to work on the same task as the student, we borrow the knowledge from a teacher trained from a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z)
- Improved Knowledge Distillation via Adversarial Collaboration [2.373824287636486]
A small student model is trained to exploit the knowledge of a large, well-trained teacher model.
Due to the capacity gap between the teacher and the student, it is hard for the student's performance to reach the teacher's level.
We propose an Adversarial Collaborative Knowledge Distillation (ACKD) method that effectively improves the performance of knowledge distillation.
arXiv Detail & Related papers (2021-11-29T07:20:46Z)
- Learning to Teach with Student Feedback [67.41261090761834]
Interactive Knowledge Distillation (IKD) allows the teacher to learn to teach from the feedback of the student.
IKD trains the teacher model to generate specific soft targets at each training step for a certain student.
Joint optimization for both teacher and student is achieved by two iterative steps.
arXiv Detail & Related papers (2021-09-10T03:01:01Z)
- Adaptive Multi-Teacher Multi-level Knowledge Distillation [11.722728148523366]
We propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD).
It consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights.
As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD.
arXiv Detail & Related papers (2021-03-06T08:18:16Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Active Imitation Learning from Multiple Non-Deterministic Teachers: Formulation, Challenges, and Algorithms [3.6702509833426613]
We formulate the problem of learning to imitate multiple, non-deterministic teachers with minimal interaction cost.
We first present a general framework that efficiently models and estimates such a distribution by learning continuous representations of the teacher policies.
Next, we develop Active Performance-Based Imitation Learning (APIL), an active learning algorithm for reducing the learner-teacher interaction cost.
arXiv Detail & Related papers (2020-06-14T03:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.