Real-time Policy Distillation in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/1912.12630v1
- Date: Sun, 29 Dec 2019 11:10:37 GMT
- Title: Real-time Policy Distillation in Deep Reinforcement Learning
- Authors: Yuxiang Sun and Pooyan Fazli
- Abstract summary: Policy distillation is an effective way to transfer control policies from a larger network to a smaller untrained network.
Existing approaches are computationally inefficient, resulting in a long distillation time.
We propose a new distillation mechanism, called real-time policy distillation, in which training the teacher model and distilling the policy to the student model occur simultaneously.
- Score: 11.026828277064293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy distillation in deep reinforcement learning provides an effective way
to transfer control policies from a larger network to a smaller untrained
network without a significant degradation in performance. However, policy
distillation is underexplored in deep reinforcement learning, and existing
approaches are computationally inefficient, resulting in a long distillation
time. In addition, the effectiveness of the distillation process is still
limited by the model capacity. We propose a new distillation mechanism, called
real-time policy distillation, in which training the teacher model and
distilling the policy to the student model occur simultaneously. Accordingly,
the teacher's latest policy is transferred to the student model in real time.
This reduces the distillation time to half the original time or even less and
also makes it possible for extremely small student models to learn skills at
the expert level. We evaluated the proposed algorithm in the Atari 2600 domain.
The results show that our approach can achieve full distillation in most games,
even at compression ratios as low as 1.7%.
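To make the mechanism concrete, the sketch below shows one way a simultaneous teacher-training / student-distillation loop could look. It is a minimal illustration of the idea as summarized above, not the paper's implementation: the network sizes, the dummy RL loss, and the synthetic data source are all assumptions.

```python
# Minimal sketch of real-time policy distillation: the teacher takes an RL
# update and, in the same loop iteration, the student distills the teacher's
# *latest* policy. Sizes, the dummy RL loss, and the data are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits

obs_dim, n_actions = 128, 6
teacher = PolicyNet(obs_dim, n_actions, hidden=512)  # large teacher
student = PolicyNet(obs_dim, n_actions, hidden=16)   # extremely small student
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-4)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
tau = 1.0  # softmax temperature for the distillation targets

def sample_batch(batch_size: int = 32) -> torch.Tensor:
    # Stand-in for environment rollouts / replay sampling.
    return torch.randn(batch_size, obs_dim)

def teacher_rl_loss(net: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    # Placeholder for the teacher's actual RL objective (e.g. a DQN loss);
    # a dummy differentiable scalar so the sketch runs end to end.
    return net(obs).logsumexp(dim=-1).mean()

for step in range(1000):
    obs = sample_batch()
    # 1) One RL update for the teacher.
    loss_t = teacher_rl_loss(teacher, obs)
    opt_t.zero_grad(); loss_t.backward(); opt_t.step()
    # 2) Immediately distill the teacher's updated policy into the student
    #    by minimizing KL(teacher || student) over action distributions.
    with torch.no_grad():
        p_teacher = F.softmax(teacher(obs) / tau, dim=-1)
    log_p_student = F.log_softmax(student(obs) / tau, dim=-1)
    loss_s = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```

Because the student tracks the teacher's current policy rather than waiting for a fully trained teacher, the wall-clock distillation time folds into the teacher's own training time, which is the source of the claimed 2x-or-better speedup.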
Related papers
- FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition [3.489980912925397]
We propose adaptive self-knowledge distillation, which reduces dependence on the teacher model to improve the student's self-training capacity.
FastWhisper achieves a word error rate 1.07% lower than that of the teacher model, Whisper, and its inference is roughly 5 times faster.
arXiv Detail & Related papers (2026-01-08T08:05:30Z)
- Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
A self-distillation (SSD) training strategy is introduced to filter and weight teacher representations so that only task-relevant representations are distilled.
Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z)
- Efficient Knowledge Distillation via Curriculum Extraction [9.320038077848709]
We show that a curriculum can be extracted from just the fully trained teacher network, and that this extracted curriculum gives efficiency benefits similar to those of progressive distillation.
Our scheme significantly outperforms one-shot distillation and achieves a performance similar to that of progressive distillation for learning sparse parities with two-layer networks.
arXiv Detail & Related papers (2025-03-21T19:09:41Z)
- Towards Training One-Step Diffusion Models Without Distillation [72.80423908458772]
We show that one-step generative models can be trained directly, without a distillation process.
We propose a family of distillation methods that achieve competitive results without relying on score estimation.
arXiv Detail & Related papers (2025-02-11T23:02:14Z)
- Knowledge Distillation with Refined Logits [31.205248790623703]
We introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods.
Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions.
Our method effectively eliminates misleading information from the teacher while preserving crucial class correlations.
arXiv Detail & Related papers (2024-08-14T17:59:32Z)
- Proximal Policy Distillation [3.2634122554914002]
We introduce Proximal Policy Distillation (PPD), a novel policy distillation method that integrates student-driven distillation and Proximal Policy Optimization (PPO).
We compare PPD with two common alternatives, student-distill and teacher-distill, over a wide range of reinforcement learning environments.
Our findings indicate that PPD improves sample efficiency and produces better student policies compared to typical policy distillation approaches.
arXiv Detail & Related papers (2024-07-21T12:08:54Z)
- AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting [5.818420448447701]
We propose Adaptive Knowledge Distillation, a novel technique inspired by curriculum learning to adaptively weight losses at the instance level.
Our method follows a plug-and-play paradigm that can be applied on top of any combination of task-specific and distillation objectives.
arXiv Detail & Related papers (2024-05-11T15:06:24Z)
- Education distillation: getting student models to learn in schools [15.473668050280304]
This paper introduces dynamic incremental learning into knowledge distillation.
It proposes treating fragmented student models, split off from the complete student model, as lower-grade models.
arXiv Detail & Related papers (2023-11-23T05:20:18Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes the limitations of prior approaches with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Online Distillation with Continual Learning for Cyclic Domain Shifts [52.707212371912476]
We propose a solution by leveraging the power of continual learning methods to reduce the impact of domain shifts.
Our work represents an important step forward in the field of online distillation and continual learning, with the potential to significantly impact real-world applications.
arXiv Detail & Related papers (2023-04-03T11:15:05Z)
- Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval [49.01637233471453]
Previous knowledge-distillation-based efficient image retrieval methods employ a lightweight network as the student model for fast inference.
We propose a Capacity Dynamic Distillation framework, which constructs a student model with editable representation capacity.
Our method achieves superior inference speed and accuracy, e.g., on the VeRi-776 dataset with ResNet101 as the teacher.
arXiv Detail & Related papers (2023-03-16T11:09:22Z)
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers [49.79405257763856]
This paper focuses on task-agnostic distillation.
It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints.
We propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning.
arXiv Detail & Related papers (2023-02-19T17:37:24Z)
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that this standard distillation paradigm incurs a serious bias issue: popular items are recommended more heavily after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- Dynamic Rectification Knowledge Distillation [0.0]
Dynamic Rectification Knowledge Distillation (DR-KD) is a knowledge distillation framework.
DR-KD transforms the student into its own teacher; if this self-teacher makes a wrong prediction while distilling information, the error is rectified before the knowledge is distilled.
Our proposed DR-KD performs remarkably well in the absence of a sophisticated, cumbersome teacher model.
arXiv Detail & Related papers (2022-01-27T04:38:01Z)
- Autoregressive Knowledge Distillation through Imitation Learning [70.12862707908769]
We develop a compression technique for autoregressive models driven by an imitation learning perspective on knowledge distillation.
Our method consistently outperforms other distillation algorithms, such as sequence-level knowledge distillation.
Student models trained with our method attain BLEU/ROUGE scores 1.4 to 4.8 points higher than models trained from scratch, while running up to 14 times faster at inference than the teacher model.
arXiv Detail & Related papers (2020-09-15T17:43:02Z)
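Nearly every entry above builds on the same soft-target objective, so a minimal reference form may help orient readers. The temperature T and the mixing weight alpha below are illustrative defaults, not values taken from any of the listed papers.

```python
# Hinton-style knowledge distillation loss: hard-label cross-entropy mixed
# with KL divergence to the teacher's temperature-softened distribution.
# T and alpha are illustrative, not from any listed paper.
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            targets: torch.Tensor,
            T: float = 4.0,
            alpha: float = 0.5) -> torch.Tensor:
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard T^2 rescaling to keep gradient magnitudes comparable
    return alpha * hard + (1.0 - alpha) * soft
```

The variants above mostly differ in what replaces or reweights the soft term: action distributions instead of class logits in the policy distillation papers, per-instance loss weights in AdaKD, and rectified or refined teacher targets in DR-KD and RLD.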
This list is automatically generated from the titles and abstracts of the papers on this site.