Teacher-Class Network: A Neural Network Compression Mechanism
- URL: http://arxiv.org/abs/2004.03281v3
- Date: Fri, 29 Oct 2021 18:58:25 GMT
- Title: Teacher-Class Network: A Neural Network Compression Mechanism
- Authors: Shaiq Munir Malik, Muhammad Umair Haider, Mohbat Tharani, Musab
Rasheed and Murtaza Taj
- Abstract summary: Instead of transferring knowledge to one student only, the proposed method transfers a chunk of knowledge to each student.
Our students are not trained on problem-specific logits; instead, they are trained to mimic the knowledge (dense representation) learned by the teacher network.
The proposed teacher-class architecture is evaluated on several benchmark datasets such as MNIST, Fashion MNIST, IMDB Movie Reviews, CAMVid, CIFAR-10 and ImageNet.
- Score: 2.257416403770908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reduce the overwhelming size of Deep Neural Networks (DNN), the
teacher-student methodology tries to transfer knowledge from a complex teacher
network to a simple student network. We instead propose a novel method called
the teacher-class network, consisting of a single teacher and multiple student
networks (i.e. a class of students). Instead of transferring knowledge to one
student only, the proposed method transfers a chunk of knowledge to each
student. Our students are not trained on problem-specific logits; they are
trained to mimic the knowledge (dense representation) learned by the teacher
network, so the combined knowledge learned by the class of students can be
used to solve other problems as well. The proposed teacher-class architecture
is evaluated on several benchmark datasets such as MNIST, Fashion MNIST, IMDB
Movie Reviews, CAMVid, CIFAR-10 and ImageNet, on multiple tasks including
image classification, sentiment classification and segmentation. Our approach
outperforms the state-of-the-art single-student approach in terms of accuracy
as well as computational cost while achieving a 10-30 times reduction in
parameters.
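Below is a minimal sketch of the chunked knowledge-transfer idea described in the abstract, assuming a PyTorch setup. The student architecture, the equal split of the teacher's dense representation via chunk(), and the MSE mimic loss are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: split a teacher's dense representation into chunks and
# train one small student per chunk to mimic its assigned chunk (MSE loss).
# Teacher/student architectures and the equal-split rule are assumptions.
import torch
import torch.nn as nn

class SmallStudent(nn.Module):
    def __init__(self, in_dim, chunk_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 128),
                                 nn.ReLU(), nn.Linear(128, chunk_dim))

    def forward(self, x):
        return self.net(x)

def train_student_class(teacher, students, loader, epochs=1, lr=1e-3):
    """Each student i mimics the i-th chunk of the teacher's dense representation."""
    teacher.eval()
    n = len(students)
    opts = [torch.optim.Adam(s.parameters(), lr=lr) for s in students]
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x, _ in loader:                    # labels unused: students mimic features
            with torch.no_grad():
                dense = teacher(x)             # teacher's dense representation (B, D)
            chunks = dense.chunk(n, dim=1)     # split the knowledge into n chunks
            for s, opt, target in zip(students, opts, chunks):
                opt.zero_grad()
                loss = mse(s(x), target)
                loss.backward()
                opt.step()

def combined_representation(students, x):
    """Concatenate the students' outputs to approximate the full representation."""
    return torch.cat([s(x) for s in students], dim=1)
```

At inference time the students' outputs can be concatenated and fed to a small task head, in line with the abstract's claim that the combined knowledge can be reused for other problems.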
Related papers
- UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation [48.49860868061573]
Recent neural implicit representations (NIRs) have achieved great success in the tasks of 3D reconstruction and novel view synthesis.
They require the images of a scene from different camera views to be available for one-time training.
This is expensive especially for scenarios with large-scale scenes and limited data storage.
We design a student-teacher framework to mitigate the catastrophic forgetting problem.
arXiv Detail & Related papers (2022-12-21T11:43:20Z) - Collaborative Multi-Teacher Knowledge Distillation for Learning Low
Bit-width Deep Neural Networks [28.215073725175728]
We propose a novel framework that leverages both multi-teacher knowledge distillation and network quantization for learning low bit-width DNNs.
Our experimental results on CIFAR100 and ImageNet datasets show that the compact quantized student models trained with our method achieve competitive results.
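A minimal sketch of the multi-teacher ingredient of such a framework, assuming a PyTorch setup: the teachers' temperature-softened predictions are averaged and distilled into the (possibly quantized) student. Uniform averaging and the values of T and alpha are assumptions; the paper's collaborative scheme and quantization procedure are not reproduced here.

```python
# Sketch: distill an ensemble of teachers into one (possibly quantized) student
# by averaging the teachers' temperature-softened distributions. Uniform
# averaging and the loss weighting are assumptions, not the paper's scheme.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.7):
    # Average the teachers' softened probabilities.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]).mean(dim=0)
    # KL between the averaged teacher distribution and the student's softened output.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  teacher_probs, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)  # ordinary supervised term
    return alpha * kd + (1 - alpha) * ce
```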
arXiv Detail & Related papers (2022-10-27T01:03:39Z) - PrUE: Distilling Knowledge from Sparse Teacher Networks [4.087221125836262]
We present a pruning method termed Prediction Uncertainty Enlargement (PrUE) to simplify the teacher.
We empirically investigate the effectiveness of the proposed method with experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet.
Our method allows researchers to distill knowledge from deeper networks to improve students further.
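A hedged sketch of the prune-then-distill pipeline, with L1 magnitude pruning used as a stand-in for PrUE's uncertainty-enlarging criterion, which this snippet does not implement; the sparsified teacher would then be distilled into a student as usual.

```python
# Sketch: sparsify a teacher before distillation. L1 magnitude pruning is used
# here as a stand-in for PrUE's uncertainty-based criterion.
import torch.nn as nn
import torch.nn.utils.prune as prune

def sparsify_teacher(teacher: nn.Module, amount: float = 0.5) -> nn.Module:
    """Zero out the smallest-magnitude weights in every conv/linear layer."""
    for module in teacher.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")   # make the sparsity permanent
    return teacher
```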
arXiv Detail & Related papers (2022-07-03T08:14:24Z) - ORC: Network Group-based Knowledge Distillation using Online Role Change [3.735965959270874]
We propose an online role change strategy for multiple-teacher-based knowledge distillation.
The top-ranked networks in the student group are able to be promoted to the teacher group at every iteration.
We verify the superiority of the proposed method on CIFAR-10, CIFAR-100, and ImageNet.
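A minimal sketch of the role-change idea, assuming the teacher and student groups are plain Python lists of PyTorch models: at each iteration the best-performing student is promoted and the worst-performing teacher demoted. The batch-accuracy ranking and one-for-one swap rule are assumptions, not the paper's exact procedure.

```python
# Sketch: rank the networks on the current batch, promote the best student to
# the teacher group and demote the worst teacher, keeping group sizes fixed.
# The accuracy-based ranking and one-for-one swap are assumptions.
import torch

@torch.no_grad()
def batch_accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

def online_role_change(teacher_group, student_group, x, y):
    """Swap roles in place between the two lists of models."""
    best_student = max(student_group, key=lambda m: batch_accuracy(m, x, y))
    worst_teacher = min(teacher_group, key=lambda m: batch_accuracy(m, x, y))
    if batch_accuracy(best_student, x, y) > batch_accuracy(worst_teacher, x, y):
        student_group.remove(best_student)
        teacher_group.remove(worst_teacher)
        teacher_group.append(best_student)
        student_group.append(worst_teacher)
```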
arXiv Detail & Related papers (2022-06-01T10:28:18Z) - Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the connection paths across levels between the teacher and student networks, and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our finally designed nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
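A minimal sketch of cross-stage feature supervision in this spirit, assuming a PyTorch setup: each student stage is compared against the teacher's current and earlier stages through 1x1 adapter convolutions. The stage pairing, the adapters, and the MSE criterion are illustrative assumptions rather than the paper's actual review mechanism.

```python
# Sketch: student stage i is guided by teacher stages 1..i via 1x1 adapters.
# Pairing, adapters, and the MSE criterion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossStageReview(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # One 1x1 conv per (student stage, teacher stage) pair to align channels.
        self.adapters = nn.ModuleList([
            nn.ModuleList([nn.Conv2d(sc, tc, kernel_size=1)
                           for tc in teacher_channels[: i + 1]])
            for i, sc in enumerate(student_channels)
        ])

    def forward(self, student_feats, teacher_feats):
        loss = 0.0
        for i, s_feat in enumerate(student_feats):
            for j, adapter in enumerate(self.adapters[i]):
                t_feat = teacher_feats[j]
                aligned = adapter(s_feat)
                # Match spatial size before comparing feature maps.
                aligned = F.interpolate(aligned, size=t_feat.shape[-2:],
                                        mode="bilinear", align_corners=False)
                loss = loss + F.mse_loss(aligned, t_feat)
        return loss
```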
arXiv Detail & Related papers (2021-04-19T04:36:24Z) - Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z) - Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z) - Exploring Knowledge Distillation of a Deep Neural Network for
Multi-Script identification [8.72467690936929]
Multi-lingual script identification is a difficult task involving different languages with complex backgrounds in scene text images.
Deep neural networks are employed as teacher models to train a smaller student network by utilizing the teacher model's predictions.
arXiv Detail & Related papers (2021-02-20T12:54:07Z) - Densely Guided Knowledge Distillation using Multiple Teacher Assistants [5.169724825219126]
We propose a densely guided knowledge distillation using multiple teacher assistants that gradually decreases the model size.
We also design a stochastic teaching strategy where, for each mini-batch, the teacher or teacher assistants are randomly dropped.
This acts as a regularizer that improves the efficiency of teaching the student network.
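A minimal sketch of that random-drop regularization, assuming PyTorch models: for each mini-batch, some of the teacher/assistant networks are dropped and the student distills from the softened average of the survivors. The drop probability and uniform averaging are assumptions.

```python
# Sketch: per mini-batch, randomly drop guiding networks and return averaged
# soft targets from the survivors. drop_p and uniform averaging are assumptions.
import random
import torch
import torch.nn.functional as F

def densely_guided_targets(networks, x, T=4.0, drop_p=0.3):
    """networks = [teacher] + assistants; drop each with prob drop_p, keep at least one."""
    kept = [n for n in networks if random.random() > drop_p] or [random.choice(networks)]
    with torch.no_grad():
        probs = [F.softmax(n(x) / T, dim=1) for n in kept]
    return torch.stack(probs).mean(dim=0)
```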
arXiv Detail & Related papers (2020-09-18T13:12:52Z) - Point Adversarial Self Mining: A Simple Method for Facial Expression
Recognition [79.75964372862279]
We propose Point Adversarial Self Mining (PASM) to improve the recognition accuracy in facial expression recognition.
PASM uses a point adversarial attack method and a trained teacher network to locate the most informative position related to the target task.
The adaptive generation of learning materials and the teacher/student updates can be conducted more than once, improving the network's capability iteratively.
arXiv Detail & Related papers (2020-08-26T06:39:24Z) - Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least a 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)