Multi-Modality Distillation via Learning the teacher's modality-level
Gram Matrix
- URL: http://arxiv.org/abs/2112.11447v1
- Date: Tue, 21 Dec 2021 18:53:58 GMT
- Title: Multi-Modality Distillation via Learning the teacher's modality-level
Gram Matrix
- Authors: Peng Liu
- Abstract summary: It is necessary to force the student network to learn the modality relationship information of the teacher network.
To effectively transfer knowledge from teachers to students, a novel modality relation distillation paradigm is adopted that models the relationship information among different modalities.
- Score: 3.4793807018498555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of multi-modality knowledge distillation research,
existing methods have mainly focused on learning only the teacher's final
output. Thus, deep differences remain between the teacher network and the
student network. It is necessary to force the student network to learn the
modality relationship information of the teacher network. To effectively
transfer knowledge from teachers to students, a novel modality relation
distillation paradigm is adopted that models the relationship information
among different modalities, namely learning the teacher's modality-level
Gram Matrix.
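
The abstract does not spell out the exact loss formulation, but the core idea, supervising the student on the teacher's modality-level Gram matrix in addition to the teacher's final outputs, can be sketched briefly. The following PyTorch snippet is a minimal illustration under assumed details: the (batch, modality, feature) layout, the MSE relation term, and the weights alpha and tau are illustrative choices rather than the paper's reported implementation.

```python
# Minimal sketch of modality-level Gram-matrix distillation (assumed details,
# not the paper's exact recipe): per-modality features are compared through
# an M x M Gram matrix of inter-modality similarities.
import torch
import torch.nn.functional as F


def modality_gram(features: torch.Tensor) -> torch.Tensor:
    """Build an (M x M) Gram matrix of inter-modality similarities.

    features: (B, M, D) tensor, i.e. B samples, M modalities, D-dim features.
    Returns:  (B, M, M) batch of modality-level Gram matrices.
    """
    z = F.normalize(features, dim=-1)          # unit-norm per modality feature
    return torch.bmm(z, z.transpose(1, 2))     # pairwise modality similarities


def gram_distillation_loss(student_feats, teacher_feats, student_logits,
                           teacher_logits, labels, alpha=0.5, tau=4.0):
    """Combine vanilla output distillation with modality-relation matching."""
    # (1) Conventional KD on the teacher's final outputs (the baseline the
    #     abstract argues is insufficient on its own).
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    ce = F.cross_entropy(student_logits, labels)

    # (2) Modality-relation term: match the student's modality-level Gram
    #     matrix to the teacher's (teacher side is detached / frozen).
    g_s = modality_gram(student_feats)
    g_t = modality_gram(teacher_feats).detach()
    relation = F.mse_loss(g_s, g_t)

    return ce + kd + alpha * relation


if __name__ == "__main__":
    B, M, D, C = 8, 3, 128, 10        # batch, modalities, feature dim, classes
    loss = gram_distillation_loss(
        student_feats=torch.randn(B, M, D),
        teacher_feats=torch.randn(B, M, D),
        student_logits=torch.randn(B, C),
        teacher_logits=torch.randn(B, C),
        labels=torch.randint(0, C, (B,)),
    )
    print(loss.item())
```

Here each Gram matrix holds the pairwise similarities between a sample's modality embeddings, so the student is pushed to reproduce how the teacher relates its modalities rather than only the teacher's logits.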
Related papers
- Enhanced Multimodal Representation Learning with Cross-modal KD [14.14709952127258]
This paper explores leveraging auxiliary modalities, which are available only during training, to enhance multimodal representation learning through cross-modal Knowledge Distillation (KD).
The widely adopted mutual information-based objective leads to a short-cut solution of a weak teacher, i.e., achieving the maximum mutual information by simply making the teacher model as weak as the student model.
To prevent such a weak solution, we introduce an additional objective term, i.e., the mutual information between the teacher and the auxiliary modality model. (A schematic form of this combined objective is sketched after the related-papers list.)
arXiv Detail & Related papers (2023-06-13T09:35:37Z)
- Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning [16.293262022872412]
We propose Adaptive Multi-teacher Knowledge Distillation with Meta-Learning (MMKD) to supervise student with appropriate knowledge from a tailored ensemble teacher.
With the help of a meta-weight network, the diverse yet compatible teacher knowledge in the output layer and intermediate layers is jointly leveraged to enhance the student performance.
arXiv Detail & Related papers (2023-06-11T09:38:45Z)
- Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks [28.215073725175728]
We propose a novel framework that leverages both multi-teacher knowledge distillation and network quantization for learning low bit-width DNNs.
Our experimental results on CIFAR100 and ImageNet datasets show that the compact quantized student models trained with our method achieve competitive results.
arXiv Detail & Related papers (2022-10-27T01:03:39Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing defined knowledge representation related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of enforcing the teacher to work on the same task as the student, we borrow the knowledge from a teacher trained from a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient optimization based teacher-aware learner who can incorporate teacher's cooperative intention into the likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation [72.4354883997316]
We propose a novel student-dependent distillation method, knowledge consistent distillation, which makes teacher's knowledge more consistent with the student.
Our method is very flexible and can be easily combined with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-31T06:52:20Z)
- Student Network Learning via Evolutionary Knowledge Distillation [22.030934154498205]
We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge.
Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly.
In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
arXiv Detail & Related papers (2021-03-23T02:07:15Z)
- Interactive Knowledge Distillation [79.12866404907506]
We propose an InterActive Knowledge Distillation scheme to leverage the interactive teaching strategy for efficient knowledge distillation.
In the distillation process, the interaction between teacher and student networks is implemented by a swapping-in operation.
Experiments with typical settings of teacher-student networks demonstrate that the student networks trained by our IAKD achieve better performance than those trained by conventional knowledge distillation methods.
arXiv Detail & Related papers (2020-07-03T03:22:04Z)
- Neural Multi-Task Learning for Teacher Question Detection in Online Classrooms [50.19997675066203]
We build an end-to-end neural framework that automatically detects questions from teachers' audio recordings.
By incorporating multi-task learning techniques, we are able to strengthen the understanding of semantic relations among different types of questions.
arXiv Detail & Related papers (2020-05-16T02:17:04Z)
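
For the first related paper above (Enhanced Multimodal Representation Learning with Cross-modal KD), a schematic reading of the described objective is sketched below. The notation is an assumption for illustration rather than the paper's own: z_T, z_S, and z_A stand for the teacher, student, and auxiliary-modality representations, and the second term is the mutual-information regularizer that is said to prevent the teacher from collapsing into a weak solution.

```latex
% Schematic cross-modal KD objective with an auxiliary-modality regularizer.
% Assumed notation: z_T, z_S, z_A are teacher, student, and auxiliary-modality
% representations; lambda is an illustrative trade-off weight.
\max_{\theta_T,\,\theta_S} \; I(z_T;\, z_S) \;+\; \lambda\, I(z_T;\, z_A)
```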