Heterogeneous-Branch Collaborative Learning for Dialogue Generation
- URL: http://arxiv.org/abs/2303.11621v1
- Date: Tue, 21 Mar 2023 06:41:50 GMT
- Title: Heterogeneous-Branch Collaborative Learning for Dialogue Generation
- Authors: Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li
- Abstract summary: Collaborative learning is an effective way to conduct one-stage group distillation in the absence of a well-trained large teacher model.
Previous work has a severe branch homogeneity problem due to the same training objective and independent, identical training sets.
We propose a dual group-based knowledge distillation method, consisting of positive distillation and negative distillation, to further diversify the features of different branches in a steady and interpretable way.
- Score: 11.124375734351826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of deep learning, advanced dialogue generation methods
usually require a greater amount of computational resources. One promising
approach to obtaining a high-performance and lightweight model is knowledge
distillation, which relies heavily on a powerful pre-trained teacher.
Collaborative learning, also known as online knowledge distillation, is an
effective way to conduct one-stage group distillation in the absence of a
well-trained large teacher model. However, previous work has a severe branch
homogeneity problem due to the same training objective and independent,
identical training sets. To alleviate this problem, we consider the dialogue
attributes in the training of network branches. Each branch learns the
attribute-related features based on the selected subset. Furthermore, we
propose a dual group-based knowledge distillation method, consisting of
positive distillation and negative distillation, to further diversify the
features of different branches in a steady and interpretable way. The
proposed approach significantly improves branch heterogeneity and outperforms
state-of-the-art collaborative learning methods on two widely used open-domain
dialogue datasets.
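No reference implementation accompanies this listing, but a minimal sketch may help make the dual group-based distillation concrete. The snippet below, written against PyTorch, shows one plausible way to combine a positive distillation term (pulling a branch toward the averaged soft targets of its positive group) with a negative distillation term (pushing it away from its negative group). Every name here (dual_group_distillation_loss, neg_weight, temperature) is a hypothetical choice; the paper's actual attribute-based grouping and loss formulation may differ.

```python
import torch
import torch.nn.functional as F


def dual_group_distillation_loss(branch_logits, pos_group_logits, neg_group_logits,
                                 temperature=2.0, neg_weight=0.1):
    """Hypothetical per-branch dual group-based distillation loss (sketch only)."""
    log_p = F.log_softmax(branch_logits / temperature, dim=-1)

    # Positive distillation: pull this branch toward the averaged soft
    # targets of the branches in its positive group.
    pos_teacher = torch.stack(
        [F.softmax(logits.detach() / temperature, dim=-1) for logits in pos_group_logits]
    ).mean(dim=0)
    pos_loss = F.kl_div(log_p, pos_teacher, reduction="batchmean") * temperature ** 2

    # Negative distillation: push this branch away from the averaged soft
    # targets of the branches in its negative group (negated KL term).
    neg_teacher = torch.stack(
        [F.softmax(logits.detach() / temperature, dim=-1) for logits in neg_group_logits]
    ).mean(dim=0)
    neg_loss = -F.kl_div(log_p, neg_teacher, reduction="batchmean") * temperature ** 2

    return pos_loss + neg_weight * neg_loss
```

In a full collaborative-learning setup, a term like this would be added to each branch's usual generation loss, with the positive and negative groups chosen according to the dialogue-attribute subsets the branches were trained on.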
Related papers
- Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching.
An anchor branch is first trained to provide insights into the data properties.
A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
arXiv Detail & Related papers (2024-04-28T08:44:28Z)
- Decoupled Knowledge with Ensemble Learning for Online Distillation [3.794605440322862]
Online knowledge distillation is a one-stage strategy that alleviates the requirement of a pre-trained teacher through mutual learning and collaborative learning.
Recent peer collaborative learning (PCL) integrates an online ensemble, collaboration of base networks and a temporal mean teacher to construct effective knowledge.
Decoupled knowledge for online knowledge distillation is generated by an independent teacher, separate from the student.
arXiv Detail & Related papers (2023-12-18T14:08:59Z)
- I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation [147.2183428328396]
We introduce a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework.
In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process.
To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy.
arXiv Detail & Related papers (2023-10-24T07:22:17Z)
- Channel Self-Supervision for Online Knowledge Distillation [14.033675223173933]
We propose a novel online knowledge distillation method, Channel Self-Supervision for Online Knowledge Distillation (CSS).
We construct a dual-network multi-branch structure and enhance inter-branch diversity through self-supervised learning.
Our method provides greater diversity than OKDDip and also yields notable performance improvements, even over state-of-the-art methods such as PCL.
arXiv Detail & Related papers (2022-03-22T12:35:20Z)
- Weakly Supervised Semantic Segmentation via Alternative Self-Dual Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z)
- Distilling Knowledge via Intermediate Classifier Heads [0.5584060970507505]
Knowledge distillation is a transfer-learning approach to train a resource-limited student model with the guidance of a larger pre-trained teacher model.
We introduce knowledge distillation via intermediate heads to mitigate the impact of the capacity gap.
Our experiments on various teacher-student pairs and datasets have demonstrated that the proposed approach outperforms the canonical knowledge distillation approach.
arXiv Detail & Related papers (2021-02-28T12:52:52Z)
- Peer Collaborative Learning for Online Knowledge Distillation [69.29602103582782]
The Peer Collaborative Learning method integrates online ensembling and network collaboration into a unified framework.
Experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method significantly improves the generalisation of various backbone networks.
arXiv Detail & Related papers (2020-06-07T13:21:52Z)
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach [55.83558520598304]
We propose a brand new solution to reuse experiences and transfer value functions among multiple students via model distillation.
We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge.
Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance in stabilizing and accelerating learning progress.
arXiv Detail & Related papers (2020-02-06T11:31:04Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
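As with the earlier sketch, the fragment below is illustrative only and is not code from the LFME paper: it shows one simple way knowledge from several 'Expert' models could be aggregated into a single student loss, with LFME's self-paced weighting and curriculum scheduling deliberately left out. All identifiers (multi_expert_distillation_loss, kd_weight, and so on) are hypothetical.

```python
import torch
import torch.nn.functional as F


def multi_expert_distillation_loss(student_logits, expert_logits_list, labels,
                                   temperature=2.0, kd_weight=0.5):
    """Illustrative aggregation of several experts' soft targets (sketch only)."""
    # Standard hard-label loss on the student's own predictions.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Soft targets: plain average of the experts' tempered distributions.
    expert_probs = torch.stack(
        [F.softmax(logits.detach() / temperature, dim=-1) for logits in expert_logits_list]
    ).mean(dim=0)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        expert_probs,
        reduction="batchmean",
    ) * temperature ** 2

    return (1 - kd_weight) * ce_loss + kd_weight * kd_loss
```

Uniform averaging of the expert distributions is the main simplification here; according to the abstract, LFME instead applies self-paced schemes rather than treating all experts and instances equally.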