HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
- URL: http://arxiv.org/abs/2110.08551v1
- Date: Sat, 16 Oct 2021 11:23:02 GMT
- Title: HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
- Authors: Chenhe Dong, Yaliang Li, Ying Shen, Minghui Qiu
- Abstract summary: Large pre-trained language models (PLMs) have shown overwhelming performance compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
- Score: 53.90578309960526
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: On many natural language processing tasks, large pre-trained language models (PLMs) have shown overwhelming performance compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered deployment on resource-limited devices in practice. In this paper, we aim to compress PLMs with knowledge distillation, and we propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. To dynamically select the most representative prototypes for each domain, we further propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.
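To make the two ingredients named above concrete, here is a minimal PyTorch sketch of a domain-relational graph and a compare-aggregate step. All function names, tensor shapes, and the domain-weighting scheme are illustrative assumptions, not the paper's implementation; see the released code for the real thing.

```python
import torch
import torch.nn.functional as F

def domain_relational_weights(prototypes):
    """Build a soft domain-relational graph from per-domain prototypes.

    prototypes: (D, H) tensor, one prototype vector per domain.
    Returns a (D, D) row-normalized affinity matrix (a hypothetical form;
    the paper learns these relations with meta-learning).
    """
    sim = prototypes @ prototypes.t() / prototypes.shape[1] ** 0.5
    return F.softmax(sim, dim=-1)

def compare_aggregate(token_states, prototype):
    """Hierarchical compare-aggregate, reduced to one level: compare token
    states against a domain prototype, then aggregate them with the
    comparison scores.

    token_states: (T, H); prototype: (H,). Returns an (H,) summary vector.
    """
    scores = F.softmax(token_states @ prototype, dim=0)   # compare
    return scores @ token_states                          # aggregate

def hrkd_loss(student_logits, teacher_logits, domain_id, relation, tau=2.0):
    """Distillation loss for one batch drawn from `domain_id`, scaled by
    how much attention the other domains pay to it in the relational graph
    (this weighting scheme is an assumption, not the paper's formula)."""
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return relation[:, domain_id].sum() * kd
```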
Related papers
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study multi-source domain generalization for text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
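As a rough illustration of episodic training over multiple seen domains, below is a first-order (Reptile-style) meta-update in PyTorch, where a held-out source domain stands in for the unseen target. The hyperparameters and the exact algorithm are assumptions, not taken from the paper.

```python
import copy
import random
import torch

def meta_episode(model, domain_loaders, loss_fn, inner_lr=1e-3,
                 meta_lr=0.1, inner_steps=3):
    """One episodic meta-update over multiple source domains.

    domain_loaders: dict mapping domain name -> infinite (x, y) iterator.
    Returns a loss on a held-out source domain as a proxy for
    unseen-domain risk.
    """
    held_out = random.choice(list(domain_loaders))   # mimic an unseen domain
    fast = copy.deepcopy(model)                      # task-adapted copy
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)

    # Inner loop: adapt the copy on the remaining (meta-train) domains.
    for name, loader in domain_loaders.items():
        if name == held_out:
            continue
        for _ in range(inner_steps):
            x, y = next(loader)
            opt.zero_grad()
            loss_fn(fast(x), y).backward()
            opt.step()

    # Estimate generalization on the held-out domain before the meta-update.
    x, y = next(domain_loaders[held_out])
    with torch.no_grad():
        held_out_loss = loss_fn(fast(x), y).item()

    # Outer (Reptile) update: move slow weights toward the adapted weights.
    with torch.no_grad():
        for slow, adapted in zip(model.parameters(), fast.parameters()):
            slow.add_(meta_lr * (adapted - slow))
    return held_out_loss
```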
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization [85.18995948334592]
Single domain generalization (single DG) aims at learning a robust model generalizable to unseen domains from only one training domain.
State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data.
We propose StyDeSty, which explicitly accounts for the alignment of the source and pseudo domains in the process of data augmentation.
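One common way to realize the stylization step is to perturb channel-wise feature statistics (AdaIN-style). The sketch below fakes a pseudo domain by swapping batch statistics; the paper instead learns stylization and destylization networks in a min-max game, which this omits.

```python
import torch

def stylize(features, eps=1e-5):
    """Synthesize a pseudo domain by perturbing channel-wise feature
    statistics (an AdaIN-style stand-in for a learned stylizer).

    features: (N, C, H, W) conv feature maps.
    Returns (stylized features, "destylized" normalized content).
    """
    mu = features.mean(dim=(2, 3), keepdim=True)
    sigma = features.std(dim=(2, 3), keepdim=True) + eps
    normalized = (features - mu) / sigma          # style-free content
    # Borrow statistics from a shuffled batch to fake a new style.
    idx = torch.randperm(features.size(0))
    return normalized * sigma[idx] + mu[idx], normalized
```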
arXiv Detail & Related papers (2024-06-01T02:41:34Z)
- Learning Hierarchical Features with Joint Latent Space Energy-Based Prior [44.4434704520236]
We study the fundamental problem of multi-layer generator models in learning hierarchical representations.
We propose a joint latent space EBM prior model with multi-layer latent variables for effective hierarchical representation learning.
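Sampling from such an EBM prior is typically done with Langevin dynamics; a generic single-layer version is sketched below. The step sizes are placeholders, and the paper uses multi-layer latents z = {z_1, ..., z_L} rather than a single z.

```python
import torch

def langevin_sample(energy, z, steps=60, step_size=0.1):
    """Draw approximate samples from an EBM prior p(z) ∝ exp(-E(z)) with
    unadjusted Langevin dynamics (the standard recipe).

    energy: callable mapping z -> per-sample energies.
    z: initial latent batch, e.g. torch.randn(N, D).
    """
    z = z.clone().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy(z).sum(), z)[0]
        with torch.no_grad():
            z = z - 0.5 * step_size ** 2 * grad \
                + step_size * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()
```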
arXiv Detail & Related papers (2023-10-14T15:44:14Z)
- Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains on the OS-MN40, OS-MN40-Miss, and CIFAR-10 datasets.
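The abstract gives few details, but one plausible reading of member-to-leader fusion is attention-weighted aggregation of per-modality member features into a leader representation, as in this hypothetical sketch; the reinforcement-learning training loop is omitted entirely.

```python
import torch
import torch.nn.functional as F

def member_to_leader_fusion(leader, members):
    """Fuse multi-modal 'member' features into a 'leader' representation
    with scaled dot-product attention (an assumption about what
    member-to-leader aggregation could look like).

    leader: (H,); members: (M, H). Returns a fused (H,) vector.
    """
    attn = F.softmax(members @ leader / leader.numel() ** 0.5, dim=0)
    return leader + attn @ members   # residual fusion
```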
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data [18.41222232863567]
We propose a novel end-to-end framework called Universal and joint knowledge distillation (UNI-KD) for cross-domain model compression.
In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme.
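A rough PyTorch sketch of such an objective is below: a feature-level and a logit-level distillation term plus a gradient-reversal discriminator for the adversarial part. The loss weights and the discriminator head are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient-reversal layer used for adversarial domain alignment."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, g):
        return -g

def uni_kd_style_loss(s_feat, t_feat, s_logits, t_logits,
                      domain_disc, domain_y, tau=2.0):
    """Combined objective in the spirit of UNI-KD: universal feature-level
    KD + joint logit-level KD + an adversarial term that pushes the
    student's features to be domain-invariant."""
    feat_kd = F.mse_loss(s_feat, t_feat.detach())
    logit_kd = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                        F.softmax(t_logits / tau, dim=-1),
                        reduction="batchmean") * tau ** 2
    # The discriminator tries to tell source from target; reversed
    # gradients make the student confuse it.
    adv = F.cross_entropy(domain_disc(GradReverse.apply(s_feat)), domain_y)
    return feat_kd + logit_kd + adv
```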
arXiv Detail & Related papers (2023-07-07T01:48:02Z)
- Recurrent Neural Networks with Mixed Hierarchical Structures and EM Algorithm for Natural Language Processing [9.645196221785694]
We develop an approach called the latent indicator layer to identify and learn implicit hierarchical information.
We also develop an EM algorithm to handle the latent indicator layer in training.
We show that the EM-HRNN model with bootstrap training outperforms other RNN-based models in document classification tasks.
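As a toy version of EM over a discrete latent indicator, the NumPy sketch below alternates posterior responsibilities (E-step) with a prior re-estimate (M-step); the paper couples this with an RNN and bootstrap training, which is omitted here.

```python
import numpy as np

def em_latent_indicator(log_lik, prior, iters=50):
    """Generic EM for a discrete latent indicator z per document.

    log_lik: (N, K) array of log p(x_n | z_n = k) under the current model.
    prior:   (K,) initial p(z = k).
    Returns responsibilities (N, K) and the updated prior.
    """
    for _ in range(iters):
        # E-step: posterior responsibilities r[n, k] ∝ p(x_n | k) p(k).
        log_r = log_lik + np.log(prior)
        log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step for the prior; model parameters would also be refit
        # from r in the full algorithm.
        prior = r.mean(axis=0)
    return r, prior
```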
arXiv Detail & Related papers (2022-01-21T23:08:33Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
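One standard scheme in this space is nearest-prototype tagging; a generic reconstruction (not necessarily the study's exact variant) is sketched below. It assumes every tag appears at least once in the support set.

```python
import torch

def prototype_tag(support_emb, support_tags, query_emb, num_tags):
    """Nearest-prototype token tagging for few-shot NER.

    support_emb: (S, H) token embeddings with gold tags support_tags: (S,).
    query_emb:   (Q, H) tokens to label.
    Returns a (Q,) tensor of predicted tag indices.
    """
    protos = torch.stack([
        support_emb[support_tags == t].mean(dim=0) for t in range(num_tags)
    ])                                        # (num_tags, H) class prototypes
    dists = torch.cdist(query_emb, protos)    # (Q, num_tags) distances
    return dists.argmin(dim=1)                # closest prototype per token
```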
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation [56.694330303488435]
We propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework.
In a nutshell, a knowledge graph is constructed over the prototypes of various domains to enable information propagation among semantically adjacent representations.
Our approach outperforms existing methods by a remarkable margin.
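A minimal sketch of such propagation: build a similarity-based adjacency over the prototypes and let each node absorb its neighbors for a few hops. The similarity adjacency and temperature are assumptions; the paper learns the graph end to end.

```python
import torch
import torch.nn.functional as F

def propagate_prototypes(protos, hops=2, temperature=0.1):
    """Information propagation over a prototype graph: nodes are class
    prototypes from each domain, edges come from feature similarity.

    protos: (P, H) prototypes pooled over domains and classes.
    """
    adj = F.softmax(
        F.cosine_similarity(protos.unsqueeze(1), protos.unsqueeze(0),
                            dim=-1) / temperature,
        dim=-1,
    )
    for _ in range(hops):
        protos = adj @ protos    # each prototype absorbs its neighbors
    return protos
```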
arXiv Detail & Related papers (2020-07-17T07:52:44Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
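A generic way to "preserve the relations between samples" is to match pairwise similarity structure between the input and embedding spaces, as in this hypothetical loss; the paper models relations explicitly rather than via raw-input cosine similarity.

```python
import torch
import torch.nn.functional as F

def relation_preserving_loss(inputs, embeddings):
    """Encourage the embedding space to preserve pairwise sample relations.

    inputs:     (N, D) raw features; embeddings: (N, H) learned codes.
    """
    def rel(x):
        # Row-normalized pairwise cosine similarities as soft relations.
        x = F.normalize(x, dim=1)
        return F.softmax(x @ x.t(), dim=1)

    return F.mse_loss(rel(embeddings), rel(inputs).detach())
```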
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.