Distilling Linguistic Context for Language Model Compression
- URL: http://arxiv.org/abs/2109.08359v1
- Date: Fri, 17 Sep 2021 05:51:45 GMT
- Title: Distilling Linguistic Context for Language Model Compression
- Authors: Geondo Park, Gyeongman Kim, Eunho Yang
- Abstract summary: A computationally expensive and memory-intensive neural network lies behind the recent success of language representation learning.
We present a new knowledge distillation objective for language representation learning that transfers the contextual knowledge via two types of relationships.
We validate the effectiveness of our method on challenging benchmarks of language understanding tasks.
- Score: 27.538080564616703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A computationally expensive and memory-intensive neural network lies behind
the recent success of language representation learning. Knowledge distillation,
a major technique for deploying such a vast language model in resource-scarce
environments, transfers the knowledge of individual word representations
learned without restrictions. In this paper, inspired by the recent
observations that language representations are relatively positioned and have
more semantic knowledge as a whole, we present a new knowledge distillation
objective for language representation learning that transfers the contextual
knowledge via two types of relationships across representations: Word Relation
and Layer Transforming Relation. Unlike other recent distillation techniques
for language models, our contextual distillation does not have any
restrictions on architectural changes between teacher and student. We validate
the effectiveness of our method on challenging benchmarks of language
understanding tasks, not only in architectures of various sizes, but also in
combination with DynaBERT, the recently proposed adaptive size pruning method.
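The abstract names the two relation types but not their precise form. The PyTorch sketch below is only an illustration of how such a contextual distillation objective could be wired up, assuming cosine-similarity relation matrices, MSE matching between teacher and student, and a uniform layer mapping when depths differ; none of these choices are stated in the abstract.

```python
# Illustrative sketch of a contextual distillation loss (assumptions noted above).
import torch
import torch.nn.functional as F

def word_relation(h):
    # h: [batch, seq_len, dim] hidden states of one layer.
    # Pairwise cosine similarities between word representations in that layer.
    h = F.normalize(h, dim=-1)
    return h @ h.transpose(-1, -2)                     # [batch, seq_len, seq_len]

def layer_transforming_relation(h_lo, h_hi):
    # Per-word cosine similarity between a lower and a higher layer, i.e. how
    # much each word's representation is transformed across layers.
    return (F.normalize(h_lo, dim=-1) * F.normalize(h_hi, dim=-1)).sum(-1)

def contextual_distillation_loss(teacher_layers, student_layers):
    # Both arguments are lists of [batch, seq_len, dim] tensors. The relations
    # depend only on seq_len, so teacher and student may differ in hidden size
    # and depth (no architectural restriction).
    idx = torch.linspace(0, len(teacher_layers) - 1,
                         steps=len(student_layers)).round().long().tolist()
    loss = 0.0
    for s, t in enumerate(idx):
        loss = loss + F.mse_loss(word_relation(student_layers[s]),
                                 word_relation(teacher_layers[t]))
    for s in range(len(student_layers) - 1):
        loss = loss + F.mse_loss(
            layer_transforming_relation(student_layers[s], student_layers[s + 1]),
            layer_transforming_relation(teacher_layers[idx[s]], teacher_layers[idx[s + 1]]))
    return loss
```

In practice such a term would be added to the student's task loss; because the relations are shaped by sequence length only, no projection between teacher and student hidden sizes is needed, which is consistent with the claim that teacher and student architectures are unrestricted.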
Related papers
- Enhancing Context Through Contrast [0.4068270792140993]
We propose a novel Context Enhancement step to improve performance on neural machine translation.
Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations.
Our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings.
arXiv Detail & Related papers (2024-01-06T22:13:51Z)
- Adaptive Knowledge Distillation between Text and Speech Pre-trained Models [30.125690848883455]
This paper studies metric-based distillation to align the embedding spaces of text and speech with only a small amount of data.
Prior-informed Adaptive knowledge Distillation (PAD) is evaluated on three spoken language understanding benchmarks and shown to transfer linguistic knowledge more effectively than other metric-based distillation approaches.
arXiv Detail & Related papers (2023-03-07T02:31:57Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Transfer Learning of Lexical Semantic Families for Argumentative Discourse Units Identification [0.8508198765617198]
Argument mining tasks require handling a range of linguistic phenomena, from low to high complexity, together with commonsense knowledge.
Previous work has shown that pre-trained language models are highly effective at encoding syntactic and semantic linguistic phenomena.
It remains an open question how far existing pre-trained language models capture the complexity of argument mining tasks.
arXiv Detail & Related papers (2022-09-06T13:38:47Z)
- Knowledge Graph Fusion for Language Model Fine-tuning [0.0]
We investigate the benefits of knowledge incorporation into the fine-tuning stages of BERT.
An existing K-BERT model, which enriches sentences with triplets from a Knowledge Graph, is adapted for the English language.
Changes made to K-BERT to accommodate English also extend to other word-based languages.
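As a rough illustration of the sentence-enrichment idea, the sketch below appends Knowledge Graph triplets after recognised entities in a sentence; the toy triple store is hypothetical, and K-BERT's soft-position indices and visibility matrix are omitted.

```python
# Toy sketch of K-BERT-style sentence enrichment with Knowledge Graph triplets.
from typing import Dict, List, Tuple

def enrich(sentence: str, kg: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    tokens: List[str] = []
    for word in sentence.split():
        tokens.append(word)
        # Inject (relation, object) pairs right after a recognised entity.
        for relation, obj in kg.get(word, []):
            tokens.extend([relation, obj])
    return tokens

# Hypothetical triple store for illustration only.
kg = {"Paris": [("capital_of", "France")]}
print(enrich("Paris is beautiful", kg))
# ['Paris', 'capital_of', 'France', 'is', 'beautiful']
```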
arXiv Detail & Related papers (2022-06-21T08:06:22Z)
- Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition [159.9312272042253]
Wav-BERT is a cooperative acoustic and linguistic representation learning method.
We unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework.
arXiv Detail & Related papers (2021-09-19T16:39:22Z)
- Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z)
- VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer [76.3906723777229]
We present VidLanKD, a video-language knowledge distillation method for improving language understanding.
We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.
In our experiments, VidLanKD achieves consistent improvements over text-only language models and vokenization models.
arXiv Detail & Related papers (2021-07-06T15:41:32Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
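A minimal sketch of such a model is given below, assuming a shared LSTM encoder whose hidden states act as the contextualised embeddings and two teacher-forced decoders for reconstruction and translation; the layer sizes and the exact training objective are assumptions, not details from the paper.

```python
# Sketch of an encoder-decoder that jointly translates and reconstructs,
# so the encoder states can serve as contextualised cross-lingual embeddings.
import torch.nn as nn

class TranslateReconstruct(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.rec_decoder = nn.LSTM(dim, dim, batch_first=True)  # reconstruct source
        self.tr_decoder = nn.LSTM(dim, dim, batch_first=True)   # translate to target
        self.rec_out = nn.Linear(dim, src_vocab)
        self.tr_out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, src_shifted, tgt_shifted):
        # enc_out doubles as the contextualised word embeddings for src_ids.
        enc_out, state = self.encoder(self.src_emb(src_ids))
        rec, _ = self.rec_decoder(self.src_emb(src_shifted), state)
        tr, _ = self.tr_decoder(self.tgt_emb(tgt_shifted), state)
        return enc_out, self.rec_out(rec), self.tr_out(tr)
```

Training would sum two cross-entropy losses, one per decoder, over the small parallel corpus.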
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging.
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
arXiv Detail & Related papers (2020-01-17T15:42:29Z)
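As an illustration of the projection-and-averaging step described in the entry above, here is a small NumPy sketch; the orthogonal Procrustes mapping, the choice of the first space as pivot, and equal dimensionality across spaces are assumptions rather than the paper's stated method.

```python
# Illustrative meta-embedding construction: project each pre-trained embedding
# set into a common space and average (assumptions noted above).
import numpy as np

def procrustes_map(src, tgt):
    # Orthogonal W minimising ||src @ W - tgt||_F over rows aligned by a
    # shared (or dictionary-aligned) vocabulary.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def meta_embeddings(spaces):
    # spaces: list of [vocab, dim] arrays with rows aligned to the same words.
    pivot = spaces[0]
    projected = [pivot] + [s @ procrustes_map(s, pivot) for s in spaces[1:]]
    return np.mean(projected, axis=0)   # average in the common semantic space
```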
This list is automatically generated from the titles and abstracts of the papers on this site.