Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains
- URL: http://arxiv.org/abs/2106.13474v2
- Date: Tue, 29 Jun 2021 05:42:13 GMT
- Authors: Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei
- Abstract summary: We present a general approach to developing small, fast and effective pre-trained models for specific domains.
This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large pre-trained models have achieved great success in many natural language
processing tasks. However, when they are applied in specific domains, these
models suffer from domain shift and bring challenges in fine-tuning and online
serving for latency and capacity constraints. In this paper, we present a
general approach to developing small, fast and effective pre-trained models for
specific domains. This is achieved by adapting the off-the-shelf general
pre-trained models and performing task-agnostic knowledge distillation in
target domains. Specifically, we propose domain-specific vocabulary expansion
in the adaptation stage and employ corpus level occurrence probability to
choose the size of incremental vocabulary automatically. Then we systematically
explore different strategies to compress the large pre-trained models for
specific domains. We conduct our experiments in the biomedical and computer
science domains. The experimental results demonstrate that our approach achieves
better performance than the BERT BASE model on domain-specific tasks while being
3.3x smaller and 5.1x faster than BERT BASE. The code and pre-trained models are
available at https://aka.ms/adalm.
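The abstract's criterion of using corpus-level occurrence probability to pick the incremental vocabulary size can be sketched numerically. The sketch below is an illustrative toy, not the released AdaLM code: the function names (`corpus_occurrence_probability`, `choose_incremental_vocab`), the greedy loop, the whitespace "subwords", and the stopping threshold are all assumptions standing in for the paper's actual selection procedure.

```python
from collections import Counter

def corpus_occurrence_probability(tokens, vocab):
    """Fraction of corpus tokens covered by `vocab` -- a toy proxy for
    the corpus-level occurrence probability named in the abstract."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sum(c for t, c in counts.items() if t in vocab) / total

def choose_incremental_vocab(tokens, candidates, threshold=0.2):
    """Greedily add the most frequent candidate subwords, stopping once
    the marginal gain in corpus coverage drops below `threshold`."""
    counts = Counter(t for t in tokens if t in candidates)
    vocab, prev = set(), 0.0
    for subword, _ in counts.most_common():
        vocab.add(subword)
        score = corpus_occurrence_probability(tokens, vocab)
        if score - prev < threshold:
            vocab.discard(subword)  # gain too small: stop growing
            break
        prev = score
    return vocab

corpus = ["gene", "protein", "gene", "cell", "protein", "gene", "rna"]
chosen = choose_incremental_vocab(corpus, {"gene", "protein", "cell", "rna"})
print(sorted(chosen))  # ['gene', 'protein']
```

The loop stops adding candidates once the marginal coverage gain falls below the threshold, which is one simple way an incremental vocabulary size could be chosen automatically.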
Related papers
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study the multi-source Domain Generalization of text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- QAGAN: Adversarial Approach To Learning Domain Invariant Language Features [0.76146285961466]
We explore an adversarial training approach to learning domain-invariant features.
We achieve a 15.2% improvement in EM score and a 5.6% boost in F1 score on an out-of-domain validation dataset.
arXiv Detail & Related papers (2022-06-24T17:42:18Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
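The summary above describes injecting Gaussian noise and regularizing hidden representations. A minimal numerical sketch of such a penalty follows, assuming a toy one-dimensional "layer" in place of a fine-tuned transformer; all names and parameters here are illustrative, not the authors' LNSR implementation.

```python
import math
import random

random.seed(0)

def layer(h, w, b):
    """A toy elementwise 'hidden layer': tanh(w*x + b)."""
    return [math.tanh(w * x + b) for x in h]

def noise_stability_penalty(h, w, b, sigma=0.1, n_samples=8):
    """Mean squared change in the layer output when standard Gaussian
    noise is injected into the hidden representation h."""
    clean = layer(h, w, b)
    total = 0.0
    for _ in range(n_samples):
        noisy_h = [x + sigma * random.gauss(0.0, 1.0) for x in h]
        noisy = layer(noisy_h, w, b)
        total += sum((a - c) ** 2 for a, c in zip(noisy, clean)) / len(h)
    return total / n_samples

h = [0.3, -1.2, 0.8, 0.1]  # a toy hidden representation
print(noise_stability_penalty(h, w=0.5, b=0.0))  # small positive scalar
```

In a real fine-tuning loop, a penalty of this shape would be added to the task loss so that the network's outputs stay stable under small perturbations of its hidden states.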
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Domain Generalization using Pretrained Models without Fine-tuning [25.489714555859944]
Fine-tuning pretrained models is a common practice in domain generalization (DG) tasks.
We propose a novel domain generalization paradigm to better leverage various pretrained models, named specialized ensemble learning for domain generalization (SEDGE).
SEDGE achieves significant performance improvements compared to strong baselines, including the state-of-the-art methods in DG tasks.
arXiv Detail & Related papers (2022-03-09T09:33:59Z)
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE by reusing models of almost half their sizes.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
- Efficient Domain Adaptation of Language Models via Adaptive Tokenization [5.058301279065432]
We show that domain-specific subword sequences can be efficiently determined directly from divergences in the conditional token distributions of the base and domain-specific corpora.
Our approach produces smaller models and requires less training and inference time than other approaches using tokenizer augmentation.
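The divergence idea in this summary can be sketched in a hedged, simplified form: the toy below scores subwords by their contribution to the KL divergence between unigram distributions (the paper itself works with conditional token distributions). `token_divergence_scores` and the tiny corpora are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter

def token_divergence_scores(base_tokens, domain_tokens):
    """Score each subword by its pointwise contribution to the KL
    divergence between the domain and base unigram distributions."""
    base = Counter(base_tokens)
    dom = Counter(domain_tokens)
    nb, nd = sum(base.values()), sum(dom.values())
    scores = {}
    for tok, c in dom.items():
        p_dom = c / nd
        # Add-one smoothing so unseen base tokens get nonzero mass.
        p_base = (base.get(tok, 0) + 1) / (nb + len(dom))
        scores[tok] = p_dom * math.log(p_dom / p_base)
    return scores

base = ["the", "model", "of", "the", "data"]
domain = ["the", "protein", "binds", "the", "protein"]
scores = token_divergence_scores(base, domain)
print(max(scores, key=scores.get))  # prints "protein"
```

Subwords that are frequent in the domain corpus but rare in the base corpus get the highest scores, making them natural candidates for tokenizer augmentation.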
arXiv Detail & Related papers (2021-09-15T17:51:27Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
- CALM: Continuous Adaptive Learning for Language Modeling [18.72860206714457]
Training large language representation models has become a standard in the natural language processing community.
We demonstrate that in practice these pre-trained models present performance deterioration in the form of catastrophic forgetting.
We propose CALM, Continuous Adaptive Learning for Language Modeling: techniques to render models which retain knowledge across multiple domains.
arXiv Detail & Related papers (2020-04-08T03:51:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.