Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training
- URL: http://arxiv.org/abs/2009.11538v3
- Date: Mon, 30 Nov 2020 05:50:30 GMT
- Title: Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training
- Authors: Hai Ye, Qingyu Tan, Ruidan He, Juntao Li, Hwee Tou Ng, Lidong Bing
- Abstract summary: We adapt the features of pre-trained language models (PrLMs) to new domains without fine-tuning.
We present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs.
Experiments on two monolingual and multilingual Amazon review datasets show that CFd can consistently improve the performance of self-training.
- Score: 47.12438995938133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has
gained much attention recently. Instead of fine-tuning PrLMs as done in most
previous work, we investigate how to adapt the features of PrLMs to new domains
without fine-tuning. We explore unsupervised domain adaptation (UDA) in this
paper. With the features from PrLMs, we adapt the models trained with labeled
data from the source domain to the unlabeled target domain. Self-training, which predicts pseudo labels on the target-domain data for training, is widely used for UDA. However, the predicted pseudo labels inevitably contain noise, which negatively affects the training of a robust model. To improve the robustness of
self-training, in this paper we present class-aware feature self-distillation
(CFd) to learn discriminative features from PrLMs, in which PrLM features are
self-distilled into a feature adaptation module and the features from the same
class are more tightly clustered. We further extend CFd to a cross-language
setting, in which language discrepancy is studied. Experiments on two
monolingual and multilingual Amazon review datasets show that CFd can
consistently improve the performance of self-training in cross-domain and
cross-language settings.
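The two ingredients described above (pseudo-label self-training on the target domain, and distilling frozen PrLM features into a feature adaptation module while pulling same-class features together) can be sketched as a short PyTorch training step. This is a minimal sketch under that reading of the abstract; the confidence threshold, loss weights, and adapter architecture are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class FeatureAdapter(nn.Module):
    """Small MLP that distills frozen PrLM features into adapted features."""
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return self.net(x)

def cfd_step(prlm_feats_src, labels_src, prlm_feats_tgt,
             adapter, classifier, optimizer,
             conf_threshold=0.9, lam_distill=1.0, lam_cluster=0.1):
    """One illustrative CFd-style update: supervised source loss, pseudo-label
    self-training on the target domain, feature self-distillation, and a
    class-aware clustering term."""
    optimizer.zero_grad()

    # Adapted features for source and target (PrLM features are precomputed / frozen).
    z_src = adapter(prlm_feats_src)
    z_tgt = adapter(prlm_feats_tgt)

    # 1) Supervised loss on labeled source data.
    loss_src = F.cross_entropy(classifier(z_src), labels_src)

    # 2) Self-training: keep only confident pseudo labels on the target domain.
    with torch.no_grad():
        probs_tgt = F.softmax(classifier(z_tgt), dim=-1)
        conf, pseudo = probs_tgt.max(dim=-1)
        mask = conf > conf_threshold
    loss_pseudo = (F.cross_entropy(classifier(z_tgt), pseudo, reduction="none")
                   * mask).sum() / mask.sum().clamp(min=1)

    # 3) Feature self-distillation: adapted features stay close to PrLM features.
    loss_distill = F.mse_loss(z_tgt, prlm_feats_tgt) + F.mse_loss(z_src, prlm_feats_src)

    # 4) Class-aware clustering: pull source features toward their class centroids.
    loss_cluster = z_src.new_zeros(())
    for c in labels_src.unique():
        members = z_src[labels_src == c]
        loss_cluster = loss_cluster + ((members - members.mean(0)) ** 2).mean()

    loss = loss_src + loss_pseudo + lam_distill * loss_distill + lam_cluster * loss_cluster
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full setup the adapter and a linear classifier over its output would be optimized jointly, with PrLM features either precomputed or produced by a frozen encoder.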
Related papers
- FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning [54.9235160379917]
FusDom is a simple and novel methodology for SSL-based continued pre-training.
FusDom learns speech representations that are robust and adaptive yet not forgetful of concepts seen in the past.
arXiv Detail & Related papers (2023-12-20T13:50:05Z)
- Evolving Domain Adaptation of Pretrained Language Models for Text Classification [24.795214770636534]
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection.
This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method.
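The incremental self-training highlighted above can be read as repeatedly pseudo-labeling each new time slice of unlabeled target data and updating the classifier on its confident predictions. A minimal sketch under that assumption; the classifier is assumed to map encoded features to class logits, and the threshold is illustrative.

```python
import torch
import torch.nn.functional as F

def incremental_self_training(model, optimizer, time_slices, conf_threshold=0.8, epochs=1):
    """Illustrative incremental self-training over time-ordered, unlabeled
    batches of target-domain features (each slice: Tensor of shape [n, dim])."""
    for features in time_slices:
        # Pseudo-label the new slice with the current model.
        model.eval()
        with torch.no_grad():
            probs = F.softmax(model(features), dim=-1)
            conf, pseudo = probs.max(dim=-1)
            keep = conf > conf_threshold
        if keep.sum() == 0:
            continue  # nothing confident enough in this slice

        # Update the model on its own confident predictions.
        model.train()
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = F.cross_entropy(model(features[keep]), pseudo[keep])
            loss.backward()
            optimizer.step()
    return model
```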
arXiv Detail & Related papers (2023-11-16T08:28:00Z)
- Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages.
Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
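The summary does not spell the mechanism out; a common reading of active forgetting is to periodically re-initialize the token-embedding layer during pretraining, so the transformer body learns representations that do not depend on any particular embedding table. A minimal sketch under that assumption (the model is assumed to return its pretraining loss directly, and the reset interval is a placeholder).

```python
import torch
from torch import nn

def pretrain_with_active_forgetting(model, embedding: nn.Embedding, optimizer,
                                    batches, reset_every=1000):
    """Illustrative pretraining loop that re-initializes the token-embedding
    weights every `reset_every` updates (the assumed 'forgetting' step)."""
    for step, (inputs, targets) in enumerate(batches, start=1):
        optimizer.zero_grad()
        loss = model(inputs, targets)   # assumption: model returns its pretraining loss
        loss.backward()
        optimizer.step()

        if step % reset_every == 0:
            # Forget: reset embeddings while keeping the transformer body intact.
            nn.init.normal_(embedding.weight, mean=0.0, std=0.02)
    return model
```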
arXiv Detail & Related papers (2023-07-03T17:12:44Z)
- VarMAE: Pre-training of Variational Masked Autoencoder for Domain-adaptive Language Understanding [5.1282202633907]
We propose a novel Transformer-based language model named VarMAE for domain-adaptive language understanding.
Under the masked autoencoding objective, we design a context uncertainty learning module to encode the token's context into a smooth latent distribution.
Experiments on science- and finance-domain NLU tasks demonstrate that VarMAE can be efficiently adapted to new domains with limited resources.
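One plausible reading of the context uncertainty learning module is a variational head that encodes each masked position's context as a Gaussian, samples via the reparameterization trick, and adds a KL regularizer to the masked-token prediction loss. A sketch under that assumption; the dimensions and the KL weight are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F
from torch import nn

class ContextUncertainty(nn.Module):
    """Maps contextual hidden states of masked positions to a smooth latent
    distribution and predicts the masked tokens from sampled latents."""
    def __init__(self, dim=768, latent=256, vocab=50000):
        super().__init__()
        self.mu = nn.Linear(dim, latent)
        self.logvar = nn.Linear(dim, latent)
        self.decoder = nn.Linear(latent, vocab)

    def forward(self, hidden, masked_targets):
        # hidden: [n_masked, dim], masked_targets: [n_masked]
        mu, logvar = self.mu(hidden), self.logvar(hidden)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = F.cross_entropy(self.decoder(z), masked_targets)  # masked-token prediction
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + 0.1 * kl   # KL weight is an arbitrary illustrative choice
```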
arXiv Detail & Related papers (2022-11-01T12:51:51Z)
- QAGAN: Adversarial Approach To Learning Domain Invariant Language Features [0.76146285961466]
We explore an adversarial training approach to learning domain-invariant features.
We are able to achieve a 15.2% improvement in EM score and a 5.6% boost in F1 score on the out-of-domain validation dataset.
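Adversarial learning of domain-invariant features is typically realized with a domain discriminator trained through a gradient-reversal layer; the sketch below shows that generic recipe, which may differ from QAGAN's exact architecture.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def adversarial_loss(features_src, features_tgt, discriminator, lam=0.1):
    """Domain classification loss computed through gradient reversal, so the
    encoder producing the features is pushed toward domain invariance."""
    feats = torch.cat([features_src, features_tgt], dim=0)
    domains = torch.cat([torch.zeros(len(features_src), dtype=torch.long),
                         torch.ones(len(features_tgt), dtype=torch.long)]).to(feats.device)
    reversed_feats = GradReverse.apply(feats, lam)
    return F.cross_entropy(discriminator(reversed_feats), domains)
```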
arXiv Detail & Related papers (2022-06-24T17:42:18Z)
- Domain Adaptation via Prompt Learning [39.97105851723885]
Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain.
We introduce a novel prompt learning paradigm for UDA, named Domain Adaptation via Prompt Learning (DAPL).
arXiv Detail & Related papers (2022-02-14T13:25:46Z)
- Efficient Domain Adaptation of Language Models via Adaptive Tokenization [5.058301279065432]
We show that domain-specific subword sequences can be efficiently determined directly from divergences in the conditional token distributions of the base and domain-specific corpora.
Our approach yields smaller models and requires less training and inference time than other approaches based on tokenizer augmentation.
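The idea of selecting domain-specific subword sequences from divergences between the token distributions of the base and domain corpora can be approximated by scoring tokens with their pointwise contribution to KL(domain || base) and keeping the top scorers. The sketch below uses unigram counts and add-one smoothing purely for illustration.

```python
from collections import Counter
from math import log

def domain_specific_sequences(base_tokens, domain_tokens, top_k=100, min_count=5):
    """Score tokens (unigrams here, for simplicity) by a smoothed pointwise
    KL contribution of their frequency in the domain vs. base corpus."""
    base, dom = Counter(base_tokens), Counter(domain_tokens)
    base_total, dom_total = sum(base.values()), sum(dom.values())

    scores = {}
    for tok, c in dom.items():
        if c < min_count:
            continue
        p_dom = c / dom_total
        p_base = (base.get(tok, 0) + 1) / (base_total + len(base))  # add-one smoothing
        # Pointwise contribution to KL(domain || base): p_dom * log(p_dom / p_base)
        scores[tok] = p_dom * log(p_dom / p_base)

    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

The selected sequences would then be added to the tokenizer's vocabulary, for example with embeddings initialized from the mean of their constituent subwords.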
arXiv Detail & Related papers (2021-09-15T17:51:27Z)
- Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation [61.27321597981737]
kNN-MT has shown the promising capability of directly combining a pre-trained neural machine translation (NMT) model with domain-specific token-level k-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for k-nearest-neighbor retrieval.
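The kNN-MT mechanism referenced here stores (decoder hidden state, next token) pairs in a datastore and interpolates the retrieval distribution with the NMT model's distribution at every decoding step. A minimal numpy sketch of the retrieval and interpolation follows; how the datastore is built from in-domain monolingual target sentences is abstracted away.

```python
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    """Retrieve the k nearest datastore entries to the query hidden state and
    turn them into a vocabulary distribution weighted by negative distance."""
    dists = np.linalg.norm(keys - query, axis=1)   # keys: [N, dim], query: [dim]
    idx = np.argsort(dists)[:k]
    weights = np.exp(-dists[idx] / temperature)
    weights /= weights.sum()

    p_knn = np.zeros(vocab_size)
    for w, token_id in zip(weights, values[idx]):  # values: [N] next-token ids
        p_knn[token_id] += w
    return p_knn

def interpolate(p_nmt, p_knn, lam=0.5):
    """Final next-token distribution: lam * p_knn + (1 - lam) * p_nmt."""
    return lam * p_knn + (1.0 - lam) * p_nmt
```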
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
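Dynamic Blocking is a decoding-time constraint that discourages copying: when the model has just emitted a token that appears in the source, the token immediately following it in the source is blocked at the next step. A simplified sketch of that constraint applied to next-token logits; the blocking probability and surface-form handling are reduced to essentials and may differ from the paper's exact procedure.

```python
import torch

def dynamic_blocking_logits(logits, prev_token, source_ids, block_prob=1.0):
    """logits: 1-D tensor over the vocabulary for the next step.
    prev_token: id of the token just generated.
    source_ids: list of token ids of the source sentence."""
    logits = logits.clone()
    for i in range(len(source_ids) - 1):
        # If we just copied a source token, (probabilistically) forbid its
        # successor in the source from being generated next.
        if source_ids[i] == prev_token and torch.rand(()) < block_prob:
            logits[source_ids[i + 1]] = float("-inf")
    return logits
```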
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
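Domain-adaptive and task-adaptive pretraining are both a second phase of masked-language-model training on unlabeled in-domain or task text before fine-tuning. A compact sketch with Hugging Face Transformers, where the corpus is reduced to a toy list and the hyperparameters are placeholders.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Unlabeled in-domain (DAPT) or task (TAPT) text stands in for a real corpus here.
texts = ["an unlabeled in-domain sentence", "another unlabeled in-domain sentence"]
encodings = [tokenizer(t, truncation=True, max_length=128) for t in texts]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="dapt-checkpoint", num_train_epochs=1,
                         per_device_train_batch_size=8)

# Continue masked-LM pretraining, then fine-tune the saved checkpoint on the end task.
Trainer(model=model, args=args, train_dataset=encodings, data_collator=collator).train()
```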
arXiv Detail & Related papers (2020-04-23T04:21:19Z)