Continual Domain-Tuning for Pretrained Language Models
- URL: http://arxiv.org/abs/2004.02288v2
- Date: Fri, 19 Mar 2021 14:50:02 GMT
- Title: Continual Domain-Tuning for Pretrained Language Models
- Authors: Subendhu Rongali, Abhyuday Jagannatha, Bhanu Pratap Singh Rawat, and
Hong Yu
- Abstract summary: Simple domain tuning (SDT) has been widely used to create domain-tuned models such as BioBERT, SciBERT and ClinicalBERT.
During the pretraining phase on the target domain, the LM models may catastrophically forget the patterns learned from their source domain.
We propose continual learning (CL) based alternatives to SDT that aim to reduce catastrophic forgetting.
- Score: 8.080145221992641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (LM) such as BERT, DistilBERT, and RoBERTa can be
tuned for different domains (domain-tuning) by continuing the pre-training
phase on a new target domain corpus. This simple domain tuning (SDT) technique
has been widely used to create domain-tuned models such as BioBERT, SciBERT and
ClinicalBERT. However, during the pretraining phase on the target domain, the
LM models may catastrophically forget the patterns learned from their source
domain. In this work, we study the effects of catastrophic forgetting on
domain-tuned LM models and investigate methods that mitigate its negative
effects. We propose continual learning (CL) based alternatives to SDT that aim
to reduce catastrophic forgetting. We show that these methods may increase the
performance of LM models on downstream target domain tasks. We also show that
constraining the LM model from forgetting the source domain leads to downstream
task models that are more robust to domain shifts. We
analyze the computational cost of using our proposed CL methods and provide
recommendations for computationally lightweight and effective CL domain-tuning
procedures.
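As a concrete illustration of what a CL-based alternative to SDT might look like, the sketch below continues masked-LM pre-training on a target-domain corpus while rehearsing occasional source-domain batches and penalizing drift from the source model's weights. This is a minimal sketch under assumed details: the hyperparameters (l2_strength, rehearsal_every), the model name, and the batch iterables are placeholders, and the paper evaluates a family of CL methods rather than exactly this recipe.

    # Minimal sketch of CL-style domain tuning for a masked LM (assumed recipe, not the
    # paper's exact procedure): mix in source-domain batches (rehearsal) and add an L2
    # penalty that discourages drift from the source-domain weights.
    import copy
    import torch
    from transformers import AutoModelForMaskedLM


    def cl_domain_tune(target_batches, source_batches,
                       model_name="bert-base-uncased",
                       lr=2e-5, l2_strength=0.01, rehearsal_every=4):
        """target_batches / source_batches are iterables of MLM batches
        (dicts with input_ids, attention_mask, labels) -- placeholders here."""
        model = AutoModelForMaskedLM.from_pretrained(model_name)
        source_model = copy.deepcopy(model)        # frozen snapshot of source-domain weights
        for p in source_model.parameters():
            p.requires_grad_(False)

        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        source_iter = iter(source_batches)

        for step, batch in enumerate(target_batches):
            if step % rehearsal_every == 0:        # rehearsal: revisit the source domain
                batch = next(source_iter)

            mlm_loss = model(**batch).loss         # standard masked-LM loss

            # L2 penalty toward the source model's parameters (simple anti-forgetting term).
            drift = sum(((p - q) ** 2).sum()
                        for p, q in zip(model.parameters(), source_model.parameters()))

            loss = mlm_loss + l2_strength * drift
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        return model

The rehearsal batches could come from the original pre-training corpus (e.g., Wikipedia/BooksCorpus for BERT-style models), and the plain L2 term can be swapped for an EWC-style Fisher-weighted penalty without changing the loop structure.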
Related papers
- Continual Domain Adaptation through Pruning-aided Domain-specific Weight
Modulation [37.3981662593942]
We develop a method to address unsupervised domain adaptation (UDA) in a practical setting of continual learning (CL).
The goal is to update the model on continually changing domains while preserving domain-specific knowledge to prevent catastrophic forgetting of past-seen domains.
arXiv Detail & Related papers (2023-04-15T13:44:58Z) - Decorate the Newcomers: Visual Domain Prompt for Continual Test Time
Adaptation [14.473807945791132]
Continual Test-Time Adaptation (CTTA) aims to adapt the source model to continually changing unlabeled target domains without access to the source data.
Motivated by prompt learning in NLP, we propose to learn an image-level visual domain prompt for target domains while keeping the source model parameters frozen.
arXiv Detail & Related papers (2022-12-08T08:56:02Z) - Normalization Perturbation: A Simple Domain Generalization Method for
Real-World Domain Shifts [133.99270341855728]
Real-world domain styles can vary substantially due to environment changes and sensor noise.
Deep models only know the training domain style.
We propose Normalization Perturbation to overcome this domain style overfitting problem.
arXiv Detail & Related papers (2022-11-08T17:36:49Z) - Variational Model Perturbation for Source-Free Domain Adaptation [64.98560348412518]
We introduce perturbations into the model parameters by variational Bayesian inference in a probabilistic framework.
We demonstrate the theoretical connection to learning Bayesian neural networks, which proves the generalizability of the perturbed model to target domains.
arXiv Detail & Related papers (2022-10-19T08:41:19Z) - Neural Supervised Domain Adaptation by Augmenting Pre-trained Models
with Random Units [14.183224769428843]
Neural Transfer Learning (TL) is becoming ubiquitous in Natural Language Processing (NLP).
In this paper, we show through interpretation methods that such a scheme, despite its efficiency, suffers from a major limitation.
We propose to augment the pre-trained model with normalised, weighted and randomly initialised units that foster a better adaptation while maintaining the valuable source knowledge.
arXiv Detail & Related papers (2021-06-09T09:29:11Z) - Source-Free Open Compound Domain Adaptation in Semantic Segmentation [99.82890571842603]
In SF-OCDA, only the source pre-trained model and the target data are available to learn the target model.
We propose the Cross-Patch Style Swap (CPSS) to diversify samples with various patch styles at the feature level.
Our method produces state-of-the-art results on the C-Driving dataset.
arXiv Detail & Related papers (2021-06-07T08:38:41Z) - UDALM: Unsupervised Domain Adaptation through Language Modeling [79.73916345178415]
We introduce UDALM, a fine-tuning procedure that uses a mixed classification and Masked Language Model loss; a minimal sketch of such a mixed loss appears after this list.
Our experiments show that the performance of models trained with the mixed loss scales with the amount of available target data, and that the mixed loss can be effectively used as a stopping criterion.
Our method is evaluated on twelve domain pairs of the Amazon Reviews Sentiment dataset, yielding $91.74\%$ accuracy, which is a $1.11\%$ absolute improvement over the state-of-the-art.
arXiv Detail & Related papers (2021-04-14T19:05:01Z) - Iterative Domain-Repaired Back-Translation [50.32925322697343]
In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
We propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair model to refine translations in synthetic bilingual data.
Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2020-10-06T04:38:09Z) - Neuron Linear Transformation: Modeling the Domain Shift for Crowd
Counting [34.560447389853614]
Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety.
We propose a Neuron Linear Transformation (NLT) method, exploiting domain factor and bias weights to learn the domain shift.
Extensive experiments and analysis on six real-world datasets validate that NLT achieves top performance.
arXiv Detail & Related papers (2020-04-05T09:15:47Z) - A Simple Baseline to Semi-Supervised Domain Adaptation for Machine
Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
arXiv Detail & Related papers (2020-01-22T16:42:06Z)
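For the UDALM entry above, the referenced sketch of a mixed classification and masked-LM loss follows. The shared-encoder layout, the mlm_weight hyperparameter, and the batch format are illustrative assumptions rather than details taken from that paper.

    # Hypothetical sketch of a mixed classification + masked-LM fine-tuning loss in the
    # spirit of UDALM: one encoder, a supervised classification head for labeled source
    # data, and an MLM head for unlabeled target-domain text.
    import torch
    from torch import nn
    from transformers import AutoModel


    class MixedLossModel(nn.Module):
        def __init__(self, name="bert-base-uncased", num_labels=2, mlm_weight=1.0):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(name)
            hidden = self.encoder.config.hidden_size
            self.cls_head = nn.Linear(hidden, num_labels)                       # source-domain classifier
            self.mlm_head = nn.Linear(hidden, self.encoder.config.vocab_size)   # MLM head for target text
            self.mlm_weight = mlm_weight                                        # assumed weighting

        def forward(self, labeled_batch, unlabeled_batch):
            # Classification loss on labeled source-domain examples ([CLS] representation).
            src = self.encoder(**{k: v for k, v in labeled_batch.items() if k != "labels"})
            cls_logits = self.cls_head(src.last_hidden_state[:, 0])
            cls_loss = nn.functional.cross_entropy(cls_logits, labeled_batch["labels"])

            # Masked-LM loss on unlabeled target-domain text
            # (labels hold masked token ids, -100 elsewhere, so they are ignored by default).
            tgt = self.encoder(**{k: v for k, v in unlabeled_batch.items() if k != "labels"})
            mlm_logits = self.mlm_head(tgt.last_hidden_state)
            mlm_loss = nn.functional.cross_entropy(
                mlm_logits.view(-1, mlm_logits.size(-1)),
                unlabeled_batch["labels"].view(-1),
            )
            return cls_loss + self.mlm_weight * mlm_loss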