Continual Domain-Tuning for Pretrained Language Models
- URL: http://arxiv.org/abs/2004.02288v2
- Date: Fri, 19 Mar 2021 14:50:02 GMT
- Title: Continual Domain-Tuning for Pretrained Language Models
- Authors: Subendhu Rongali, Abhyuday Jagannatha, Bhanu Pratap Singh Rawat, and
Hong Yu
- Abstract summary: Simple domain tuning (SDT) has been widely used to create domain-tuned models such as BioBERT, SciBERT and ClinicalBERT.
During the pretraining phase on the target domain, the LM models may catastrophically forget the patterns learned from their source domain.
We propose continual learning (CL) based alternatives to SDT that aim to reduce catastrophic forgetting.
- Score: 8.080145221992641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (LM) such as BERT, DistilBERT, and RoBERTa can be
tuned for different domains (domain-tuning) by continuing the pre-training
phase on a new target domain corpus. This simple domain tuning (SDT) technique
has been widely used to create domain-tuned models such as BioBERT, SciBERT and
ClinicalBERT. However, during the pretraining phase on the target domain, the
LM models may catastrophically forget the patterns learned from their source
domain. In this work, we study the effects of catastrophic forgetting on
domain-tuned LM models and investigate methods that mitigate its negative
effects. We propose continual learning (CL) based alternatives to SDT that aim
to reduce catastrophic forgetting. We show that these methods may increase the
performance of LM models on downstream target domain tasks. We also show that
constraining the LM model from forgetting the source domain leads to downstream
task models that are more robust to domain shifts. We
analyze the computational cost of using our proposed CL methods and provide
recommendations for computationally lightweight and effective CL domain-tuning
procedures.
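As a concrete illustration of what a CL-based alternative to SDT might look like, the sketch below continues masked-LM pre-training on a target-domain corpus while rehearsing occasional source-domain batches and penalizing drift from the source model's weights. This is a minimal sketch under assumed details: the hyperparameters (l2_strength, rehearsal_every), the model name, and the batch iterables are placeholders, and the paper evaluates a family of CL methods rather than exactly this recipe.

    # Minimal sketch of CL-style domain tuning for a masked LM (assumed recipe, not the
    # paper's exact procedure): mix in source-domain batches (rehearsal) and add an L2
    # penalty that discourages drift from the source-domain weights.
    import copy
    import torch
    from transformers import AutoModelForMaskedLM


    def cl_domain_tune(target_batches, source_batches,
                       model_name="bert-base-uncased",
                       lr=2e-5, l2_strength=0.01, rehearsal_every=4):
        """target_batches / source_batches are iterables of MLM batches
        (dicts with input_ids, attention_mask, labels) -- placeholders here."""
        model = AutoModelForMaskedLM.from_pretrained(model_name)
        source_model = copy.deepcopy(model)        # frozen snapshot of source-domain weights
        for p in source_model.parameters():
            p.requires_grad_(False)

        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        source_iter = iter(source_batches)

        for step, batch in enumerate(target_batches):
            if step % rehearsal_every == 0:        # rehearsal: revisit the source domain
                batch = next(source_iter)

            mlm_loss = model(**batch).loss         # standard masked-LM loss

            # L2 penalty toward the source model's parameters (simple anti-forgetting term).
            drift = sum(((p - q) ** 2).sum()
                        for p, q in zip(model.parameters(), source_model.parameters()))

            loss = mlm_loss + l2_strength * drift
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        return model

The rehearsal batches could come from the original pre-training corpus (e.g., Wikipedia/BooksCorpus for BERT-style models), and the plain L2 term can be swapped for an EWC-style Fisher-weighted penalty without changing the loop structure.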
Related papers
- Continual Domain Adaptation through Pruning-aided Domain-specific Weight
Modulation [37.3981662593942]
We develop a method to address unsupervised domain adaptation (UDA) in a practical setting of continual learning (CL).
The goal is to update the model on continually changing domains while preserving domain-specific knowledge to prevent catastrophic forgetting of past-seen domains.
arXiv Detail & Related papers (2023-04-15T13:44:58Z) - Decorate the Newcomers: Visual Domain Prompt for Continual Test Time
Adaptation [14.473807945791132]
Continual Test-Time Adaptation (CTTA) aims to adapt the source model to continually changing unlabeled target domains without access to the source data.
Motivated by prompt learning in NLP, we propose to learn an image-level visual domain prompt for target domains while keeping the source model parameters frozen.
arXiv Detail & Related papers (2022-12-08T08:56:02Z) - Normalization Perturbation: A Simple Domain Generalization Method for
Real-World Domain Shifts [133.99270341855728]
Real-world domain styles can vary substantially due to environment changes and sensor noise.
Deep models only know the training domain style.
We propose Normalization Perturbation to overcome this domain style overfitting problem.
arXiv Detail & Related papers (2022-11-08T17:36:49Z) - Variational Model Perturbation for Source-Free Domain Adaptation [64.98560348412518]
We introduce perturbations into the model parameters by variational Bayesian inference in a probabilistic framework.
We demonstrate the theoretical connection to learning Bayesian neural networks, which proves the generalizability of the perturbed model to target domains.
arXiv Detail & Related papers (2022-10-19T08:41:19Z) - Neural Supervised Domain Adaptation by Augmenting Pre-trained Models
with Random Units [14.183224769428843]
Neural Transfer Learning (TL) is becoming ubiquitous in Natural Language Processing (NLP).
In this paper, we show through interpretation methods that such a scheme, despite its efficiency, suffers from a major limitation.
We propose to augment the pre-trained model with normalised, weighted and randomly initialised units that foster a better adaptation while maintaining the valuable source knowledge.
arXiv Detail & Related papers (2021-06-09T09:29:11Z) - Source-Free Open Compound Domain Adaptation in Semantic Segmentation [99.82890571842603]
In SF-OCDA, only the source pre-trained model and the target data are available to learn the target model.
We propose the Cross-Patch Style Swap (CPSS) to diversify samples with various patch styles at the feature level.
Our method produces state-of-the-art results on the C-Driving dataset.
arXiv Detail & Related papers (2021-06-07T08:38:41Z) - UDALM: Unsupervised Domain Adaptation through Language Modeling [79.73916345178415]
We introduce UDALM, a fine-tuning procedure that uses a mixed classification and Masked Language Model loss; a minimal sketch of such a mixed loss appears after this list.
Our experiments show that the performance of models trained with the mixed loss scales with the amount of available target data, and that the mixed loss can be effectively used as a stopping criterion.
Our method is evaluated on twelve domain pairs of the Amazon Reviews Sentiment dataset, yielding $91.74\%$ accuracy, which is a $1.11\%$ absolute improvement over the state-of-the-art.
arXiv Detail & Related papers (2021-04-14T19:05:01Z) - Iterative Domain-Repaired Back-Translation [50.32925322697343]
In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
We propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair model to refine translations in synthetic bilingual data.
Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2020-10-06T04:38:09Z) - Neuron Linear Transformation: Modeling the Domain Shift for Crowd
Counting [34.560447389853614]
Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety.
We propose a Neuron Linear Transformation (NLT) method, exploiting domain factor and bias weights to learn the domain shift.
Extensive experiments and analysis on six real-world datasets validate that NLT achieves top performance.
arXiv Detail & Related papers (2020-04-05T09:15:47Z) - A Simple Baseline to Semi-Supervised Domain Adaptation for Machine
Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
arXiv Detail & Related papers (2020-01-22T16:42:06Z)
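For the UDALM entry above, the referenced sketch of a mixed classification and masked-LM loss follows. The shared-encoder layout, the mlm_weight hyperparameter, and the batch format are illustrative assumptions rather than details taken from that paper.

    # Hypothetical sketch of a mixed classification + masked-LM fine-tuning loss in the
    # spirit of UDALM: one encoder, a supervised classification head for labeled source
    # data, and an MLM head for unlabeled target-domain text.
    import torch
    from torch import nn
    from transformers import AutoModel


    class MixedLossModel(nn.Module):
        def __init__(self, name="bert-base-uncased", num_labels=2, mlm_weight=1.0):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(name)
            hidden = self.encoder.config.hidden_size
            self.cls_head = nn.Linear(hidden, num_labels)                       # source-domain classifier
            self.mlm_head = nn.Linear(hidden, self.encoder.config.vocab_size)   # MLM head for target text
            self.mlm_weight = mlm_weight                                        # assumed weighting

        def forward(self, labeled_batch, unlabeled_batch):
            # Classification loss on labeled source-domain examples ([CLS] representation).
            src = self.encoder(**{k: v for k, v in labeled_batch.items() if k != "labels"})
            cls_logits = self.cls_head(src.last_hidden_state[:, 0])
            cls_loss = nn.functional.cross_entropy(cls_logits, labeled_batch["labels"])

            # Masked-LM loss on unlabeled target-domain text
            # (labels hold masked token ids, -100 elsewhere, so they are ignored by default).
            tgt = self.encoder(**{k: v for k, v in unlabeled_batch.items() if k != "labels"})
            mlm_logits = self.mlm_head(tgt.last_hidden_state)
            mlm_loss = nn.functional.cross_entropy(
                mlm_logits.view(-1, mlm_logits.size(-1)),
                unlabeled_batch["labels"].view(-1),
            )
            return cls_loss + self.mlm_weight * mlm_loss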