An Empirical Investigation Towards Efficient Multi-Domain Language Model
Pre-training
- URL: http://arxiv.org/abs/2010.00784v1
- Date: Thu, 1 Oct 2020 09:20:18 GMT
- Title: An Empirical Investigation Towards Efficient Multi-Domain Language Model
Pre-training
- Authors: Kristjan Arumae, Qing Sun, and Parminder Bhatia
- Abstract summary: We conduct an empirical investigation into known methods to mitigate catastrophic forgetting (CF).
We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33% drop in performance across seven generic tasks.
- Score: 15.440627147018711
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training large language models has become a standard in the natural
language processing community. Such models are pre-trained on generic data
(e.g. BookCorpus and English Wikipedia) and often fine-tuned on tasks in the
same domain. However, in order to achieve state-of-the-art performance on
out-of-domain tasks such as clinical named entity recognition and relation
extraction, additional in-domain pre-training is required. In practice, staged
multi-domain pre-training leads to performance deterioration in the form of
catastrophic forgetting (CF) when evaluated on a generic benchmark such as
GLUE. In this paper, we conduct an empirical investigation into known methods
to mitigate CF. We find that elastic weight consolidation provides the best
overall scores, yielding only a 0.33% drop in performance across seven generic
tasks while remaining competitive on biomedical tasks. Furthermore, we explore
gradient- and latent-clustering-based data selection techniques to improve
coverage when using elastic weight consolidation and experience replay methods.
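As context for the finding above, a minimal sketch of how an elastic weight
consolidation (EWC) penalty is typically attached to in-domain pre-training is
given below, assuming a PyTorch model already pre-trained on generic data. It is
an illustration of the standard EWC recipe, not the paper's released code;
estimate_fisher_diagonal, anchor_params, fisher_diag, and ewc_lambda are
illustrative names introduced here.

import torch


def estimate_fisher_diagonal(model, data_loader, loss_fn, n_batches=100):
    """Approximate the diagonal Fisher information from squared gradients of
    the generic-domain objective (e.g. masked language modeling)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    seen = 0
    model.eval()
    for inputs, targets in data_loader:
        if seen >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        seen += 1
    return {n: f / max(seen, 1) for n, f in fisher.items()}


def ewc_penalty(model, anchor_params, fisher_diag, ewc_lambda=0.1):
    """Quadratic penalty that discourages parameters important to the generic
    domain from drifting during in-domain (e.g. biomedical) pre-training."""
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher_diag:
            penalty = penalty + (fisher_diag[n] * (p - anchor_params[n]) ** 2).sum()
    return ewc_lambda * penalty


# Usage sketch: snapshot the generic-domain weights, then regularize the
# in-domain pre-training loss with the EWC term.
#   anchor_params = {n: p.detach().clone() for n, p in model.named_parameters()}
#   fisher_diag = estimate_fisher_diagonal(model, generic_loader, mlm_loss_fn)
#   total_loss = in_domain_mlm_loss + ewc_penalty(model, anchor_params, fisher_diag)

The gradient- and latent-clustering-based data selection explored in the paper is
orthogonal to this penalty: it only changes which generic examples are retained
for the Fisher estimate or for experience replay.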
Related papers
- Generalization Capabilities of Neural Cellular Automata for Medical Image Segmentation: A Robust and Lightweight Approach [6.537479355990391]
U-Nets exhibit a significant decline in performance when tested on data that deviates from the training distribution.
This paper investigates the implications of utilizing models that are smaller by three orders of magnitude (i.e., x1000) compared to a conventional U-Net.
arXiv Detail & Related papers (2024-08-28T06:18:55Z) - Self-Train Before You Transcribe [3.17829719401032]
We investigate the benefit of performing noisy student-teacher training on recordings in the test set as a test-time adaptation approach.
A range of in-domain and out-of-domain datasets are used for experiments demonstrating large relative gains of up to 32.2%.
arXiv Detail & Related papers (2024-06-17T09:21:00Z) - Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z) - Test-Time Training for Semantic Segmentation with Output Contrastive
Loss [12.535720010867538]
Deep learning-based segmentation models have achieved impressive performance on public benchmarks, but generalizing well to unseen environments remains a major challenge.
This paper introduces Output Contrastive Loss (OCL), known for its capability to learn robust and generalized representations, to stabilize the adaptation process.
Our method excels even when applied to models initially pre-trained using domain adaptation methods on test domain data, showcasing its resilience and adaptability.
arXiv Detail & Related papers (2023-11-14T03:13:47Z) - Federated Meta-Learning for Few-Shot Fault Diagnosis with Representation
Encoding [21.76802204235636]
We propose representation encoding-based federated meta-learning (REFML) for few-shot fault diagnosis.
REFML harnesses the inherent heterogeneity among training clients, effectively transforming it into an advantage for out-of-distribution generalization.
It achieves an increase in accuracy by 2.17%-6.50% when tested on unseen working conditions of the same equipment type and 13.44%-18.33% when tested on totally unseen equipment types.
arXiv Detail & Related papers (2023-10-13T10:48:28Z) - Consistency Regularization for Generalizable Source-free Domain
Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
We propose a new approach called D$^3$G to learn domain-specific models.
Our results show that D$^3$G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.