CALM: Continuous Adaptive Learning for Language Modeling
- URL: http://arxiv.org/abs/2004.03794v1
- Date: Wed, 8 Apr 2020 03:51:17 GMT
- Title: CALM: Continuous Adaptive Learning for Language Modeling
- Authors: Kristjan Arumae and Parminder Bhatia
- Abstract summary: Training large language representation models has become a standard in the natural language processing community.
We demonstrate that in practice these pre-trained models present performance deterioration in the form of catastrophic forgetting.
We propose CALM, Continuous Adaptive Learning for Language Modeling: techniques to render models which retain knowledge across multiple domains.
- Score: 18.72860206714457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training large language representation models has become a standard in the
natural language processing community. This allows for fine tuning on any
number of specific tasks, however, these large high capacity models can
continue to train on domain specific unlabeled data to make initialization even
more robust for supervised tasks. We demonstrate that in practice these
pre-trained models present performance deterioration in the form of
catastrophic forgetting when evaluated on tasks from a general domain such as
GLUE. In this work we propose CALM, Continuous Adaptive Learning for Language
Modeling: techniques to render models which retain knowledge across multiple
domains. With these methods, we are able to reduce the performance gap across
supervised tasks introduced by task specific models which we demonstrate using
a continual learning setting in biomedical and clinical domains.
Related papers
- SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models [14.085371250265224]
Large language models (LLMs) have exhibited impressive capabilities in various domains, particularly in general language understanding.
However these models, trained on massive text data, may not be finely optimized for specific tasks triggered by instructions.
This work addresses the catastrophic forgetting in continual instruction learning for LLMs through a switching mechanism for routing computations to parameter-efficient tuned models.
arXiv Detail & Related papers (2024-07-16T14:37:33Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - Adapting Large Language Models for Content Moderation: Pitfalls in Data
Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune a LLM model that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - Unsupervised Calibration through Prior Adaptation for Text
Classification using Large Language Models [37.39843935632105]
We propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples.
Results show that these methods outperform the un-adapted model for different number of training shots in the prompt.
arXiv Detail & Related papers (2023-07-13T12:11:36Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - An Exploration of Prompt Tuning on Generative Spoken Language Model for
Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM)
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z) - Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning [65.268245109828]
In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models.
Deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning.
Model reprogramming enables resource-efficient cross-domain machine learning by repurposing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning.
arXiv Detail & Related papers (2022-02-22T02:33:54Z) - CLIN-X: pre-trained language models and a study on cross-task transfer
for concept extraction in the clinical domain [22.846469609263416]
We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models.
Our studies reveal stable model performance despite a lack of annotated data with improvements of up to 47 F1 points when only 250 labeled sentences are available.
Our results highlight the importance of specialized language models as CLIN-X for concept extraction in non-standard domains.
arXiv Detail & Related papers (2021-12-16T10:07:39Z) - Adapt-and-Distill: Developing Small, Fast and Effective Pretrained
Language Models for Domains [45.07506437436464]
We present a general approach to developing small, fast and effective pre-trained models for specific domains.
This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains.
arXiv Detail & Related papers (2021-06-25T07:37:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.