Continual Training of Language Models for Few-Shot Learning
- URL: http://arxiv.org/abs/2210.05549v1
- Date: Tue, 11 Oct 2022 15:43:58 GMT
- Title: Continual Training of Language Models for Few-Shot Learning
- Authors: Zixuan Ke, Haowei Lin, Yijia Shao, Hu Xu, Lei Shu, and Bing Liu
- Abstract summary: Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications.
Adapting or post-training an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain.
This paper proposes the problem of continually extending an LM by incrementally post-training it with a sequence of unlabeled domain corpora.
The resulting system is called CPT (Continual Post-Training), which, to our knowledge, is the first continual post-training system.
- Score: 20.840674614655942
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recent work on applying large language models (LMs) achieves impressive
performance in many NLP applications. Adapting or post-training an LM using an
unlabeled domain corpus can produce even better performance for end-tasks in
the domain. This paper proposes the problem of continually extending an LM by
incrementally post-training the LM with a sequence of unlabeled domain corpora to
expand its knowledge without forgetting its previous skills. The goal is to
improve the few-shot end-task learning in these domains. The resulting system
is called CPT (Continual Post-Training), which, to our knowledge, is the first
continual post-training system. Experimental results verify its effectiveness.
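To make the setup concrete, below is a minimal sketch of sequentially post-training one LM on a list of unlabeled domain corpora. It assumes the HuggingFace transformers and datasets libraries, a RoBERTa backbone, and hypothetical corpus file names; it performs only naive sequential masked-language-model training and omits the forgetting-avoidance components that define CPT.

```python
# Minimal sketch: sequential post-training with masked language modeling over a
# sequence of unlabeled domain corpora.
# Assumptions: HuggingFace transformers/datasets, a RoBERTa backbone, and
# hypothetical corpus file names; CPT's forgetting-avoidance machinery is omitted.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical sequence of unlabeled domain corpora (one plain-text file each).
domain_corpora = ["restaurant.txt", "acl_papers.txt", "ai_papers.txt", "phone_reviews.txt"]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

for i, corpus_path in enumerate(domain_corpora):
    raw = load_dataset("text", data_files=corpus_path, split="train")
    tokenized = raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True,
        remove_columns=["text"],
    )
    args = TrainingArguments(
        output_dir=f"cpt_domain_{i}",
        per_device_train_batch_size=16,
        num_train_epochs=1,
        learning_rate=1e-4,
        save_strategy="no",
        report_to="none",
    )
    # The same model object is carried over, so each domain continues
    # post-training from the previous domain's weights.
    Trainer(model=model, args=args, train_dataset=tokenized,
            data_collator=collator).train()

# After post-training on each domain, the encoder would be fine-tuned on a
# few-shot end-task from that domain to evaluate few-shot learning.
```

The loop above only fixes the order in which domains are seen; in the paper, forgetting across domains is handled by dedicated continual-learning components not shown here.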
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Fine-tuning Large Language Models for Domain-specific Machine Translation [8.439661191792897]
Large language models (LLMs) have made significant progress in machine translation (MT).
However, their potential in domain-specific MT remains under-explored.
This paper proposes a prompt-oriented fine-tuning method, denoted as LlamaIT, to effectively and efficiently fine-tune a general-purpose LLM for domain-specific MT tasks.
arXiv Detail & Related papers (2024-02-23T02:24:15Z) - Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models [21.95081572612883]
Most prior work assumes that catastrophic forgetting is the biggest obstacle to achieving superior IL performance.
We propose a frustratingly easy method called SEQ* for IL with PLMs.
Results show that SEQ* has competitive or superior performance compared to state-of-the-art (SOTA) IL methods.
arXiv Detail & Related papers (2023-12-13T04:14:22Z) - Improving Language Plasticity via Pretraining with Active Forgetting [63.36484652568976]
We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages.
Experiments with RoBERTa show that models pretrained with our forgetting mechanism demonstrate faster convergence during language adaptation.
arXiv Detail & Related papers (2023-07-03T17:12:44Z) - Continual Pre-training of Language Models [11.59945701446951]
Existing research has shown that further pre-training an LM using a domain corpus to adapt the LM to the domain can improve the end-task performance in the domain.
This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora, adapting it to these domains to improve their end-task performance.
arXiv Detail & Related papers (2023-02-07T03:57:55Z) - Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z) - KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
arXiv Detail & Related papers (2022-04-22T08:11:59Z) - ELLE: Efficient Lifelong Pre-training for Emerging Data [91.52652408402815]
Current pre-trained language models (PLMs) are typically trained on static data, ignoring that in real-world scenarios, streaming data from various sources may continuously grow.
We propose ELLE, aiming at efficient lifelong pre-training for emerging data.
ELLE consists of (1) function preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition (a toy sketch of function-preserving expansion follows this list); and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks.
arXiv Detail & Related papers (2022-03-12T01:53:53Z) - Multi-Stage Pre-training for Low-Resource Domain Adaptation [24.689862495171408]
Current approaches directly adapt a pre-trained language model (LM) on in-domain text before fine-tuning to downstream tasks.
We show that extending the vocabulary of the LM with domain-specific terms leads to further gains.
We apply these approaches incrementally on a pre-trained RoBERTa-large LM and show considerable performance gains on three tasks in the IT domain.
arXiv Detail & Related papers (2020-10-12T17:57:00Z) - Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training [47.12438995938133]
We adapt pre-trained language models (PrLMs) to new domains without fine-tuning.
We present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs.
Experiments on two monolingual and multilingual Amazon review datasets show that CFd can consistently improve the performance of self-training.
arXiv Detail & Related papers (2020-09-24T08:04:37Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes a model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
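Referring back to the ELLE entry above, the sketch below illustrates what "function preserved model expansion" can mean in the simplest case: widening a two-layer feed-forward block by duplicating hidden units and rescaling their outgoing weights. This is Net2Net-style duplication used purely as an illustrative stand-in; `widen_ffn` and the toy dimensions are assumptions, not ELLE's actual implementation.

```python
# Toy function-preserving width expansion of a feed-forward block
# (Net2Net-style duplication; an illustrative stand-in, not ELLE's implementation).
import torch
import torch.nn as nn

def widen_ffn(linear_in: nn.Linear, linear_out: nn.Linear, new_hidden: int):
    """Widen linear_in/linear_out from old_hidden to new_hidden units while
    keeping the block's input-output function unchanged."""
    old_hidden = linear_in.out_features
    assert new_hidden >= old_hidden
    # Map each new unit to an existing unit: originals first, then random copies.
    idx = torch.randint(0, old_hidden, (new_hidden - old_hidden,))
    mapping = torch.cat([torch.arange(old_hidden), idx])

    wide_in = nn.Linear(linear_in.in_features, new_hidden)
    wide_out = nn.Linear(new_hidden, linear_out.out_features)
    with torch.no_grad():
        wide_in.weight.copy_(linear_in.weight[mapping])
        wide_in.bias.copy_(linear_in.bias[mapping])
        # Split each original unit's outgoing weights evenly among its copies,
        # so the widened block computes exactly the same function.
        counts = torch.bincount(mapping, minlength=old_hidden).float()
        wide_out.weight.copy_(linear_out.weight[:, mapping] / counts[mapping])
        wide_out.bias.copy_(linear_out.bias)
    return wide_in, wide_out

# Quick check that the function is preserved: duplicated units share identical
# activations, so the rescaled outgoing weights sum back to the original output.
ffn_in, ffn_out = nn.Linear(16, 64), nn.Linear(64, 16)
x = torch.randn(4, 16)
y_old = ffn_out(torch.relu(ffn_in(x)))
wide_in, wide_out = widen_ffn(ffn_in, ffn_out, 96)
y_new = wide_out(torch.relu(wide_in(x)))
print(torch.allclose(y_old, y_new, atol=1e-5))  # True
```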