Continual Pre-training of Language Models
- URL: http://arxiv.org/abs/2302.03241v4
- Date: Wed, 12 Apr 2023 10:36:44 GMT
- Title: Continual Pre-training of Language Models
- Authors: Zixuan Ke, Yijia Shao, Haowei Lin, Tatsuya Konishi, Gyuhak Kim, and
Bing Liu
- Abstract summary: Existing research has shown that further pre-training an LM using a domain corpus to adapt the LM to the domain can improve the end-task performance in the domain.
This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora, adapting the LM to these domains so as to improve their end-task performance.
- Score: 11.59945701446951
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Language models (LMs) have been instrumental for the rapid advance of natural
language processing. This paper studies continual pre-training of LMs, in
particular, continual domain-adaptive pre-training (or continual DAP-training).
Existing research has shown that further pre-training an LM using a domain
corpus to adapt the LM to the domain can improve the end-task performance in
the domain. This paper proposes a novel method to continually DAP-train an LM
with a sequence of unlabeled domain corpora, adapting the LM to these domains
so as to improve their end-task performance. The key novelty of our method is a
soft-masking mechanism that directly controls the update to the LM. A novel
proxy is also proposed to preserve the general knowledge in the original LM.
Additionally, the method contrasts the representations of the previously learned
domain knowledge (including the general knowledge in the pre-trained LM) with
those of the current full network to achieve knowledge integration. The
method not only overcomes catastrophic forgetting but also achieves knowledge
transfer to improve end-task performance. Empirical evaluation demonstrates
the effectiveness of the proposed method.
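As a rough illustration of the soft-masking idea only (not the paper's exact algorithm), the sketch below scales each parameter's gradient by one minus a precomputed importance score, so units deemed important for previously learned or general knowledge receive attenuated updates during DAP-training; the `importance` dictionary and the function name are hypothetical.
```python
import torch

def soft_masked_step(model, loss, importance, optimizer):
    """One DAP-training step with soft-masked gradient updates (illustrative sketch).

    `importance` maps parameter names to tensors in [0, 1] (same shape as the
    parameter) estimating how important each unit is for previously learned
    knowledge, e.g. the general knowledge of the original LM. Gradients are
    scaled by (1 - importance), so important units are attenuated rather than
    hard-frozen.
    """
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.grad is not None and name in importance:
                param.grad.mul_(1.0 - importance[name])
    optimizer.step()
```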
Related papers
- Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment [120.06538000214552]
Adapting general large language models (LLMs) to specialized domains presents great challenges due to varied data distributions.
We propose a new domain adaptation framework, called Mix-CPT, that combines domain knowledge learning with general format alignment.
Our proposed Mix-CPT framework can simultaneously improve the task-solving capabilities of LLMs on the target and general domains.
arXiv Detail & Related papers (2024-07-15T15:20:13Z) - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle this by tuning VLMs with knowledge distillation on extra datasets, which incurs heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, which retains the pre-trained knowledge of VLMs.
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous
Self-Supervised Learning [54.9235160379917]
FusDom is a simple and novel methodology for SSL-based continued pre-training.
FusDom learns speech representations that are robust and adaptive yet not forgetful of concepts seen in the past.
arXiv Detail & Related papers (2023-12-20T13:50:05Z) - Evolving Domain Adaptation of Pretrained Language Models for Text
Classification [24.795214770636534]
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection.
This study benchmarks the effectiveness of evolving domain adaptation (EDA) strategies, notably self-training, domain-adversarial training, and domain-adaptive pretraining, with a focus on an incremental self-training method.
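As a rough sketch of what an incremental self-training loop can look like in this setting (a generic recipe, not necessarily the paper's exact method; `model`, `loss_fn`, and the confidence `threshold` are placeholders):
```python
import torch

def incremental_self_training(model, optimizer, loss_fn, domain_batches, threshold=0.9):
    """Generic incremental self-training over an evolving domain (sketch only).

    Batches of unlabeled texts arrive in temporal order. The current model
    pseudo-labels each batch, keeps only confident predictions, and is
    immediately fine-tuned on them before the next batch arrives.
    """
    for batch in domain_batches:
        model.eval()
        with torch.no_grad():
            logits = model(batch["input_ids"])             # (batch, num_classes)
            conf, pseudo = torch.softmax(logits, dim=-1).max(dim=-1)
            keep = conf >= threshold                       # confidence filtering
        if keep.sum() == 0:
            continue
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(batch["input_ids"][keep]), pseudo[keep])
        loss.backward()
        optimizer.step()
```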
arXiv Detail & Related papers (2023-11-16T08:28:00Z) - Propagating Knowledge Updates to LMs Through Distillation [97.3628651636153]
We show that a context-based approach can both impart knowledge about entities and propagate that knowledge to enable broader inferences.
Our experiments demonstrate that this approach is more effective at propagating knowledge updates than fine-tuning and other gradient-based knowledge-editing methods.
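A minimal sketch of the general context-distillation idea follows (assuming Hugging Face-style causal LMs exposing a `.logits` output; the paper's actual transfer-set construction and loss may differ, and `distill_update` is a hypothetical helper):
```python
import torch
import torch.nn.functional as F

def distill_update(student, teacher, tokenizer, definition, prompt, optimizer, tau=2.0):
    """One context-distillation step (illustrative sketch).

    The teacher reads the new fact (`definition`) followed by an inference
    prompt; the student reads the prompt alone and is trained to match the
    teacher's next-token distribution, pushing the fact into its weights.
    """
    teacher_ids = tokenizer(definition + " " + prompt, return_tensors="pt").input_ids
    student_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        t_next = teacher(teacher_ids).logits[:, -1, :]     # teacher next-token scores
    s_next = student(student_ids).logits[:, -1, :]
    loss = F.kl_div(
        F.log_softmax(s_next / tau, dim=-1),
        F.softmax(t_next / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```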
arXiv Detail & Related papers (2023-06-15T17:39:50Z) - Adapting a Language Model While Preserving its General Knowledge [22.083108548675494]
Domain-adaptive pre-training (or DA-training for short) aims to further train a pre-trained general-purpose language model (LM) on an unlabeled corpus of a particular domain to adapt the LM to that domain.
Existing DA-training methods are in some sense blind as they do not explicitly identify what knowledge in the LM should be preserved and what should be changed by the domain corpus.
This paper shows that the existing methods are suboptimal and proposes a novel method to perform a more informed adaptation of the knowledge in the LM.
arXiv Detail & Related papers (2023-01-21T17:57:53Z) - On the Domain Adaptation and Generalization of Pretrained Language
Models: A Survey [15.533482481757353]
We propose a taxonomy of domain adaptation approaches from a machine learning system view.
We discuss and compare those methods and suggest promising future research directions.
arXiv Detail & Related papers (2022-11-06T15:32:00Z) - Continual Training of Language Models for Few-Shot Learning [20.840674614655942]
Recent work on applying large language models (LMs) achieves impressive performance in many NLP applications.
Adapting or post-training an LM using an unlabeled domain corpus can produce even better performance for end-tasks in the domain.
This paper proposes the problem of continually extending an LM by incrementally post-training it with a sequence of unlabeled domain corpora.
The resulting system is called CPT (Continual PostTraining), which, to our knowledge, is the first continual post-training system.
arXiv Detail & Related papers (2022-10-11T15:43:58Z) - KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
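The modulation can be pictured roughly as a learned scale-and-shift of the hidden states conditioned on token-aligned knowledge embeddings; the sketch below is a generic formulation of that idea, not necessarily KALA's exact one, and the class name is hypothetical.
```python
import torch.nn as nn

class KnowledgeModulation(nn.Module):
    """Generic sketch of modulating a PLM's hidden states with knowledge embeddings.

    A knowledge vector per token (e.g. from entity embeddings, zeros for tokens
    without an entity) produces a scale and a shift that are applied to the
    intermediate hidden representation.
    """

    def __init__(self, hidden_dim, knowledge_dim):
        super().__init__()
        self.to_scale = nn.Linear(knowledge_dim, hidden_dim)
        self.to_shift = nn.Linear(knowledge_dim, hidden_dim)

    def forward(self, hidden, knowledge):
        # hidden: (batch, seq_len, hidden_dim); knowledge: (batch, seq_len, knowledge_dim)
        scale = 1.0 + self.to_scale(knowledge)
        shift = self.to_shift(knowledge)
        return scale * hidden + shift
```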
arXiv Detail & Related papers (2022-04-22T08:11:59Z) - Multi-Stage Pre-training for Low-Resource Domain Adaptation [24.689862495171408]
Current approaches directly adapt a pre-trained language model (LM) on in-domain text before fine-tuning to downstream tasks.
We show that extending the vocabulary of the LM with domain-specific terms leads to further gains.
We apply these approaches incrementally on a pre-trained RoBERTa-large LM and show considerable performance gains on three tasks in the IT domain.
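Vocabulary extension of this kind can be done with the Hugging Face transformers API roughly as follows; the `domain_terms` list is a made-up placeholder, and the paper's procedure for selecting domain terms is not shown.
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hypothetical domain-specific terms mined from an in-domain (e.g. IT) corpus.
domain_terms = ["hotfix", "segfault", "middleware"]

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")

# Add the new terms to the vocabulary and grow the embedding matrix to match;
# the new embedding rows are randomly initialized and learned during
# in-domain pre-training.
num_added = tokenizer.add_tokens(domain_terms)
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} domain-specific tokens")
```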
arXiv Detail & Related papers (2020-10-12T17:57:00Z) - Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
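One plausible form of such a regularizer is a KL term that pulls the TM's output distribution toward the frozen LM prior, as sketched below; the weight `lam` and temperature `tau` are hypothetical hyperparameters, not values from the paper.
```python
import torch.nn.functional as F

def lm_prior_loss(tm_logits, lm_logits, targets, pad_id, lam=0.5, tau=2.0):
    """Translation loss plus an LM-prior regularizer (illustrative sketch).

    tm_logits, lm_logits: (batch, seq_len, vocab) scores from the translation
    model and a frozen language model over the same target prefix. The KL
    term pushes the TM's token distribution to be probable under the LM prior.
    """
    nll = F.cross_entropy(
        tm_logits.transpose(1, 2), targets, ignore_index=pad_id
    )
    kl = F.kl_div(
        F.log_softmax(tm_logits / tau, dim=-1),
        F.softmax(lm_logits.detach() / tau, dim=-1),
        reduction="batchmean",
    )
    return nll + lam * kl
```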
arXiv Detail & Related papers (2020-04-30T16:29:56Z)