Investigating Continual Pretraining in Large Language Models: Insights and Implications
- URL: http://arxiv.org/abs/2402.17400v2
- Date: Wed, 12 Feb 2025 14:46:43 GMT
- Title: Investigating Continual Pretraining in Large Language Models: Insights and Implications
- Authors: Çağatay Yıldız, Nishaanth Kanna Ravichandran, Nitin Sharma, Matthias Bethge, Beyza Ermis,
- Abstract summary: Continual learning in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies.
We introduce a new benchmark designed to measure the adaptability of LLMs to changing pretraining data landscapes.
Our findings uncover several key insights: (i) continual pretraining consistently improves 1.5B models studied in this work and is also superior to domain adaptation, (ii) larger models always achieve better perplexity than smaller ones when continually pretrained on the same corpus, (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both learning and
- Score: 9.660013084324817
- License:
- Abstract: Continual learning (CL) in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies to adapt models to emerging knowledge and achieve robustness in dynamic environments. Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains while retaining previously learned knowledge. Since existing works concentrate mostly on continual fine-tuning for a limited selection of downstream tasks or training domains, we introduce a new benchmark designed to measure the adaptability of LLMs to changing pretraining data landscapes. We further examine the impact of model size on learning efficacy and forgetting, as well as how the progression and similarity of emerging domains affect the knowledge transfer within these models. Our findings uncover several key insights: (i) continual pretraining consistently improves <1.5B models studied in this work and is also superior to domain adaptation, (ii) larger models always achieve better perplexity than smaller ones when continually pretrained on the same corpus, (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both learning and forgetting, (iv) continual pretraining boosts downstream task performance of GPT-2 family, (v) continual pretraining enables LLMs to specialize better when the sequence of domains shows semantic similarity while randomizing training domains leads to better transfer and final performance otherwise. We posit that our research establishes a new benchmark for CL in LLMs, providing a more realistic evaluation of knowledge retention and transfer across diverse domains.
Related papers
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z) - Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA [19.982853959240497]
Existing methods often rely on additional reference data, isolated components for distribution or domain predictions.
We propose Dynamic Rank-Selective Low Rank Adaptation (LoRA), a universal and efficient continual learning approach.
Our approach continually enhances the pre-trained VLM by retaining both the pre-trained knowledge and the knowledge acquired during CL.
arXiv Detail & Related papers (2024-12-01T23:41:42Z) - VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs [38.65649832364651]
VersaTune is a novel data composition framework designed for enhancing Large Language Models' multi-ability performances during training.
We categorize knowledge into distinct domains including law, medicine, finance, science, code, etc.
We demonstrate that VersaTune achieves significant improvements in multi-domain performance, with a 35.21% enhancement in comprehensive multi-domain tasks.
arXiv Detail & Related papers (2024-11-18T03:45:34Z) - Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow.
We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z) - CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [17.614980614656407]
We propose Continual Generative training for Incremental prompt-Learning.
We exploit Variational Autoencoders to learn class-conditioned distributions.
We show that such a generative replay approach can adapt to new tasks while improving zero-shot capabilities.
arXiv Detail & Related papers (2024-07-22T16:51:28Z) - Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning [13.371405067535814]
This paper investigates the effectiveness ofSupervised Fine-Tuning (SFT) as a method for knowledge injection in Large Language Models (LLMs)
We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information.
Our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.
arXiv Detail & Related papers (2024-03-30T01:56:07Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models
with Semi-structured Data [67.8302955948861]
Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable performance on various NLP tasks.
Applying these models to specific domains still poses significant challenges, such as lack of domain knowledge.
We focus on domain-specific continual pre-training of LLMs using E-commerce domain as an exemplar.
arXiv Detail & Related papers (2023-12-25T11:31:47Z) - Incremental Learning for Heterogeneous Structure Segmentation in Brain
Tumor MRI [11.314017805825685]
We propose a divergence-aware dual-flow module with balanced rigidity and plasticity branches to decouple old and new tasks.
We evaluate our framework on a brain tumor segmentation task with continually changing target domains.
arXiv Detail & Related papers (2023-05-30T20:39:03Z) - Forget Less, Count Better: A Domain-Incremental Self-Distillation
Learning Benchmark for Lifelong Crowd Counting [51.44987756859706]
Off-the-shelf methods have some drawbacks to handle multiple domains.
Lifelong Crowd Counting aims at alleviating the catastrophic forgetting and improving the generalization ability.
arXiv Detail & Related papers (2022-05-06T15:37:56Z) - Unified Instance and Knowledge Alignment Pretraining for Aspect-based
Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.