Investigating Continual Pretraining in Large Language Models: Insights
and Implications
- URL: http://arxiv.org/abs/2402.17400v1
- Date: Tue, 27 Feb 2024 10:47:24 GMT
- Title: Investigating Continual Pretraining in Large Language Models: Insights
and Implications
- Authors: Çağatay Yıldız, Nishaanth Kanna Ravichandran, Prishruit Punia, Matthias Bethge, Beyza Ermis
- Abstract summary: This paper studies the evolving domain of Continual Learning (CL) in large language models (LLMs).
Our primary emphasis is on continual domain-adaptive pretraining, a process designed to equip LLMs with the ability to integrate new information from various domains.
We examine the impact of model size on learning efficacy and forgetting, as well as how the progression and similarity of emerging domains affect the knowledge transfer within these models.
- Score: 9.591223887442704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the evolving domain of Continual Learning (CL) in large
language models (LLMs), with a focus on developing strategies for efficient and
sustainable training. Our primary emphasis is on continual domain-adaptive
pretraining, a process designed to equip LLMs with the ability to integrate new
information from various domains while retaining previously learned knowledge
and enhancing cross-domain knowledge transfer without relying on
domain-specific identification. Unlike previous studies, which mostly
concentrate on a limited selection of tasks or domains and primarily aim to
address the issue of forgetting, our research evaluates the adaptability and
capabilities of LLMs to changing data landscapes in practical scenarios. To
this end, we introduce a new benchmark designed to measure the adaptability of
LLMs to these evolving data environments, offering a comprehensive framework
for evaluation. We examine the impact of model size on learning efficacy and
forgetting, as well as how the progression and similarity of emerging domains
affect the knowledge transfer within these models. Our findings uncover several
key insights: (i) when the sequence of domains shows semantic similarity,
continual pretraining enables LLMs to better specialize in the current domain
compared to stand-alone fine-tuning, (ii) training across a diverse range of
domains enhances both backward and forward knowledge transfer, and (iii)
smaller models are particularly sensitive to continual pretraining, showing the
most significant rates of both forgetting and learning. We posit that our
research marks a shift towards establishing a more realistic benchmark for
investigating CL in LLMs, and has the potential to play a key role in guiding
the direction of future research in the field.
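To make the evaluation idea concrete, below is a minimal Python sketch of the general recipe: pretrain sequentially on a sequence of domain corpora, record a per-domain evaluation matrix, and derive backward and forward transfer from it. The helper names (`continual_pretrain`, `train_on_domain`, `evaluate`), the matrix layout, and the metric formulas are illustrative assumptions following common continual-learning practice, not the paper's exact benchmark or code.

```python
# Minimal sketch: sequential domain-adaptive pretraining plus standard
# continual-learning transfer metrics. `train_on_domain` and `evaluate` are
# placeholders for the user's own training/evaluation routines; `evaluate`
# is assumed to return a higher-is-better score (e.g., negative log-perplexity).
import numpy as np

def continual_pretrain(model, domain_corpora, train_on_domain, evaluate):
    """Adapt `model` to each domain in sequence and record an evaluation matrix.

    R[i, j] is the score on domain j after pretraining on the first i domains
    (row 0 is the unadapted base model).
    """
    n = len(domain_corpora)
    R = np.zeros((n + 1, n))
    R[0] = [evaluate(model, d) for d in domain_corpora]
    for i, corpus in enumerate(domain_corpora):
        model = train_on_domain(model, corpus)          # continue pretraining on domain i
        R[i + 1] = [evaluate(model, d) for d in domain_corpora]
    return model, R

def backward_transfer(R):
    """Mean change on earlier domains after the final domain; negative values indicate forgetting."""
    n = R.shape[1]
    return float(np.mean([R[n, j] - R[j + 1, j] for j in range(n - 1)]))

def forward_transfer(R):
    """Mean zero-shot gain on each not-yet-seen domain relative to the base model."""
    n = R.shape[1]
    return float(np.mean([R[j, j] - R[0, j] for j in range(1, n)]))
```

Under this convention, forgetting surfaces as a negative backward-transfer value, and finding (iii) above would show up as larger magnitudes of both learning gains and backward-transfer drops for smaller models.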
Related papers
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications.
Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z)
- Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA [19.982853959240497]
We investigate whether pre-trained knowledge in vision-language models (VLMs) can be retained -- or even enhanced -- in continual learning (CL).
We propose a universal and efficient Continual Learning approach for VLMs based on Dynamic Rank-Selective LoRA (CoDyRA).
arXiv Detail & Related papers (2024-12-01T23:41:42Z)
- VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs [38.65649832364651]
VersaTune is a novel data composition framework designed to enhance the multi-capability performance of Large Language Models during training.
We categorize knowledge into distinct domains including law, medicine, finance, science, code, etc.
We demonstrate that VersaTune achieves significant improvements in multi-domain performance, with a 35.21% enhancement in comprehensive multi-domain tasks.
arXiv Detail & Related papers (2024-11-18T03:45:34Z)
- Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow.
We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
- Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to empirically explain the performance gap.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
- Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present Learn From the Learnt (LFTL), a novel paradigm for SFADA that leverages the knowledge learnt from the source pretrained model and from actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z)
- Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain [4.133477882188227]
This paper presents our findings from training and evaluating a Japanese business domain-specific LLM.
Our pretrained model and business domain benchmark are publicly available to support further studies.
arXiv Detail & Related papers (2024-04-12T06:21:48Z)
- Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning [13.371405067535814]
This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in Large Language Models (LLMs).
We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information.
Our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.
arXiv Detail & Related papers (2024-03-30T01:56:07Z)
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
- EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [67.8302955948861]
Large Language Models (LLMs) pre-trained on massive corpora have exhibited remarkable performance on various NLP tasks.
Applying these models to specific domains still poses significant challenges, such as lack of domain knowledge.
We focus on domain-specific continual pre-training of LLMs, using the e-commerce domain as an exemplar.
arXiv Detail & Related papers (2023-12-25T11:31:47Z)
- Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer [69.82229895838577]
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate.
This setting neglects the more practical scenario where training data are collected from multiple sources.
This motivates us to target a new and challenging setting of knowledge transfer that extends ADA from a single source domain to multiple source domains.
arXiv Detail & Related papers (2023-11-21T13:12:21Z)
- A Recent Survey of Heterogeneous Transfer Learning [15.830786437956144]
Heterogeneous transfer learning (HTL) has become a vital strategy in various tasks.
We offer an extensive review of over 60 HTL methods, covering both data-based and model-based approaches.
We explore applications in natural language processing, computer vision, multimodal learning, and biomedicine.
arXiv Detail & Related papers (2023-10-12T16:19:58Z)
- Incremental Learning for Heterogeneous Structure Segmentation in Brain Tumor MRI [11.314017805825685]
We propose a divergence-aware dual-flow module with balanced rigidity and plasticity branches to decouple old and new tasks.
We evaluate our framework on a brain tumor segmentation task with continually changing target domains.
arXiv Detail & Related papers (2023-05-30T20:39:03Z)
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP).
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z)
- On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey [15.533482481757353]
We propose a taxonomy of domain adaptation approaches from a machine learning system view.
We discuss and compare those methods and suggest promising future research directions.
arXiv Detail & Related papers (2022-11-06T15:32:00Z)
- Forget Less, Count Better: A Domain-Incremental Self-Distillation Learning Benchmark for Lifelong Crowd Counting [51.44987756859706]
Off-the-shelf methods have several drawbacks when handling multiple domains.
Lifelong Crowd Counting aims to alleviate catastrophic forgetting and improve generalization ability.
arXiv Detail & Related papers (2022-05-06T15:37:56Z)
- Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.