Exploring Forgetting in Large Language Model Pre-Training
- URL: http://arxiv.org/abs/2410.17018v1
- Date: Tue, 22 Oct 2024 13:39:47 GMT
- Title: Exploring Forgetting in Large Language Model Pre-Training
- Authors: Chonghua Liao, Ruobing Xie, Xingwu Sun, Haowen Sun, Zhanhui Kang
- Abstract summary: Catastrophic forgetting remains a formidable obstacle to building an omniscient model in large language models (LLMs).
We systematically explored the existence and measurement of forgetting in pre-training, questioning traditional metrics such as perplexity (PPL) and introducing new metrics to better detect entity memory retention.
- Score: 18.858330348834777
- License:
- Abstract: Catastrophic forgetting remains a formidable obstacle to building an omniscient model in large language models (LLMs). Despite the pioneering research on task-level forgetting in LLM fine-tuning, there is scant focus on forgetting during pre-training. We systematically explored the existence and measurement of forgetting in pre-training, questioning traditional metrics such as perplexity (PPL) and introducing new metrics to better detect entity memory retention. Based on our revised assessment of forgetting metrics, we explored low-cost, straightforward methods to mitigate forgetting during the pre-training phase. Further, we carefully analyzed the learning curves, offering insights into the dynamics of forgetting. Extensive evaluations and analyses on forgetting of pre-training could facilitate future research on LLMs.
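To make the contrast between perplexity and entity memory retention concrete, the following minimal sketch (not the authors' code) scores pre-training checkpoints both by mean perplexity and by a toy entity-completion probe. It assumes the Hugging Face transformers API; the checkpoint paths and probe prompts are hypothetical placeholders, and the entity-recall probe is only an illustrative stand-in for the paper's proposed metrics.

```python
# Minimal sketch: compare perplexity with a simple entity-completion probe
# across hypothetical pre-training checkpoints.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = ["my-org/ckpt-100k", "my-org/ckpt-200k"]  # hypothetical paths
PROBES = [
    # (prompt, expected entity continuation) -- toy examples
    ("The Eiffel Tower is located in", " Paris"),
    ("The author of 'Pride and Prejudice' is", " Jane Austen"),
]

def perplexity(model, tokenizer, text):
    """Token-level perplexity of `text` under `model`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return math.exp(loss.item())

def entity_recall(model, tokenizer, prompt, target):
    """1 if the greedy continuation starts with the expected entity, else 0."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=8, do_sample=False)
    completion = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    return int(completion.strip().startswith(target.strip()))

for ckpt in CHECKPOINTS:
    tok = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(ckpt).eval()
    ppl = sum(perplexity(model, tok, p + t) for p, t in PROBES) / len(PROBES)
    acc = sum(entity_recall(model, tok, p, t) for p, t in PROBES) / len(PROBES)
    print(f"{ckpt}: mean PPL={ppl:.2f}, entity recall={acc:.2f}")
```

In the spirit of the abstract, a later checkpoint whose mean PPL stays flat while entity recall drops would suggest forgetting that perplexity alone does not surface.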
Related papers
- Dynamic Uncertainty Ranking: Enhancing In-Context Learning for Long-Tail Knowledge in LLMs [50.29035873837]
Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training.
Long-tail knowledge from specialized domains is often scarce and underrepresented, and is rarely retained in the model's memory.
We propose a reinforcement learning-based dynamic uncertainty ranking method for ICL that accounts for the varying impact of each retrieved sample on LLM predictions.
arXiv Detail & Related papers (2024-10-31T03:42:17Z) - Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs).
MIR is informative for training data selection, training strategy scheduling, and model architecture design aimed at better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z) - Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale [18.015805664219673]
We explore an alternative approach to constructing a Large Language Model by continual pre-training (CPT) from existing pre-trained LLMs.
We find that CPT converges faster and saves significant resources in a scalable manner.
The effectiveness of transfer at scale is influenced by training duration and linguistic properties, while remaining robust to data replaying.
arXiv Detail & Related papers (2024-07-02T10:06:41Z) - Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach [23.34505448257966]
Retrieval-augmented large language models (LLMs) have been remarkably competent in various NLP tasks.
Previous work has proposed deciding when to perform or skip retrieval in a data-aware manner by analyzing the LLMs' pre-training data.
Such data-aware methods pose privacy risks and face memory limitations, especially when they require access to sensitive or extensive pre-training data.
We hypothesize that token embeddings are able to capture the model's intrinsic knowledge, which offers a safer and more straightforward way to judge the need for retrieval without the privacy risks associated with accessing pre-training data.
arXiv Detail & Related papers (2024-04-04T15:21:22Z) - The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis [27.310894780313618]
This paper undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints.
We confirm that specific downstream metrics exhibit similar training dynamics across models of different sizes.
In addition to our core findings, we've reproduced Amber and OpenLLaMA, releasing their intermediate checkpoints.
arXiv Detail & Related papers (2024-04-01T16:00:01Z) - Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models [47.439995799065755]
We pioneer the exploration of LLMs' trustworthiness during pre-training.
We focus on five key dimensions: reliability, privacy, toxicity, fairness, and robustness.
We are the first to observe a two-phase phenomenon during pre-training: fitting and compression.
arXiv Detail & Related papers (2024-02-29T18:55:06Z) - Machine Unlearning of Pre-trained Large Language Models [17.40601262379265]
This study investigates the concept of the 'right to be forgotten' within the context of large language models (LLMs).
We explore machine unlearning as a pivotal solution, with a focus on pre-trained models.
arXiv Detail & Related papers (2024-02-23T07:43:26Z) - Few-Shot Class-Incremental Learning with Prior Knowledge [94.95569068211195]
We propose Learning with Prior Knowledge (LwPK) to enhance the generalization ability of the pre-trained model.
Experimental results indicate that LwPK effectively enhances the model's resilience against catastrophic forgetting.
arXiv Detail & Related papers (2024-02-02T08:05:35Z) - Examining Forgetting in Continual Pre-training of Aligned Large Language Models [66.62800021628276]
We investigate the phenomenon of forgetting that occurs during continual pre-training on an existing fine-tuned LLM.
Experiment results highlight the non-trivial challenge of addressing catastrophic forgetting during continual pre-training.
arXiv Detail & Related papers (2024-01-06T05:34:09Z) - Knowledge Editing for Large Language Models: A Survey [51.01368551235289]
One major drawback of large language models (LLMs) is their substantial computational cost for pre-training.
Knowledge-based Model Editing (KME), which aims to precisely modify LLMs to incorporate specific knowledge, has attracted increasing attention.
arXiv Detail & Related papers (2023-10-24T22:18:13Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We further examine the key factors contributing to multi-epoch degradation, finding that dataset size, model parameters, and training objectives all play significant roles; a generic sketch of monitoring such multi-epoch overfitting appears after this list.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
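As referenced in the token-crisis entry above, the sketch below is a generic illustration, not any paper's method: it repeats a small toy dataset over many epochs and flags the point where validation loss starts rising while training loss keeps falling, which is the overfitting pattern described there. The model, data, and threshold are toy assumptions.

```python
# Generic sketch: flag overfitting when repeating the same data each epoch.
import torch
from torch import nn

torch.manual_seed(0)
X_train, y_train = torch.randn(256, 16), torch.randn(256, 1)  # toy data
X_val, y_val = torch.randn(64, 16), torch.randn(64, 1)

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

best_val = float("inf")
for epoch in range(50):  # "repeating" the same data every epoch
    model.train()
    opt.zero_grad()
    train_loss = loss_fn(model(X_train), y_train)
    train_loss.backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    if val_loss < best_val:
        best_val = val_loss
    elif val_loss > 1.1 * best_val:  # heuristic threshold, chosen arbitrarily
        print(f"epoch {epoch}: val loss rising ({val_loss:.3f}) -> likely overfitting")
        break
```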
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.