Examining Forgetting in Continual Pre-training of Aligned Large Language
Models
- URL: http://arxiv.org/abs/2401.03129v1
- Date: Sat, 6 Jan 2024 05:34:09 GMT
- Title: Examining Forgetting in Continual Pre-training of Aligned Large Language
Models
- Authors: Chen-An Li, Hung-Yi Lee
- Abstract summary: We investigate the phenomenon of forgetting that occurs during continual pre-training on an existing fine-tuned LLM.
Experiment results highlight the non-trivial challenge of addressing catastrophic forgetting during continual pre-training.
- Score: 66.62800021628276
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in Large Language Models (LLMs) have exhibited remarkable
proficiency across various tasks. Given the potent applications of LLMs in
numerous fields, there has been a surge in LLM development. In developing LLMs,
a common practice involves continual pre-training on previously fine-tuned
models. However, this can lead to catastrophic forgetting. In our work, we
investigate the phenomenon of forgetting that occurs during continual
pre-training on an existing fine-tuned LLM. We evaluate the impact of
continual pre-training on the fine-tuned LLM across various dimensions,
including output format, knowledge, and reliability. Experiment results
highlight the non-trivial challenge of addressing catastrophic forgetting
during continual pre-training, especially the repetition issue.
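The repetition issue can be made concrete with a simple proxy metric. The sketch below is illustrative only (it is not the paper's evaluation protocol, and all strings are hypothetical): it measures the fraction of repeated n-grams in a generation, which can be compared before and after continual pre-training.
```python
# A minimal sketch of one way to quantify the "repetition issue":
# the fraction of n-grams that occur more than once in a generation.
# Illustrative proxy only, not the evaluation used in the paper.
from collections import Counter

def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Return the fraction of n-grams appearing more than once in `text`."""
    tokens = text.split()  # whitespace tokenization, for illustration only
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

# Hypothetical generations for the same prompt.
fine_tuned_output = "Paris is the capital of France and is known for the Eiffel Tower."
continued_output = "Paris is the capital of France. Paris. Paris. Paris. Paris. Paris."

print("fine-tuned model:        ", repeated_ngram_ratio(fine_tuned_output))
print("continually pre-trained: ", repeated_ngram_ratio(continued_output))
```
A higher ratio for the continually pre-trained model would flag exactly the kind of degradation in output quality that the abstract describes.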
Related papers
- Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models [28.253786579346432]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP).
Current solutions for long-context modeling often employ multi-stage continual pretraining.
In this paper, we introduce a novel single-stage continual pretraining method, Head-Adaptive Rotary Position.
arXiv Detail & Related papers (2024-12-10T04:09:29Z)
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
- Exploring Forgetting in Large Language Model Pre-Training [18.858330348834777]
Catastrophic forgetting remains a formidable obstacle to building an omniscient model in large language models (LLMs).
We systematically explore the existence and measurement of forgetting in pre-training, questioning traditional metrics such as perplexity (PPL) and introducing new metrics to better detect entity memory retention; a minimal PPL computation is sketched after this list.
arXiv Detail & Related papers (2024-10-22T13:39:47Z)
- Continual Learning of Large Language Models: A Comprehensive Survey [18.546766135948154]
The success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications.
One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences.
While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs.
arXiv Detail & Related papers (2024-04-25T17:38:57Z)
- Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
- A Survey of Confidence Estimation and Calibration in Large Language Models [86.692994151323]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains.
Despite their impressive performance, they can be unreliable due to factual errors in their generations.
Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations.
arXiv Detail & Related papers (2023-11-14T16:43:29Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning [70.48605869773814]
Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information.
This study empirically evaluates the forgetting phenomenon in large language models during continual instruction tuning.
arXiv Detail & Related papers (2023-08-17T02:53:23Z)
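For the perplexity (PPL) signal referenced in the "Exploring Forgetting in Large Language Model Pre-Training" entry above, the following is a minimal sketch of how PPL can be computed for a causal LM with the Hugging Face transformers library. The checkpoint name and evaluation text are placeholders; a real forgetting study would track PPL (and complementary metrics) on held-out data before and after continual pre-training.
```python
# Minimal sketch, assuming the Hugging Face transformers library:
# compute perplexity (PPL) of a causal LM on a held-out sentence.
# "gpt2" and the sentence below are placeholders, not the models or
# data used in any of the papers listed above.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the checkpoint under study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Paris is the capital of France."  # hypothetical held-out text
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels equal to input_ids, the model returns the mean
    # token-level cross-entropy loss; exp(loss) is the perplexity.
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"perplexity: {math.exp(outputs.loss.item()):.2f}")
```
A rising PPL after continual pre-training suggests forgetting, but as the entry above notes, PPL alone may miss finer-grained effects such as loss of entity memory.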
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.