Examining Forgetting in Continual Pre-training of Aligned Large Language
Models
- URL: http://arxiv.org/abs/2401.03129v1
- Date: Sat, 6 Jan 2024 05:34:09 GMT
- Title: Examining Forgetting in Continual Pre-training of Aligned Large Language
Models
- Authors: Chen-An Li, Hung-Yi Lee
- Abstract summary: We investigate the phenomenon of forgetting that occurs during continual pre-training on an existing fine-tuned LLM.
Experiment results highlight the non-trivial challenge of addressing catastrophic forgetting during continual pre-training.
- Score: 66.62800021628276
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in Large Language Models (LLMs) have exhibited remarkable
proficiency across various tasks. Given the potent applications of LLMs in
numerous fields, there has been a surge in LLM development. In developing LLMs,
a common practice involves continual pre-training on previously fine-tuned
models. However, this can lead to catastrophic forgetting. In our work, we
investigate the phenomenon of forgetting that occurs during continual
pre-training on an existing fine-tuned LLM. We evaluate the impact of
continuous pre-training on the fine-tuned LLM across various dimensions,
including output format, knowledge, and reliability. Experiment results
highlight the non-trivial challenge of addressing catastrophic forgetting
during continual pre-training, especially the repetition issue.
Related papers
- Exploring Forgetting in Large Language Model Pre-Training [18.858330348834777]
Catastrophic forgetting remains a formidable obstacle to building an omniscient model in large language models (LLMs)
We systematically explored the existence and measurement of forgetting in pre-training, questioning traditional metrics such as perplexity (PPL) and introducing new metrics to better detect entity memory retention.
arXiv Detail & Related papers (2024-10-22T13:39:47Z) - Zero-shot Model-based Reinforcement Learning using Large Language Models [12.930241182192988]
We investigate how pre-trained Large Language Models can be leveraged to predict in context the dynamics of continuous Markov decision processes.
We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning.
arXiv Detail & Related papers (2024-10-15T15:46:53Z) - MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z) - Continual Learning of Large Language Models: A Comprehensive Survey [18.546766135948154]
Large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications.
One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences.
While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs.
arXiv Detail & Related papers (2024-04-25T17:38:57Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - A Survey of Confidence Estimation and Calibration in Large Language Models [86.692994151323]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains.
Despite their impressive performance, they can be unreliable due to factual errors in their generations.
Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations.
arXiv Detail & Related papers (2023-11-14T16:43:29Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large
Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.