Related papers: Examining Forgetting in Continual Pre-training of Aligned Large Language Models

URL: http://arxiv.org/abs/2401.03129v1
Date: Sat, 6 Jan 2024 05:34:09 GMT
Title: Examining Forgetting in Continual Pre-training of Aligned Large Language Models
Authors: Chen-An Li, Hung-Yi Lee
Abstract summary: We investigate the phenomenon of forgetting that occurs during continual pre-training on an existing fine-tuned LLM. Experimental results highlight the non-trivial challenge of addressing catastrophic forgetting during continual pre-training.
Score: 66.62800021628276
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recent advances in Large Language Models (LLMs) have exhibited remarkable proficiency across various tasks. Given the potent applications of LLMs in numerous fields, there has been a surge in LLM development. In developing LLMs, a common practice involves continual pre-training on previously fine-tuned models. However, this can lead to catastrophic forgetting. In our work, we investigate the phenomenon of forgetting that occurs during continual pre-training on an existing fine-tuned LLM. We evaluate the impact of continual pre-training on the fine-tuned LLM across various dimensions, including output format, knowledge, and reliability. Experimental results highlight the non-trivial challenge of addressing catastrophic forgetting during continual pre-training, especially the repetition issue.

Related papers

LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models [28.253786579346432]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP). Current solutions for long context modeling often employ multi-stage continual pretraining. In this paper, we introduce a novel single-stage continual pretraining method, Head-Adaptive Rotary Position.
arXiv Detail & Related papers (2024-12-10T04:09:29Z)
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks. To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
Exploring Forgetting in Large Language Model Pre-Training [18.858330348834777]
Catastrophic forgetting remains a formidable obstacle to building an omniscient model in large language models (LLMs). We systematically explored the existence and measurement of forgetting in pre-training, questioning traditional metrics such as perplexity (PPL) and introducing new metrics to better detect entity memory retention.
arXiv Detail & Related papers (2024-10-22T13:39:47Z)
Zero-shot Model-based Reinforcement Learning using Large Language Models [12.930241182192988]
We investigate how pre-trained Large Language Models can be leveraged to predict in context the dynamics of continuous Markov decision processes. We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning.
arXiv Detail & Related papers (2024-10-15T15:46:53Z)
Continual Learning of Large Language Models: A Comprehensive Survey [18.546766135948154]
Large language models (LLMs) trained on static, pre-collected, general datasets have sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs.
arXiv Detail & Related papers (2024-04-25T17:38:57Z)
Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale. This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z)
A Survey of Confidence Estimation and Calibration in Large Language Models [86.692994151323]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains. Despite their impressive performance, they can be unreliable due to factual errors in their generations. Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations.
arXiv Detail & Related papers (2023-11-14T16:43:29Z)
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety. Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs. We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools. Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions. Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning [70.48605869773814]
Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information. This study empirically evaluates the forgetting phenomenon in large language models during continual instruction tuning.
arXiv Detail & Related papers (2023-08-17T02:53:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.