FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning
- URL: http://arxiv.org/abs/2601.03938v1
- Date: Wed, 07 Jan 2026 13:55:14 GMT
- Title: FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning
- Authors: Yujie Feng, Hao Wang, Jian Li, Xu Chu, Zhaolu Kang, Yiran Liu, Yasha Wang, Philip S. Yu, Xiao-Ming Wu
- Abstract summary: FOREVER (FORgEtting curVe-inspired mEmory Replay) is a novel framework that aligns replay schedules with a model-centric notion of time. Building on this approach, FOREVER incorporates a forgetting curve-based replay scheduler to determine when to replay and an intensity-aware regularization mechanism to adaptively control how to replay.
- Score: 63.20028888397869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) for large language models (LLMs) aims to enable sequential knowledge acquisition without catastrophic forgetting. Memory replay methods are widely used for their practicality and effectiveness, but most rely on fixed, step-based heuristics that often misalign with the model's actual learning progress, since identical training steps can result in varying degrees of parameter change. Motivated by recent findings that LLM forgetting mirrors the Ebbinghaus human forgetting curve, we propose FOREVER (FORgEtting curVe-inspired mEmory Replay), a novel CL framework that aligns replay schedules with a model-centric notion of time. FOREVER defines model time using the magnitude of optimizer updates, allowing forgetting curve-inspired replay intervals to align with the model's internal evolution rather than raw training steps. Building on this approach, FOREVER incorporates a forgetting curve-based replay scheduler to determine when to replay and an intensity-aware regularization mechanism to adaptively control how to replay. Extensive experiments on three CL benchmarks and models ranging from 0.6B to 13B parameters demonstrate that FOREVER consistently mitigates catastrophic forgetting.
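The abstract describes two mechanisms: a model-centric clock driven by the magnitude of optimizer updates, and a forgetting curve-based scheduler that decides when each stored example should be replayed. The sketch below illustrates one plausible way to wire these together; it is an assumption-laden illustration, not the authors' code. The class and method names (ForgettingCurveReplayScheduler, advance, due) are hypothetical, the update magnitude is approximated by the learning rate times the global gradient norm, and replay intervals grow geometrically in the spirit of spaced repetition under an Ebbinghaus-style retention curve R(t) = exp(-t/S).

```python
import math

class ForgettingCurveReplayScheduler:
    """Minimal sketch of a model-time replay scheduler (hypothetical names,
    not the paper's released implementation).

    "Model time" advances by an estimate of each optimizer update's magnitude
    rather than by raw step count. Every buffered example is re-queued for
    replay at exponentially growing model-time intervals, mimicking spaced
    repetition under an Ebbinghaus-style retention curve R(t) = exp(-t / S).
    """

    def __init__(self, base_interval=1.0, growth=2.0):
        self.model_time = 0.0             # cumulative update magnitude
        self.base_interval = base_interval
        self.growth = growth              # interval multiplier after each replay
        self.buffer = []                  # entries: [example, next_replay_time, interval]

    def advance(self, model, lr):
        """Advance model time by a proxy for the update size:
        learning rate times the global gradient norm (SGD-like updates)."""
        sq_norm = 0.0
        for p in model.parameters():
            if p.grad is not None:
                sq_norm += p.grad.detach().pow(2).sum().item()
        self.model_time += lr * math.sqrt(sq_norm)

    def add(self, example):
        """Register a new example for future replay."""
        self.buffer.append([example,
                            self.model_time + self.base_interval,
                            self.base_interval])

    def due(self):
        """Return examples whose replay point has passed, and widen their interval."""
        ready = []
        for entry in self.buffer:
            if self.model_time >= entry[1]:
                ready.append(entry[0])
                entry[2] *= self.growth                # grow the replay interval
                entry[1] = self.model_time + entry[2]  # schedule the next replay
        return ready
```

In a training loop, one would call advance(model, lr) after each backward pass and mix due() examples into subsequent batches. The paper's intensity-aware regularization, which controls how strongly to replay, is not covered by this sketch.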
Related papers
- MERGETUNE: Continued fine-tuning of vision-language models [77.8627788911249]
Fine-tuning vision-language models (VLMs) often leads to catastrophic forgetting of pretrained knowledge. We introduce a novel paradigm, continued fine-tuning (CFT), which seeks to recover pretrained knowledge after a zero-shot model has already been adapted.
arXiv Detail & Related papers (2026-01-15T15:15:53Z) - End-to-End Training for Autoregressive Video Diffusion via Self-Resampling [63.84672807009907]
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. We introduce Resampling Forcing, a teacher-free framework that enables training autoregressive video models from scratch and at scale.
arXiv Detail & Related papers (2025-12-17T18:53:29Z) - Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning [17.898277374771254]
We introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in large foundation models. R2F reconstructs full-model gradient directions from low-rank LoRA adapter updates. We show that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.
arXiv Detail & Related papers (2025-12-08T10:10:12Z) - STABLE: Gated Continual Learning for Large Language Models [0.0]
STABLE is a gated continual self-editing framework that constrains forgetting during sequential updates. Each candidate edit is evaluated against a stability budget using one of three metrics. Experiments on the Qwen-2.5-7B model show that gating effectively mitigates forgetting while preserving adaptability.
arXiv Detail & Related papers (2025-10-17T16:14:05Z) - Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models [19.136589266017694]
Training large language models typically involves pre-training on massive corpora. New data often causes distribution shifts, leading to performance degradation on previously learned tasks. We take a deeper look at two popular proposals for addressing this distribution shift: experience replay and gradient alignment.
arXiv Detail & Related papers (2025-08-03T20:07:15Z) - TS-ACL: Closed-Form Solution for Time Series-oriented Continual Learning [16.270548433574465]
Time series class-incremental learning faces two major challenges: catastrophic forgetting and intra-class variations. We propose TS-ACL, which leverages a gradient-free closed-form solution to avoid the catastrophic forgetting problem. It also provides privacy protection and efficiency.
arXiv Detail & Related papers (2024-10-21T12:34:02Z) - Temporal-Difference Variational Continual Learning [77.92320830700797]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning [67.07363529640784]
We propose OpenSTL to categorize prevalent approaches into recurrent-based and recurrent-free models.
We conduct standard evaluations on datasets across various domains, including synthetic moving object trajectory, human motion, driving scenes, traffic flow and forecasting weather.
We find that recurrent-free models achieve a better balance between efficiency and performance than recurrent models.
arXiv Detail & Related papers (2023-06-20T03:02:14Z) - On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics -- Empirical Study on Brown Build and Risk Prediction [17.502553991799832]
This paper evaluates the use of lifelong learning (LL) for industrial use cases at Ubisoft.
LL is used to continuously build and maintain ML-based software analytics tools using an incremental learner that progressively updates the old model using new data.
arXiv Detail & Related papers (2023-05-16T21:57:16Z) - Sequential Learning Of Neural Networks for Prequential MDL [18.475866691786695]
We evaluate approaches for computing prequential description lengths for image classification datasets with neural networks.
Considering the computational cost, we find that online-learning with rehearsal has favorable performance.
We present description lengths for a suite of image classification datasets that improve upon previously reported results by large margins.
arXiv Detail & Related papers (2022-10-14T16:30:23Z) - Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [100.75479161884935]
We propose a novel training paradigm called Remembering for the Right Reasons (RRR).
RRR stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions.
We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting.
arXiv Detail & Related papers (2020-10-04T10:05:27Z)