STABLE: Gated Continual Learning for Large Language Models
- URL: http://arxiv.org/abs/2510.16089v1
- Date: Fri, 17 Oct 2025 16:14:05 GMT
- Title: STABLE: Gated Continual Learning for Large Language Models
- Authors: William Hoy, Nurcin Celik
- Abstract summary: STABLE is a gated continual self-editing framework that constrains forgetting during sequential updates. Each candidate edit is evaluated against a stability budget using one of three metrics. Experiments on the Qwen-2.5-7B model show that gating effectively mitigates forgetting while preserving adaptability.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) increasingly require mechanisms for continual adaptation without full retraining. However, sequential updates can lead to catastrophic forgetting, where new edits degrade previously acquired knowledge. This work presents STABLE, a gated continual self-editing framework that constrains forgetting during sequential updates using parameter-efficient fine-tuning via Low-Rank Adaptation (LoRA; see arXiv:2106.09685). Each candidate edit is evaluated against a stability budget using one of three metrics: (i) Exact Match (EM) drop, capturing factual accuracy loss; (ii) bits increase, reflecting reduced model confidence; and (iii) KL divergence, quantifying distributional drift between the base and adapted models. If a threshold is exceeded, the LoRA update is rescaled through a clipping procedure or rejected. Experiments on the Qwen-2.5-7B model show that gating effectively mitigates forgetting while preserving adaptability. EM-based gating achieved the highest cumulative performance in short continual learning sequences. Our results show that different gating strategies can achieve comparable distribution shift (measured by KL divergence) while producing different accuracy outcomes, highlighting the importance of gating design in continual adaptation. This approach offers a principled method for continual model editing, enabling LLMs to integrate new knowledge while maintaining reliability. Code: https://github.com/Bhoy1/STABLE
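The gating step described in the abstract can be sketched in plain Python. This is a minimal, hypothetical sketch, not the code in the STABLE repository: the function names (`em_drop`, `bits_increase`, `kl_divergence`, `gate_edit`), the budget value, and the clipping rule (halve the LoRA scale and re-measure until the metric fits the budget, reject once the scale falls below a minimum) are illustrative assumptions. The scale is assumed to multiply the standard LoRA update (alpha / r) * B @ A before it is merged.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


def em_drop(base_hits: Sequence[bool], adapted_hits: Sequence[bool]) -> float:
    """Exact Match drop: base-model EM minus adapted-model EM on the same
    held-out questions (knowledge the model had before the edit)."""
    return (sum(base_hits) - sum(adapted_hits)) / len(base_hits)


def bits_increase(base_logprobs: Sequence[float],
                  adapted_logprobs: Sequence[float]) -> float:
    """Increase in mean bits per reference token (negative log2-likelihood).
    Inputs are natural-log probabilities of the same tokens under each model."""
    def mean_bits(logprobs: Sequence[float]) -> float:
        return -sum(logprobs) / (len(logprobs) * math.log(2))
    return mean_bits(adapted_logprobs) - mean_bits(base_logprobs)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P || Q) in nats, with P the base and Q the adapted next-token
    distribution. Assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


@dataclass
class GateDecision:
    accepted: bool
    scale: float   # assumed multiplier on the LoRA update (alpha / r) * B @ A; 0.0 if rejected
    metric: float  # last measured value of the chosen forgetting metric


def gate_edit(metric_at_scale: Callable[[float], float], budget: float,
              min_scale: float = 0.1) -> GateDecision:
    """Hypothetical gate: accept the full edit if the metric is within budget;
    otherwise halve the LoRA scale and re-measure, rejecting the edit once the
    scale would drop below min_scale."""
    scale = 1.0
    metric = metric_at_scale(scale)
    while metric > budget:
        scale /= 2
        if scale < min_scale:
            return GateDecision(False, 0.0, metric)
        metric = metric_at_scale(scale)
    return GateDecision(True, scale, metric)


# Toy usage: pretend forgetting grows quadratically with the LoRA scale.
decision = gate_edit(lambda s: 0.1 * s ** 2, budget=0.02)
print(decision.accepted, decision.scale)  # True 0.25
```

Passing a callback that re-measures the chosen metric at each candidate scale keeps the gate agnostic to which of the three metrics is used. In a real setting each call would mean re-evaluating the adapted model on held-out data, which is the expensive part of any such gate.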
Related papers
- Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation [3.5808917363708743]
We study test-time adaptation of foundation models for few-shot classification under a completely frozen-model regime. We propose arguably the first training-free inference method that adapts predictions to the new task by performing a change of measure over the latent embedding distribution induced by the encoder.
arXiv Detail & Related papers (2026-02-02T18:17:29Z)
- FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning [63.20028888397869]
FOREVER (FORgEtting curVe-inspired mEmory) is a novel framework that aligns replay schedules with a model-centric notion of time. Building on this approach, FOREVER incorporates a forgetting curve-based replay scheduler to determine when to replay and an intensity-aware regularization mechanism to adaptively control how to replay.
arXiv Detail & Related papers (2026-01-07T13:55:14Z)
- Grokked Models are Better Unlearners [5.8757712547216485]
Starting from grokked checkpoints consistently yields more efficient forgetting. Post-grokking models learn more modular representations with reduced gradient alignment between forget and retain subsets.
arXiv Detail & Related papers (2025-12-03T04:35:49Z)
- Understanding Robustness of Model Editing in Code LLMs: An Empirical Study [1.5624785508022727]
We present a systematic study of five state-of-the-art model editing methods. We apply these methods to three leading open-source code LLMs: CodeLlama, CodeQwen1.5, and DeepSeek-Coder. Instant edits consistently degrade model performance, with syntactic validity dropping by up to 86 percentage points and functional correctness declining by 45 points even in the best-performing setting.
arXiv Detail & Related papers (2025-11-05T04:58:13Z)
- Sig2Model: A Boosting-Driven Model for Updatable Learned Indexes [6.133666849556217]
Sig2Model is an efficient and adaptive learned index that minimizes retraining cost through three key techniques. We show that Sig2Model reduces retraining cost by up to 20x, achieves up to 3x higher QPS, and uses up to 1000x less memory.
arXiv Detail & Related papers (2025-09-25T06:07:13Z)
- Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks [17.067788440109137]
Mixture-of-Experts (MoE) models are now standard in state-of-the-art systems. We investigate how MoE sparsity influences two distinct capability regimes: memorization skills and reasoning skills.
arXiv Detail & Related papers (2025-08-26T04:31:28Z)
- Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [57.514786046966265]
We propose Perturb-and-Merge (P&M), a novel continual learning (CL) framework that integrates model merging into the CL paradigm to mitigate forgetting. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z)
- Beyond Freezing: Sparse Tuning Enhances Plasticity in Continual Learning with Pre-Trained Models [10.904981532789824]
Continual learning with pre-trained models (PTMs) holds great promise for efficient adaptation across sequential tasks. Existing approaches freeze PTMs and rely on auxiliary modules like prompts or adapters. We propose Mutual Information-guided Sparse Tuning (MIST), a plug-and-play method that selectively updates a small subset of PTM parameters.
arXiv Detail & Related papers (2025-05-26T13:09:25Z)
- UniErase: Towards Balanced and Precise Unlearning in Language Models [69.04923022755547]
Large language models (LLMs) require iterative updates to address the problem of outdated information. UniErase is a novel unlearning framework that achieves precise and balanced performance between knowledge unlearning and ability retention.
arXiv Detail & Related papers (2025-05-21T15:53:28Z)
- Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning [19.27175827358111]
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. We propose a novel continual full fine-tuning approach leveraging adaptive singular value decomposition (SVD). We evaluate our approach extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B) models.
arXiv Detail & Related papers (2025-04-09T17:59:42Z)
- ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA [55.697627106315004]
Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Previous approaches manage sequential edits by freezing original parameters and discretely allocating new parameters for each knowledge update. We propose ELDER, a novel approach to create a continuous association between data and adapters.
arXiv Detail & Related papers (2024-08-19T02:27:00Z)
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between predicted confidence and actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjusting trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We study a technique we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
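The last entry's technique can be illustrated in a few lines. Prediction-time batch normalization recomputes the normalization statistics from the test batch itself instead of using the running mean and variance accumulated during training. The NumPy sketch below is illustrative only; the function names and the toy shifted data are assumptions, not code from that paper.

```python
import numpy as np


def batch_norm(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
               mean: np.ndarray, var: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Standard batch-norm transform for a (batch, features) array."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def prediction_time_bn(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray,
                       eps: float = 1e-5) -> np.ndarray:
    """Prediction-time BN: normalize with the test batch's own per-feature
    statistics rather than the stored training running averages."""
    return batch_norm(x, gamma, beta, x.mean(axis=0), x.var(axis=0), eps)


# Toy covariate shift: training statistics were mean 0, variance 1,
# but the test batch is shifted and rescaled.
rng = np.random.default_rng(0)
x_test = rng.normal(loc=3.0, scale=2.0, size=(256, 4))
gamma, beta = np.ones(4), np.zeros(4)
running_mean, running_var = np.zeros(4), np.ones(4)

standard = batch_norm(x_test, gamma, beta, running_mean, running_var)
adapted = prediction_time_bn(x_test, gamma, beta)
print(standard.mean(axis=0))  # roughly 3 per feature: the shift passes through
print(adapted.mean(axis=0))   # essentially 0: the shift is normalized away
```

The trade-off is that predictions now depend on which examples share a batch, so the approach assumes test batches are reasonably large and drawn from a single shifted distribution.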