Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
- URL: http://arxiv.org/abs/2509.23893v1
- Date: Sun, 28 Sep 2025 13:55:05 GMT
- Title: Dynamic Orthogonal Continual Fine-tuning for Mitigating Catastrophic Forgettings
- Authors: Zhixin Zhang, Zeming Wei, Meng Sun
- Abstract summary: Catastrophic forgetting remains a critical challenge in continual learning for large language models. We propose Dynamic Orthogonal Continual (DOC) fine-tuning, a novel approach that tracks the drift of functional directions and dynamically updates them during the fine-tuning process.
- Score: 13.325021114990241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Catastrophic forgetting remains a critical challenge in continual learning for large language models (LLMs), where models struggle to retain performance on historical tasks when fine-tuning on new sequential data without access to past datasets. In this paper, we first reveal that the drift of functional directions during the fine-tuning process is a key reason why existing regularization-based methods fail in long-term LLM continual learning. To address this, we propose Dynamic Orthogonal Continual (DOC) fine-tuning, a novel approach that tracks the drift of these functional directions and dynamically updates them during the fine-tuning process. Furthermore, by adjusting the gradients of new task parameters to be orthogonal to the tracked historical function directions, our method mitigates interference between new and old tasks. Extensive experiments on various LLM continual learning benchmarks demonstrate that this approach outperforms prior methods, effectively reducing catastrophic forgetting and providing a robust tool for continuous LLM fine-tuning. Our code is available at https://github.com/meloxxxxxx/DOC.
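The gradient adjustment described in the abstract can be pictured with a generic orthogonal projection. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the tracked historical directions are available as plain vectors, and the names `orthonormalize`, `project_out`, `old_directions`, and `new_grad` are hypothetical. It does not cover how DOC tracks or updates the directions as they drift.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(directions: List[Vector], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: build an orthonormal basis for the span of `directions`."""
    basis: List[Vector] = []
    for d in directions:
        v = list(d)
        for b in basis:
            coeff = dot(v, b)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > eps:  # skip directions that are (numerically) linearly dependent
            basis.append([vi / norm for vi in v])
    return basis


def project_out(grad: Vector, directions: List[Vector]) -> Vector:
    """Remove from `grad` every component lying in the span of `directions`."""
    g = list(grad)
    for b in orthonormalize(directions):
        coeff = dot(g, b)
        g = [gi - coeff * bi for gi, bi in zip(g, b)]
    return g


# Hypothetical toy data: two old-task directions spanning the x-y plane.
old_directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
new_grad = [0.5, -2.0, 3.0]

projected = project_out(new_grad, old_directions)
print(projected)  # only the z component survives
```

A step taken along `projected` has zero inner product with each stored direction, which is the sense in which such an update avoids interfering with them. In practice the directions would have to be re-estimated during training, which is the part the abstract attributes to DOC's dynamic tracking.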
Related papers
- Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning [51.07663354001582]
Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed Information Maximization (IM) regularizer, for memory-based continual learning methods.
arXiv Detail & Related papers (2025-12-01T15:56:00Z)
- Multimodal Continual Instruction Tuning with Dynamic Gradient Guidance [41.58239719458457]
Multimodal continual instruction tuning enables large language models to sequentially adapt to new tasks while building upon previously acquired knowledge. However, this continual learning paradigm faces the significant challenge of catastrophic forgetting, where learning new tasks leads to performance degradation on previous ones. We introduce a novel insight into catastrophic forgetting by conceptualizing it as a problem of missing gradients from old tasks during new task learning.
arXiv Detail & Related papers (2025-11-19T06:29:15Z)
- Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [57.514786046966265]
We propose Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to mitigate forgetting. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z)
- Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning [19.27175827358111]
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. We propose a novel continual full fine-tuning approach leveraging adaptive singular value decomposition (SVD). We evaluate our approach extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B) models.
arXiv Detail & Related papers (2025-04-09T17:59:42Z)
- Temporal-Difference Variational Continual Learning [89.32940051152782]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- An Effective Dynamic Gradient Calibration Method for Continual Learning [11.555822066922508]
Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks.
Due to the memory limit, we cannot store all the historical data, and therefore confront the "catastrophic forgetting" problem.
We develop an effective algorithm to calibrate the gradient in each updating step of the model.
arXiv Detail & Related papers (2024-07-30T16:30:09Z)
- Improving Data-aware and Parameter-aware Robustness for Continual Learning [3.480626767752489]
This paper argues that this insufficiency arises from ineffective handling of outliers.
We propose a Robust Continual Learning (RCL) method to address this issue.
The proposed method effectively maintains robustness and achieves new state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2024-05-27T11:21:26Z)
- Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task. We name our approach Adaptive Retention & Correction (ARC). ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Elastic Multi-Gradient Descent for Parallel Continual Learning [28.749215705746135]
We study the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios.
PCL presents challenges due to the training of an unspecified number of tasks with varying learning progress.
We propose a memory editing mechanism guided by the gradient computed using EMGD to balance the training between old and new tasks.
arXiv Detail & Related papers (2024-01-02T06:26:25Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp [119.69304125647785]
This paper introduces a concise yet powerful method to construct Continuous Transition.
Specifically, we propose to synthesize new transitions for training by linearly interpolating the consecutive transitions.
To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically.
arXiv Detail & Related papers (2020-11-30T01:20:23Z)
- Meta Learning MPC using Finite-Dimensional Gaussian Process Approximations [0.9539495585692008]
Two key factors that hinder the practical applicability of learning methods in control are their high computational complexity and limited generalization capabilities to unseen conditions.
This paper makes use of a meta-learning approach for adaptive model predictive control, by learning a system model that leverages data from previous related tasks.
arXiv Detail & Related papers (2020-08-13T15:59:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.