Recurrent Knowledge Identification and Fusion for Language Model Continual Learning
- URL: http://arxiv.org/abs/2502.17510v1
- Date: Sat, 22 Feb 2025 05:37:27 GMT
- Title: Recurrent Knowledge Identification and Fusion for Language Model Continual Learning
- Authors: Yujie Feng, Xujia Wang, Zexin Lu, Shenghong Fu, Guangyuan Shi, Yongxin Xu, Yasha Wang, Philip S. Yu, Xu Chu, Xiao-Ming Wu,
- Abstract summary: Recurrent-KIF is a CL framework for Recurrent Knowledge Identification and Fusion. Inspired by human continual learning, Recurrent-KIF employs an inner loop that rapidly adapts to new tasks, coupled with an outer loop that globally manages the fusion of new and historical knowledge.
- Score: 41.901501650712234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning (CL) is crucial for deploying large language models (LLMs) in dynamic real-world environments without costly retraining. While recent model ensemble and model merging methods guided by parameter importance have gained popularity, they often struggle to balance knowledge transfer and forgetting, mainly due to the reliance on static importance estimates during sequential training. In this paper, we present Recurrent-KIF, a novel CL framework for Recurrent Knowledge Identification and Fusion, which enables dynamic estimation of parameter importance distributions to enhance knowledge transfer. Inspired by human continual learning, Recurrent-KIF employs an inner loop that rapidly adapts to new tasks while identifying important parameters, coupled with an outer loop that globally manages the fusion of new and historical knowledge through redundant knowledge pruning and key knowledge merging. These inner-outer loops iteratively perform multiple rounds of fusion, allowing Recurrent-KIF to leverage intermediate training information and adaptively adjust fusion strategies based on evolving importance distributions. Extensive experiments on two CL benchmarks with various model sizes (from 770M to 13B) demonstrate that Recurrent-KIF effectively mitigates catastrophic forgetting and enhances knowledge transfer.
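To make the inner-outer loop concrete, here is a minimal sketch assuming squared-gradient (Fisher-style) importance estimates, quantile-based pruning, and an importance-weighted element-wise merge; the function names, the HF-style `model(x, labels=y)` interface, and all hyperparameters are illustrative assumptions, not details from the paper.

```python
import torch

def recurrent_kif_sketch(model, task_loaders, rounds=3, inner_steps=100,
                         prune_quantile=0.5, lr=1e-4):
    """Illustrative inner-outer loop; not the authors' reference implementation."""
    hist_params = {n: p.detach().clone() for n, p in model.named_parameters()}
    hist_imp = {n: torch.zeros_like(p) for n, p in hist_params.items()}

    for loader in task_loaders:              # sequential CL tasks
        for _ in range(rounds):              # outer loop: repeated fusion rounds
            # Inner loop: fast adaptation while identifying important parameters.
            opt = torch.optim.AdamW(model.parameters(), lr=lr)
            new_imp = {n: torch.zeros_like(p) for n, p in hist_params.items()}
            for _, (x, y) in zip(range(inner_steps), loader):
                loss = model(x, labels=y).loss       # assumed HF-style interface
                opt.zero_grad()
                loss.backward()
                for n, p in model.named_parameters():
                    if p.grad is not None:           # squared-gradient importance
                        new_imp[n] += p.grad.detach() ** 2
                opt.step()

            # Outer loop: prune redundant updates, then merge key knowledge.
            with torch.no_grad():
                for n, p in model.named_parameters():
                    delta = p - hist_params[n]
                    thr = torch.quantile(new_imp[n].float().flatten(), prune_quantile)
                    delta[new_imp[n] < thr] = 0.0    # redundant-knowledge pruning
                    w = new_imp[n] / (new_imp[n] + hist_imp[n] + 1e-12)
                    p.copy_(hist_params[n] + w * delta)   # key-knowledge merging
                    hist_imp[n] = torch.maximum(hist_imp[n], new_imp[n])
        hist_params = {n: p.detach().clone() for n, p in model.named_parameters()}
    return model
```

The point of the repeated rounds is that `new_imp` is re-estimated on the partially merged weights each time, which is how the abstract's dynamic estimation of importance distributions differs from a single static importance pass.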
Related papers
- Accurate Forgetting for Heterogeneous Federated Continual Learning [89.08735771893608]
We propose a new concept, accurate forgetting (AF), and develop a novel generative-replay method which selectively utilizes previous knowledge in federated networks. We employ a probabilistic framework based on a normalizing flow model to quantify the credibility of previous knowledge.
arXiv Detail & Related papers (2025-02-20T02:35:17Z)
- Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of machine learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- KIF: Knowledge Identification and Fusion for Language Model Continual Learning [41.28933724210434]
We introduce a novel framework for language models, named Knowledge Identification and Fusion (KIF). KIF segregates the model into 'skill units' based on parameter dependencies, allowing for more precise control. It employs a novel group-wise knowledge identification technique to ascertain the importance distribution of skill units for a new task. As a result, KIF achieves an optimal balance between retaining prior knowledge and excelling in new tasks.
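As a rough illustration of the group-wise identification step, the sketch below scores each parameter tensor as one 'skill unit' by its normalized squared-gradient mass on the new task; treating whole tensors as units and the HF-style loss interface are assumptions, since the abstract does not specify the grouping.

```python
import torch

def skill_unit_importance(model, loader, n_batches=32):
    """Rank parameter groups ('skill units') by squared-gradient mass.

    Treating each parameter tensor as one unit is an illustrative
    simplification; KIF derives units from parameter dependencies.
    """
    scores = {n: 0.0 for n, _ in model.named_parameters()}
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss = model(x, labels=y).loss   # assumed HF-style interface
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.pow(2).sum().item() / p.numel()
    total = sum(scores.values()) or 1.0  # guard against an all-zero pass
    return {n: s / total
            for n, s in sorted(scores.items(), key=lambda kv: -kv[1])}
```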
arXiv Detail & Related papers (2024-08-09T17:44:45Z)
- Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning [15.475427498268393]
The Train-Attention-Augmented Language Model (TAALM) enhances learning efficiency by dynamically predicting and applying weights to tokens based on their usefulness. We show that TAALM achieves state-of-the-art performance over the baselines and exhibits synergistic compatibility when integrated with previous CKL approaches.
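One plausible reading of token-level weighting is a per-token weighted cross-entropy, sketched below; the `token_weights` input stands in for TAALM's learned usefulness predictor, which the summary does not describe in enough detail to reproduce.

```python
import torch.nn.functional as F

def token_weighted_loss(logits, labels, token_weights):
    """Cross-entropy where each target token carries a usefulness weight.

    logits: (batch, seq, vocab); labels and token_weights: (batch, seq).
    A generic weighted objective, not TAALM's exact formulation.
    """
    per_token = F.cross_entropy(logits.flatten(0, 1), labels.flatten(),
                                reduction="none").view(labels.shape)
    return (token_weights * per_token).sum() / token_weights.sum().clamp_min(1e-8)
```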
arXiv Detail & Related papers (2024-07-24T01:04:34Z)
- A Unified and General Framework for Continual Learning [58.72671755989431]
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge.
Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques.
This research aims to bridge the gap between these approaches by introducing a comprehensive and overarching framework that encompasses and reconciles them.
arXiv Detail & Related papers (2024-03-20T02:21:44Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
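For intuition, here is a minimal sketch of an MoE bottleneck adapter added on top of a frozen backbone block; the expert count, bottleneck width, and top-1 routing are assumptions, and the Distribution Discriminative Auto-Selector is not modeled.

```python
import torch
import torch.nn as nn

class MoEAdapter(nn.Module):
    """Mixture-of-Experts bottleneck adapter (illustrative, not the paper's code)."""
    def __init__(self, dim, n_experts=4, bottleneck=64, top_k=1):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, dim))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                       # x: (batch, seq, dim)
        gate = self.router(x).softmax(dim=-1)   # per-token expert weights
        topv, topi = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (topi[..., k] == e).unsqueeze(-1)  # tokens routed to expert e
                out = out + mask * topv[..., k:k + 1] * expert(x)
        return x + out   # residual: frozen block output plus adapter delta
```

New tasks can then be handled by appending experts (dynamic expansion) while previously trained experts stay frozen.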
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Lifelong Person Re-Identification via Knowledge Refreshing and Consolidation [35.43406281230279]
A key challenge for lifelong person re-identification (LReID) is how to incrementally preserve old knowledge and gradually add new capabilities to the system.
Inspired by the biological process of human cognition, where the somatosensory neocortex and the hippocampus work together in memory consolidation, we formulate a model called Knowledge Refreshing and Consolidation (KRC).
KRC achieves both positive forward and backward transfer. More specifically, a knowledge refreshing scheme is incorporated with the knowledge rehearsal mechanism to enable bi-directional knowledge transfer.
arXiv Detail & Related papers (2022-11-29T13:39:45Z)
- Learning an evolved mixture model for task-free continual learning [11.540150938141034]
We address Task-Free Continual Learning (TFCL), in which a model is trained on non-stationary data streams with no explicit task information.
We introduce two simple dropout mechanisms to selectively remove stored examples in order to avoid memory overload.
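As a hedged sketch of such a mechanism, the buffer below drops a random fraction of stored examples whenever it exceeds its budget; the paper's two specific dropout criteria are not reproduced here, so the random rule is purely illustrative.

```python
import random

class EpisodicBuffer:
    """Bounded example store with a simple random-dropout eviction rule."""
    def __init__(self, capacity=1000, drop_fraction=0.1):
        self.capacity = capacity
        self.drop_fraction = drop_fraction
        self.examples = []

    def add(self, example):
        self.examples.append(example)
        if len(self.examples) > self.capacity:
            # Selectively remove stored examples to avoid memory overload.
            n_keep = int(len(self.examples) * (1 - self.drop_fraction))
            self.examples = random.sample(self.examples, n_keep)

    def sample(self, k):
        return random.sample(self.examples, min(k, len(self.examples)))
```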
arXiv Detail & Related papers (2022-07-11T16:01:27Z)