KIF: Knowledge Identification and Fusion for Language Model Continual Learning
- URL: http://arxiv.org/abs/2408.05200v4
- Date: Thu, 23 Jan 2025 12:06:37 GMT
- Title: KIF: Knowledge Identification and Fusion for Language Model Continual Learning
- Authors: Yujie Feng, Xu Chu, Yongxin Xu, Zexin Lu, Bo Liu, Philip S. Yu, Xiao-Ming Wu
- Abstract summary: We introduce a novel framework for language models, named Knowledge Identification and Fusion (KIF).
KIF segregates the model into 'skill units' based on parameter dependencies, allowing for more precise control.
It employs a novel group-wise knowledge identification technique to ascertain the importance distribution of skill units for a new task.
As a result, KIF achieves an optimal balance between retaining prior knowledge and excelling in new tasks.
- Score: 41.28933724210434
- Abstract: Language model continual learning (CL) has recently attracted significant interest for its ability to adapt large language models (LLMs) to dynamic real-world scenarios without retraining. A major challenge in this domain is catastrophic forgetting, where models lose previously acquired knowledge upon learning new tasks. Existing approaches commonly utilize multiple parameter-efficient fine-tuning (PEFT) blocks to acquire task-specific knowledge, yet these methods are inefficient and fail to leverage potential knowledge transfer across tasks. In this paper, we introduce a novel CL framework for language models, named Knowledge Identification and Fusion (KIF), which boosts knowledge transfer without depending on memory replay. KIF initially segregates the model into 'skill units' based on parameter dependencies, allowing for more precise control. Subsequently, it employs a novel group-wise knowledge identification technique to ascertain the importance distribution of skill units for a new task. By comparing this importance distribution with those from previous tasks, we implement a fine-grained knowledge fusion strategy that retains task-specific knowledge, thereby preventing forgetting, and updates task-shared knowledge, which facilitates bi-directional knowledge transfer. As a result, KIF achieves an optimal balance between retaining prior knowledge and excelling in new tasks. KIF also demonstrates strong generalizability, making it suitable for various base models and adaptable to PEFT methods like LoRA. Furthermore, it offers notable extensibility, supporting enhancements through integration with memory replay techniques. Comprehensive experiments conducted on two CL benchmarks, involving models ranging from 220M to 7B parameters, affirm the effectiveness of KIF and its variants across different settings.
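The identify-then-fuse idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how importance is scored or how units are merged, so the function names (`normalize`, `fuse_units`), the uniform-share threshold, and the averaging rule are hypothetical and are not the paper's method. The sketch only shows the general pattern of comparing per-unit importance distributions from an earlier task and a new task, keeping units that matter mainly to the earlier task, adopting units that matter mainly to the new task, and blending the rest.

```python
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-unit importance scores into a distribution that sums to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {unit: score / total for unit, score in scores.items()}


def fuse_units(
    old_weights: Weights,
    new_weights: Weights,
    old_importance: Dict[str, float],
    new_importance: Dict[str, float],
    threshold: Optional[float] = None,
) -> Weights:
    """Fuse two sets of skill-unit weights by comparing importance distributions.

    Hypothetical rule (an assumption, not taken from the paper):
      * important only for the earlier task -> keep the old weights
      * important only for the new task     -> take the new weights
      * otherwise (shared or unimportant)   -> average old and new
    """
    old_dist = normalize(old_importance)
    new_dist = normalize(new_importance)
    if threshold is None:
        # Assumed default: a unit is "important" if it holds at least a uniform share.
        threshold = 1.0 / len(old_dist)

    fused: Weights = {}
    for unit, old_w in old_weights.items():
        new_w = new_weights[unit]
        old_hi = old_dist[unit] >= threshold
        new_hi = new_dist[unit] >= threshold
        if old_hi and not new_hi:
            fused[unit] = list(old_w)
        elif new_hi and not old_hi:
            fused[unit] = list(new_w)
        else:
            fused[unit] = [(a + b) / 2 for a, b in zip(old_w, new_w)]
    return fused


# Toy example with three hypothetical skill units.
old_weights = {"u1": [1.0, 1.0], "u2": [2.0, 2.0], "u3": [3.0, 3.0]}
new_weights = {"u1": [5.0, 5.0], "u2": [4.0, 4.0], "u3": [7.0, 7.0]}
old_importance = {"u1": 0.8, "u2": 0.1, "u3": 0.1}
new_importance = {"u1": 0.1, "u2": 0.8, "u3": 0.1}

fused = fuse_units(old_weights, new_weights, old_importance, new_importance)
```

In this toy run, `u1` keeps its old weights, `u2` takes the new ones, and `u3` is averaged. A real system would need an actual importance estimate per skill unit and a principled fusion rule, neither of which can be recovered from the abstract alone.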
Related papers
- DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning [22.386864304549285]
Continual learning (CL) is essential for Large Language Models (LLMs) to adapt to evolving real-world demands.
Recent rehearsal-free methods employ model-based and regularization-based strategies to address this issue.
We propose Decomposed Attention-based Task Adaptation (DATA).
DATA explicitly decouples and learns both task-specific and task-shared knowledge using high-rank and low-rank task adapters.
arXiv Detail & Related papers (2025-02-17T06:35:42Z)
- KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models [11.07333593086842]
We introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that leverages singular value decomposition (SVD) with knowledge-aware singular values to dynamically activate knowledge based on its relevance to the task at hand.
Experimental results demonstrate that KaSA consistently outperforms FFT and 14 popular PEFT baselines across 16 benchmarks and 4 synthetic datasets.
arXiv Detail & Related papers (2024-12-08T21:26:22Z)
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models [31.209774088374374]
This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced Language Models.
We use a robust approach to inject knowledge triples and a knowledge-augmented memory bank to capture valuable information.
We show that TRELM reduces pre-training time by at least 50% and outperforms other KEPLMs in knowledge probing tasks and multiple knowledge-aware language understanding tasks.
arXiv Detail & Related papers (2024-03-17T13:04:35Z)
- SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models [71.78800549517298]
Continual learning (CL) ability is vital for deploying large language models (LLMs) in the dynamic world.
Existing methods devise a learning module that acquires task-specific knowledge with a parameter-efficient tuning (PET) block, and a selection module that picks the corresponding block for the test input.
We propose a novel Shared Attention Framework (SAPT) to align the PET learning and selection via the Shared Attentive Learning & Selection module.
arXiv Detail & Related papers (2024-01-16T11:45:03Z)
- Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Learning an evolved mixture model for task-free continual learning [11.540150938141034]
We address Task-Free Continual Learning (TFCL), in which a model is trained on non-stationary data streams with no explicit task information.
We introduce two simple dropout mechanisms to selectively remove stored examples in order to avoid memory overload.
arXiv Detail & Related papers (2022-07-11T16:01:27Z)
- A Framework of Meta Functional Learning for Regularising Knowledge Transfer [89.74127682599898]
This work proposes a novel framework of Meta Functional Learning (MFL) by meta-learning a generalisable functional model from data-rich tasks.
The MFL computes meta-knowledge on functional regularisation generalisable to different learning tasks by which functional training on limited labelled data promotes more discriminative functions to be learned.
arXiv Detail & Related papers (2022-03-28T15:24:09Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves on standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.