DUKAE: DUal-level Knowledge Accumulation and Ensemble for Pre-Trained Model-Based Continual Learning
- URL: http://arxiv.org/abs/2504.06521v2
- Date: Mon, 14 Apr 2025 13:22:13 GMT
- Title: DUKAE: DUal-level Knowledge Accumulation and Ensemble for Pre-Trained Model-Based Continual Learning
- Authors: Songze Li, Tonghua Su, Xu-Yao Zhang, Qixing Xu, Zhongjie Wang
- Abstract summary: Pre-trained model-based continual learning (PTMCL) has garnered growing attention, as it enables more rapid acquisition of new knowledge. We propose a method named DUal-level Knowledge Accumulation and Ensemble (DUKAE) that leverages both feature-level and decision-level knowledge accumulation. Experiments on CIFAR-100, ImageNet-R, CUB-200, and Cars-196 datasets demonstrate the superior performance of our approach.
- Score: 19.684132921720945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained model-based continual learning (PTMCL) has garnered growing attention, as it enables more rapid acquisition of new knowledge by leveraging the extensive foundational understanding inherent in pre-trained models (PTMs). Most existing PTMCL methods use Parameter-Efficient Fine-Tuning (PEFT) to learn new knowledge while consolidating existing memory. However, these methods face two key challenges. The first is the misalignment of classification heads: the classification head of each task is trained within a distinct feature space, leading to inconsistent decision boundaries across tasks and, consequently, increased forgetting. The second is restricted feature-level knowledge accumulation, with feature learning typically confined to the initial task, which constrains the model's representation capabilities. To address these issues, we propose a method named DUal-level Knowledge Accumulation and Ensemble (DUKAE) that leverages both feature-level and decision-level knowledge accumulation by aligning classification heads into a unified feature space through Gaussian distribution sampling and by introducing an adaptive expertise ensemble to fuse knowledge across feature subspaces. Extensive experiments on CIFAR-100, ImageNet-R, CUB-200, and Cars-196 datasets demonstrate the superior performance of our approach.
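The abstract describes two mechanisms: classifier heads re-trained in a shared feature space using features sampled from per-class Gaussian distributions, and an adaptive ensemble that fuses predictions from different feature subspaces. Below is a minimal sketch of both ideas under simplifying assumptions (diagonal Gaussians, confidence-based fusion weights); the helper names and the weighting rule are illustrative, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): store per-class Gaussian feature
# statistics, re-train a unified classifier head on sampled pseudo-features so
# all classes share one decision space, and fuse two experts' logits adaptively.
import torch
import torch.nn.functional as F

def fit_class_gaussians(features, labels):
    """Estimate a diagonal Gaussian (mean, variance) over features per class."""
    stats = {}
    for c in labels.unique().tolist():
        fc = features[labels == c]
        stats[c] = (fc.mean(0), fc.var(0, unbiased=False) + 1e-4)
    return stats

def align_head(head, stats, steps=100, lr=1e-2, samples_per_class=64):
    """Re-train a single head on pseudo-features drawn from every class Gaussian,
    giving all tasks' classes a consistent decision boundary (no raw data kept)."""
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    classes = sorted(stats)
    for _ in range(steps):
        xs, ys = [], []
        for c in classes:
            mu, var = stats[c]
            xs.append(mu + var.sqrt() * torch.randn(samples_per_class, mu.numel()))
            ys.append(torch.full((samples_per_class,), c, dtype=torch.long))
        loss = F.cross_entropy(head(torch.cat(xs)), torch.cat(ys))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head

def adaptive_ensemble(logits_a, logits_b):
    """Fuse two experts, weighting each by its own softmax confidence
    (one plausible reading of an 'adaptive expertise ensemble')."""
    conf_a = F.softmax(logits_a, dim=-1).max(-1, keepdim=True).values
    conf_b = F.softmax(logits_b, dim=-1).max(-1, keepdim=True).values
    w = torch.softmax(torch.cat([conf_a, conf_b], dim=-1), dim=-1)
    return w[..., :1] * logits_a + w[..., 1:] * logits_b
```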
Related papers
- Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning [80.31842748505895]
Few-shot class-incremental learning (FSCIL) involves learning new classes from limited data while retaining prior knowledge. We propose Continuous Knowledge-Preserving Decomposition for FSCIL (CKPD-FSCIL), a framework that decomposes a model's weights into two parts. Experiments on multiple benchmarks show that CKPD-FSCIL outperforms state-of-the-art methods.
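The summary mentions decomposing a model's weights into a knowledge-preserving part and an adaptable part. The sketch below shows one plausible instantiation using an SVD split of a linear layer; the rank split and the choice to freeze the top singular directions are assumptions, not the CKPD-FSCIL procedure.

```python
# Hypothetical sketch: split a linear weight into a frozen "knowledge-preserving"
# component and a small trainable residual, one plausible two-part decomposition.
import torch
import torch.nn as nn

class DecomposedLinear(nn.Module):
    def __init__(self, weight, rank=8):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        # Top singular directions are kept frozen (prior knowledge).
        self.register_buffer("frozen", U[:, :-rank] @ torch.diag(S[:-rank]) @ Vh[:-rank])
        # The remaining low-energy subspace is re-learned for new classes.
        self.residual = nn.Parameter(U[:, -rank:] @ torch.diag(S[-rank:]) @ Vh[-rank:])

    def forward(self, x):
        return x @ (self.frozen + self.residual).T
```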
arXiv Detail & Related papers (2025-01-09T07:18:48Z)
- Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA [19.982853959240497]
We investigate whether pre-trained knowledge in vision-language models (VLMs) can be retained, or even enhanced, in continual learning (CL). We propose a universal and efficient continual learning approach for VLMs based on Dynamic Rank-Selective LoRA (CoDyRA).
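Dynamic rank selection can be illustrated with a LoRA adapter whose ranks carry learnable gates; the per-rank sigmoid gating below is an assumption for illustration, not the CoDyRA mechanism.

```python
# Hypothetical sketch of a LoRA adapter with learned per-rank gates, so that
# unimportant ranks can be suppressed during continual fine-tuning.
import torch
import torch.nn as nn

class RankSelectiveLoRA(nn.Module):
    def __init__(self, base: nn.Linear, rank=16):
        super().__init__()
        self.base = base.requires_grad_(False)          # frozen pre-trained layer
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.gate = nn.Parameter(torch.zeros(rank))     # learned per-rank importance

    def forward(self, x):
        g = torch.sigmoid(self.gate)                    # soft rank selection in [0, 1]
        return self.base(x) + ((x @ self.A.T) * g) @ self.B.T
```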
arXiv Detail & Related papers (2024-12-01T23:41:42Z)
- A Self-Constructing Multi-Expert Fuzzy System for High-dimensional Data Classification [33.926721742862156]
Fuzzy Neural Networks (FNNs) are effective machine learning models for classification tasks.
We propose a novel fuzzy system, the Self-Constructing Multi-Expert Fuzzy System (SOME-FS).
It combines two learning strategies: mixed structure learning and multi-expert advanced learning.
arXiv Detail & Related papers (2024-10-17T09:41:54Z)
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z)
- SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training [68.7896349660824]
We present an in-depth analysis of the progressive overfitting problem through the lens of sequential fine-tuning (Seq FT).
Considering that overly fast representation learning and a biased classification layer constitute this particular problem, we introduce the advanced Slow Learner with Classifier Alignment (SLCA++) framework.
Our approach involves a Slow Learner that selectively reduces the learning rate of backbone parameters, and a Classifier Alignment step that aligns the disjoint classification layers in a post-hoc fashion.
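The slow-learner idea reduces the backbone learning rate relative to the classifier head. A minimal sketch with PyTorch parameter groups, assuming an illustrative 100x reduction (the actual ratio and the alignment step are not specified in the summary above):

```python
# Hypothetical sketch of the "slow learner" idea: the pre-trained backbone gets
# a much smaller learning rate than the freshly initialized classification head.
import torch

def make_slow_learner_optimizer(backbone, head, head_lr=1e-2, backbone_scale=0.01):
    return torch.optim.SGD(
        [
            {"params": backbone.parameters(), "lr": head_lr * backbone_scale},
            {"params": head.parameters(), "lr": head_lr},
        ],
        momentum=0.9,
    )
```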
arXiv Detail & Related papers (2024-08-15T17:50:07Z)
- KIF: Knowledge Identification and Fusion for Language Model Continual Learning [41.28933724210434]
We introduce a novel framework for language models, named Knowledge Identification and Fusion (KIF).
KIF segregates the model into 'skill units' based on parameter dependencies, allowing for more precise control.
It employs a novel group-wise knowledge identification technique to ascertain the importance distribution of skill units for a new task.
As a result, KIF achieves an optimal balance between retaining prior knowledge and excelling in new tasks.
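Group-wise knowledge identification can be sketched as scoring parameter groups ('skill units') by gradient magnitude on new-task data and restricting updates to the top-scoring units; the grouping by top-level module name and the top-k rule below are illustrative assumptions, not KIF's exact procedure.

```python
# Hypothetical sketch: score each parameter group ("skill unit") by the mean
# squared gradient on a new-task batch, then train only the most relevant units.
import torch
import torch.nn.functional as F

def score_skill_units(model, batch):
    x, y = batch
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()
    scores = {}
    for name, p in model.named_parameters():
        unit = name.split(".")[0]                      # crude grouping by top-level module
        if p.grad is not None:
            scores[unit] = scores.get(unit, 0.0) + p.grad.pow(2).mean().item()
    return scores

def select_trainable_units(model, scores, top_k=4):
    keep = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    for name, p in model.named_parameters():
        p.requires_grad_(name.split(".")[0] in keep)
```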
arXiv Detail & Related papers (2024-08-09T17:44:45Z)
- Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning [22.13331870720021]
We propose an approach that goes beyond prompt learning for rehearsal-free continual learning (RFCL), called Continual Adapter (C-ADA).
C-ADA flexibly extends specific weights in its continual adapter layer (CAL) to learn new knowledge for each task and freezes old weights to preserve prior knowledge.
Our approach achieves significantly improved performance and training speed, outperforming the current state-of-the-art (SOTA) method.
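The extend-and-freeze pattern can be sketched as an adapter that appends a small trainable weight block per task while freezing earlier blocks; the low-rank block shape below is an assumption for illustration, not the C-ADA layer itself.

```python
# Hypothetical sketch: an adapter that grows a fresh low-rank block for each new
# task and freezes previously learned blocks to preserve earlier knowledge.
import torch
import torch.nn as nn

class ContinualAdapter(nn.Module):
    def __init__(self, dim, block_rank=8):
        super().__init__()
        self.dim, self.block_rank = dim, block_rank
        self.down_blocks = nn.ParameterList()
        self.up_blocks = nn.ParameterList()

    def add_task(self):
        # Freeze all previously learned blocks, then append a fresh trainable one.
        for p in self.parameters():
            p.requires_grad_(False)
        self.down_blocks.append(nn.Parameter(torch.randn(self.block_rank, self.dim) * 0.01))
        self.up_blocks.append(nn.Parameter(torch.zeros(self.dim, self.block_rank)))

    def forward(self, x):
        out = x
        for A, B in zip(self.down_blocks, self.up_blocks):
            out = out + (x @ A.T) @ B.T
        return out
```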
arXiv Detail & Related papers (2024-07-14T17:40:40Z)
- Open Continual Feature Selection via Granular-Ball Knowledge Transfer [16.48797678104989]
We propose a novel framework for continual feature selection (CFS) in data preprocessing.
The proposed CFS method combines the strengths of continual learning (CL) with granular-ball computing (GBC).
We show that our method is superior in terms of both effectiveness and efficiency compared to state-of-the-art feature selection methods.
arXiv Detail & Related papers (2024-03-15T12:43:03Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce BAdam, a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance among prior-based methods on challenging single-headed class-incremental experiments.
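Prior-based methods of this kind typically add a penalty that pulls parameters toward their values after the previous task, weighted by an importance estimate. The sketch below shows that generic penalty, not BAdam's specific Bayesian update.

```python
# Hypothetical sketch of a generic prior-based penalty: each parameter is pulled
# toward its previous-task value, scaled by an estimated importance weight.
import torch

def prior_penalty(model, prev_params, importance, strength=1.0):
    loss = 0.0
    for name, p in model.named_parameters():
        if name in prev_params:
            loss = loss + (importance[name] * (p - prev_params[name]).pow(2)).sum()
    return strength * loss
```

The penalty is added to the task loss before backpropagation, e.g. `loss = task_loss + prior_penalty(model, prev_params, importance)`.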
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Overcoming Generic Knowledge Loss with Selective Parameter Update [48.240683797965005]
We propose a novel approach to continuously update foundation models.
Instead of updating all parameters equally, we localize the updates to a sparse set of parameters relevant to the task being learned.
Our method improves accuracy on newly learned tasks by up to 7% while preserving pretraining knowledge, with a negligible 0.9% decrease in accuracy on a representative control set.
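Localizing updates to a sparse parameter set can be sketched by zeroing all but the largest gradients before the optimizer step; the global magnitude threshold below is an illustrative selection rule, not the paper's exact criterion.

```python
# Hypothetical sketch: keep only the largest-magnitude gradients so that each
# update touches a sparse subset of parameters relevant to the current task.
import torch

def sparsify_gradients(model, keep_fraction=0.05):
    grads = torch.cat([p.grad.flatten() for p in model.parameters() if p.grad is not None])
    k = max(1, int(keep_fraction * grads.numel()))
    threshold = grads.abs().topk(k).values.min()       # magnitude of the k-th largest gradient
    for p in model.parameters():
        if p.grad is not None:
            p.grad.mul_((p.grad.abs() >= threshold).float())
```

It would be called between `loss.backward()` and `optimizer.step()`.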
arXiv Detail & Related papers (2023-08-23T22:55:45Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in terms of accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model [73.80068155830708]
We present an extensive analysis of continual learning on a pre-trained model (CLPM).
We propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA).
Across a variety of scenarios, our proposal provides substantial improvements for CLPM.
arXiv Detail & Related papers (2023-03-09T08:57:01Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
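Aggregating several experts into one student can be sketched with a standard distillation loss against the averaged expert outputs; the temperature and plain averaging are assumptions, and LFME's self-paced weighting is omitted.

```python
# Hypothetical sketch: distill the averaged soft targets of several expert models
# into a single student using a temperature-scaled KL divergence.
import torch
import torch.nn.functional as F

def multi_expert_distill_loss(student_logits, expert_logits_list, T=2.0):
    teacher = torch.stack(expert_logits_list).mean(0)   # aggregate the experts
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
```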
arXiv Detail & Related papers (2020-01-06T12:57:36Z)