Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA
- URL: http://arxiv.org/abs/2412.01004v5
- Date: Mon, 26 May 2025 05:41:05 GMT
- Title: Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA
- Authors: Haodong Lu, Chongyang Zhao, Jason Xue, Lina Yao, Kristen Moore, Dong Gong,
- Abstract summary: Pre-trained vision-language embedding models such as CLIP have been widely adopted and validated in Continual Learning (CL)<n>Existing CL methods primarily focus on continual downstream adaptation using components isolated from the pre-trained model (PTM)<n>We propose a universal and efficient CL approach for CLIP based on Dynamic Rank-Selective LoRA (CoDyRA)
- Score: 19.982853959240497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning (CL) aims to accumulate knowledge from sequential data and task streams. Leveraging their strong generalization and flexibility, pre-trained vision-language embedding models such as CLIP (Contrastive Language-Image Pre-training) have been widely adopted and validated in CL. In addition to learning new knowledge, we investigate whether the pre-trained knowledge in CLIP, can be retained or even enhanced, in CL, while incorporating new knowledge from a data stream. Existing CL methods primarily focus on continual downstream adaptation using components isolated from the pre-trained model (PTM), increasing inference complexity and limiting improvements to the PTM itself; some also retain knowledge by relying on additional reference data, resulting in high training costs. To address these limitations, we propose a universal and efficient CL approach for CLIP based on Dynamic Rank-Selective LoRA (CoDyRA), which directly improves the PTMs while preserving the existing knowledge from both pre-training and CL. By analyzing how LoRA rank and placement affect learning and forgetting in CL, we design CoDyRA that adaptively performs rank-minimized parameter updates in different modules, based on their importance to the current data. This ensures a balance between knowledge acquisition (plasticity) and forgetting mitigation (stability). Our method operates without explicit domain or distribution prediction and does not rely on reference data, enabling seamless task integration while maintaining pre-trained capabilities. Moreover, CoDyRA preserves the original model architecture and deployment pipeline, introducing no additional inference overhead. Extensive experiments show that our approach enhances representations for new downstream data while retaining pre-trained knowledge, achieving state-of-the-art results.
Related papers
- CLA: Latent Alignment for Online Continual Self-Supervised Learning [53.52783900926569]
We introduce Continual Latent Alignment (CLA), a novel SSL strategy for Online CL.<n>Our CLA is able to speed up the convergence of the training process in the online scenario, outperforming state-of-the-art approaches under the same computational budget.<n>We also discovered that using CLA as a pretraining protocol in the early stages of pretraining leads to a better final performance when compared to a full i.i.d. pretraining.
arXiv Detail & Related papers (2025-07-14T16:23:39Z) - Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning [11.50324946279326]
Contrastive Language-Image Pre-trained model (CLIP) exhibiting strong capabilities across various downstream tasks.<n>We analyze the variations in the modality gap during the fine-tuning of vision-language pre-trained models.<n>We propose a simple yet effective method, MG-CLIP, that improves CLIP's performance in class-incremental learning.
arXiv Detail & Related papers (2025-07-12T02:28:42Z) - Diffusion Guidance Is a Controllable Policy Improvement Operator [98.11511661904618]
CFGRL is trained with the simplicity of supervised learning, yet can further improve on the policies in the data.<n>On offline RL tasks, we observe a reliable trend -- increased guidance weighting leads to increased performance.
arXiv Detail & Related papers (2025-05-29T14:06:50Z) - Parameter Efficient Continual Learning with Dynamic Low-Rank Adaptation [19.48677836920734]
Catastrophic forgetting has remained a critical challenge for deep neural networks in Continual Learning (CL)<n>We introduce PEARL, a rehearsal-free CL framework that entails dynamic rank allocation for LoRA components during CL training.
arXiv Detail & Related papers (2025-05-17T13:19:01Z) - Enhancing knowledge retention for continual learning with domain-specific adapters and features gating [4.637185817866919]
Continual learning empowers models to learn from a continuous stream of data while preserving previously acquired knowledge.
We propose a new approach that integrates adapters within the self-attention mechanisms of Vision Transformers to enhance knowledge retention when sequentially adding datasets from different domains.
arXiv Detail & Related papers (2025-04-11T15:20:08Z) - SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs [4.194295877935867]
We propose a lightweight continual learning framework for large language models (LLMs)
Our method achieves high knowledge retention in both task-incremental and domain-incremental continual learning setups.
Experiments on the SuperGLUE benchmark demonstrate that our PCA-based prompt tuning combined with LoRA maintains full knowledge retention while improving accuracy, utilizing only 1% of the model's parameters.
arXiv Detail & Related papers (2025-02-05T06:11:55Z) - Aligning Instruction Tuning with Pre-training [81.4748965653345]
We propose Aligning Instruction Tuning with Pre-training (AITP) to align instruction tuning with pre-training distributions.
We show consistent performance improvements with AITP on three fully open large language models (LLMs) across eight benchmarks.
arXiv Detail & Related papers (2025-01-16T08:27:40Z) - Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning [22.13331870720021]
We propose a beyond prompt learning approach to the RFCL task, called Continual Adapter (C-ADA)
C-ADA flexibly extends specific weights in CAL to learn new knowledge for each task and freezes old weights to preserve prior knowledge.
Our approach achieves significantly improved performance and training speed, outperforming the current state-of-the-art (SOTA) method.
arXiv Detail & Related papers (2024-07-14T17:40:40Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
arXiv Detail & Related papers (2024-05-31T17:57:24Z) - Investigating Continual Pretraining in Large Language Models: Insights and Implications [9.660013084324817]
Continual learning in large language models (LLMs) is an evolving domain that focuses on developing efficient and sustainable training strategies.
We introduce a new benchmark designed to measure the adaptability of LLMs to changing pretraining data landscapes.
Our findings uncover several key insights: (i) continual pretraining consistently improves 1.5B models studied in this work and is also superior to domain adaptation, (ii) larger models always achieve better perplexity than smaller ones when continually pretrained on the same corpus, (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both learning and
arXiv Detail & Related papers (2024-02-27T10:47:24Z) - Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representation during solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - Mitigating Forgetting in Online Continual Learning via Contrasting
Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one.
Main challenge comes from the "catastrophic forgetting" issue -- the inability to well remember the learnt knowledge while learning the new ones.
arXiv Detail & Related papers (2022-11-10T05:29:43Z) - Unified Instance and Knowledge Alignment Pretraining for Aspect-based
Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.