Continual Learning in Vision-Language Models via Aligned Model Merging
- URL: http://arxiv.org/abs/2506.03189v1
- Date: Fri, 30 May 2025 20:52:21 GMT
- Title: Continual Learning in Vision-Language Models via Aligned Model Merging
- Authors: Ghada Sokar, Gintare Karolina Dziugaite, Anurag Arnab, Ahmet Iscen, Pablo Samuel Castro, Cordelia Schmid,
- Abstract summary: We present a new perspective based on model merging to maintain stability while still retaining plasticity.<n>To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones.
- Score: 84.47520899851557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.
Related papers
- Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [59.6658995479243]
We propose texttext-Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to avoid forgetting.<n>Through theoretical analysis, we minimize the total loss increase across all tasks and derive an analytical solution for the optimal merging coefficient.<n>Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z) - BECAME: BayEsian Continual Learning with Adaptive Model MErging [21.642774366793997]
We introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging.<n>Our approach outperforms state-of-the-art CL methods and existing merging strategies.
arXiv Detail & Related papers (2025-04-03T15:07:28Z) - Neural Networks Remember More: The Power of Parameter Isolation and Combination [3.2430260063115233]
Catastrophic forgetting is a pervasive issue for pre-trained language models.<n>Key to solving this problem is to find a trade-off between the plasticity and stability of the model.<n>We propose a novel method to achieve a balance between model stability and plasticity.
arXiv Detail & Related papers (2025-02-16T02:58:57Z) - Spurious Forgetting in Continual Learning of Language Models [20.0936011355535]
Recent advancements in large language models (LLMs) reveal a perplexing phenomenon in continual learning.<n>Despite extensive training, models experience significant performance declines.<n>This study proposes that such performance drops often reflect a decline in task alignment rather than true knowledge loss.
arXiv Detail & Related papers (2025-01-23T08:09:54Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.<n>Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.<n>We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Temporal-Difference Variational Continual Learning [89.32940051152782]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.<n>Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks
in Continual Learning [23.15206507040553]
We propose Auxiliary Network Continual Learning (ANCL) to equip the neural network with the ability to learn the current task.
ANCL applies an additional auxiliary network which promotes plasticity to the continually learned model which mainly focuses on stability.
More concretely, the proposed framework materializes in a regularizer that naturally interpolates between plasticity and stability.
arXiv Detail & Related papers (2023-03-16T17:00:42Z) - Towards Better Plasticity-Stability Trade-off in Incremental Learning: A
simple Linear Connector [8.13916229438606]
Plasticity-stability dilemma is a main problem for incremental learning.
We show that a simple averaging of two independently optimized optima of networks, null-space projection for past tasks and simple SGD for the current task, can attain a meaningful balance between preserving already learned knowledge and granting sufficient flexibility for learning a new task.
arXiv Detail & Related papers (2021-10-15T07:37:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.