Toward a Holistic Approach to Continual Model Merging
- URL: http://arxiv.org/abs/2509.23592v1
- Date: Sun, 28 Sep 2025 02:51:04 GMT
- Title: Toward a Holistic Approach to Continual Model Merging
- Authors: Hoang Phan, Sungmin Cha, Tung Lam Tran, Qi Lei
- Abstract summary: We present a holistic framework for continual model merging that intervenes at three critical stages (pre-merging, during merging, and post-merging) to address two fundamental challenges in continual learning. Our method first fine-tunes the main model within its tangent space on domain-specific data. During merging, we leverage functional information from available optimizer states, beyond mere parameter averages, to avoid the need to revisit old data. Finally, a post-merging correction aligns the representation discrepancy between pre- and post-merged models, reducing bias and enhancing overall performance, all while operating under constant memory constraints and without accessing historical data.
- Score: 24.769931209311498
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We present a holistic framework for continual model merging that intervenes at three critical stages (pre-merging, during merging, and post-merging) to address two fundamental challenges in continual learning. In particular, conventional approaches either maintain a growing list of per-domain task vectors, leading to scalability issues, or rely solely on weight-space merging when old data is inaccessible, thereby losing crucial functional information. Our method overcomes these limitations by first fine-tuning the main model within its tangent space on domain-specific data; this linearization amplifies per-task weight disentanglement, effectively mitigating across-task interference. During merging, we leverage functional information from available optimizer states, beyond mere parameter averages, to avoid the need to revisit old data. Finally, a post-merging correction aligns the representation discrepancy between pre- and post-merged models, reducing bias and enhancing overall performance, all while operating under constant memory constraints and without accessing historical data. Extensive experiments on standard class-incremental and domain-incremental benchmarks demonstrate that our approach not only achieves competitive performance but also provides a scalable and efficient solution to the catastrophic forgetting problem.
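The mechanics are concrete enough to sketch. Tangent-space fine-tuning optimizes the linearized model f_lin(x; θ) = f(x; θ0) + (θ − θ0)ᵀ∇_θ f(x; θ0), which is what makes per-task updates disentangle. For the merging stage, one plausible reading of "functional information from optimizer states" is to weight each parameter coordinate by its Adam second-moment estimate, a common diagonal-Fisher proxy for importance. The PyTorch sketch below illustrates that weighting; the function name and the exact rule are our assumptions, not the paper's algorithm.

```python
import torch

def second_moment_weighted_merge(theta_old, theta_new, v_old, v_new, eps=1e-12):
    """Merge two parameter dicts coordinate-wise, weighting each entry by the
    square root of its Adam second-moment estimate (a diagonal-Fisher proxy
    for functional importance). Illustrative sketch, not the paper's exact rule.
    """
    merged = {}
    for name in theta_old:
        w_old = v_old[name].sqrt()  # importance of the previously merged model
        w_new = v_new[name].sqrt()  # importance of the newly fine-tuned model
        merged[name] = (w_old * theta_old[name] + w_new * theta_new[name]) / (
            w_old + w_new + eps
        )
    return merged
```

Because only the running model and its optimizer state are kept, memory stays constant in the number of tasks, matching the constant-memory constraint the abstract states.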
Related papers
- ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation [34.173549610331385]
Model merging aims to combine multiple task-specific expert models into a single model. Interference among experts, especially when they are trained on different objectives, often leads to significant performance degradation. ACE-Merging is an Adaptive Covariance Estimation framework that effectively mitigates inter-task interference.
arXiv Detail & Related papers (2026-03-03T12:53:04Z) - Generative Data Transformation: From Mixed to Unified Data [57.84692191369066]
Taesar is a data-centric framework for target regeneration. It encodes cross-domain context into target sequences, enabling standard models to learn intricate dependencies without complex fusion architectures.
arXiv Detail & Related papers (2026-02-26T08:30:09Z) - OFMU: Optimization-Driven Framework for Machine Unlearning [5.100622189286672]
Large language models increasingly require the ability to unlearn specific knowledge, such as user requests, copyrighted materials, or outdated information. We propose OFMU, a penalty-based bi-level optimization framework that explicitly prioritizes forgetting while preserving retention. We show that OFMU consistently outperforms existing unlearning methods in both efficacy and retained utility.
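The phrase "penalty-based bi-level" pins down a family of objectives: an outer forgetting problem with the inner retention problem folded in as a penalty term. A minimal PyTorch sketch of one step of that family follows; the function name, batch format, and single-level simplification are assumptions, not OFMU's actual algorithm.

```python
import torch

def penalty_unlearning_step(model, loss_fn, forget_batch, retain_batch, lam, opt):
    """One step of a penalty-based unlearning objective: ascend the loss on the
    forget set while a penalty (weight `lam`) preserves retain-set utility.
    Illustrative of the general idea only; OFMU's bi-level structure is richer."""
    opt.zero_grad()
    fx, fy = forget_batch
    rx, ry = retain_batch
    forget_term = -loss_fn(model(fx), fy)  # maximize error on data to forget
    retain_term = loss_fn(model(rx), ry)   # penalty: keep retained knowledge
    (forget_term + lam * retain_term).backward()
    opt.step()
```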
arXiv Detail & Related papers (2025-09-26T15:31:32Z) - Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints [64.15709757611369]
We propose a new self-supervised pre-training approach for dealing with heterogeneous data. The proposed approach can significantly improve the adaptivity of the self-supervised pre-trained model for downstream supervised fine-tuning tasks.
arXiv Detail & Related papers (2025-08-27T15:48:50Z) - Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors [27.848233831749216]
WUDI-Merging (Whoever started the interference shoUld enD It) is a model merging method that eliminates interference without any additional data or rescaling coefficients. Comprehensive empirical evaluations across vision and language benchmarks demonstrate the method's superiority.
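For readers new to the term, a task vector is the parameter difference τ_i = θ_i − θ_base between a fine-tuned expert and its base model, and the plainest data-free merge simply adds scaled task vectors back to the base. The PyTorch sketch below shows that baseline; WUDI-Merging's contribution is removing the interference this naive sum creates, without the rescaling coefficient alpha used here.

```python
import torch

def naive_task_vector_merge(theta_base, experts, alpha=1.0):
    """Baseline task-vector merging: merged = theta_base + alpha * sum_i tau_i,
    with tau_i = theta_i - theta_base. Shown for context; this is exactly the
    interference-prone baseline that data-free methods improve on."""
    merged = {k: v.clone() for k, v in theta_base.items()}
    for theta_i in experts:
        for k in merged:
            merged[k] += alpha * (theta_i[k] - theta_base[k])
    return merged
```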
arXiv Detail & Related papers (2025-03-11T07:01:35Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches. We propose a training-free, projection-based continual merging method that processes models sequentially.
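Sequential merging needs some mechanism to keep a new task vector from overwriting what earlier merges contributed; projecting it onto the orthogonal complement of previously absorbed directions is one simple realization. The sketch below is a hedged PyTorch illustration of that idea (Gram-Schmidt over flattened task vectors); the cited method's actual projection may operate quite differently.

```python
import torch

def sequential_projected_merge(theta_base, experts):
    """Merge experts one at a time, projecting each incoming task vector
    orthogonal to the directions already absorbed. Illustrative sketch only."""
    keys = list(theta_base.keys())
    flatten = lambda d: torch.cat([d[k].reshape(-1) for k in keys])
    merged = flatten(theta_base)
    basis = []  # orthonormal directions of previously merged task vectors
    for theta_i in experts:
        tau = flatten(theta_i) - flatten(theta_base)
        for b in basis:
            tau = tau - (tau @ b) * b  # remove overlap with past tasks
        merged = merged + tau
        if tau.norm() > 1e-8:
            basis.append(tau / tau.norm())
    # unflatten the merged vector back into a state dict
    out, i = {}, 0
    for k in keys:
        n = theta_base[k].numel()
        out[k] = merged[i:i + n].reshape(theta_base[k].shape)
        i += n
    return out
```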
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
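The "low-rank experts" part has a natural reading: each expert's task update W_ft − W_base is compressed to its top singular directions, so it can be stored and routed cheaply inside an MoE. The PyTorch sketch below shows that extraction for a single weight matrix; it illustrates the low-rank-expert idea generically and is not SMILE's exact construction (the zero-shot routing is omitted).

```python
import torch

def low_rank_expert(w_base, w_finetuned, rank):
    """Compress a task update W_ft - W_base to rank-`rank` factors via SVD,
    so the expert contributes x -> (A @ B) @ x on top of the base weights.
    Illustrative only; not SMILE's exact construction."""
    delta = w_finetuned - w_base                      # task-specific update
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    A = U[:, :rank] * S[:rank]                        # (d_out, rank)
    B = Vh[:rank, :]                                  # (rank, d_in)
    return A, B
```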
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning [57.43911113915546]
Few-Shot Class-Incremental Learning (FSCIL) introduces a paradigm in which the problem space expands with limited data.
FSCIL methods inherently face the challenge of catastrophic forgetting as data arrives incrementally.
We propose the OrCo framework built on two core principles: features' orthogonality in the representation space, and contrastive learning.
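Of the two principles, orthogonality is the easier to make concrete: push class-mean features toward mutually orthogonal directions so later classes have unoccupied space. The loss below is a generic PyTorch sketch of such a term, not OrCo's actual objective.

```python
import torch

def orthogonality_loss(class_means):
    """Penalize pairwise cosine similarity between class-mean features so that
    classes occupy near-orthogonal directions. Generic sketch, not OrCo's loss."""
    z = torch.nn.functional.normalize(class_means, dim=1)    # (C, d) unit rows
    gram = z @ z.T                                           # cosine similarities
    off_diag = gram - torch.eye(z.size(0), device=z.device)  # zero the diagonal
    return off_diag.pow(2).mean()
```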
arXiv Detail & Related papers (2024-03-27T13:30:48Z) - Vanishing Feature: Diagnosing Model Merging and Beyond [1.1510009152620668]
We identify the "vanishing feature" phenomenon, where input-induced features diminish during propagation through a merged model. We show that existing normalization strategies can be enhanced by precisely targeting the vanishing feature issue. We propose the "Preserve-First Merging" (PFM) strategy, which focuses on preserving early-layer features.
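Diagnosing a vanishing-feature effect mostly requires instrumentation: record the magnitude of intermediate activations as an input propagates through the merged network and look for systematic decay with depth. The hook-based helper below is a generic PyTorch diagnostic in that spirit, not the paper's measurement protocol.

```python
import torch

@torch.no_grad()
def layerwise_feature_norms(model, x):
    """Return the output norm of every leaf module for one forward pass.
    Steady decay of these norms with depth in a merged model is the kind of
    signature the vanishing-feature analysis points to. Generic diagnostic."""
    norms, hooks = {}, []
    for name, mod in model.named_modules():
        if len(list(mod.children())) == 0:  # instrument leaf modules only
            hooks.append(mod.register_forward_hook(
                lambda m, inp, out, n=name: norms.__setitem__(
                    n, out.norm().item() if torch.is_tensor(out) else None)))
    model(x)
    for h in hooks:
        h.remove()
    return norms
```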
arXiv Detail & Related papers (2024-02-05T17:06:26Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
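Prior-based methods share a common skeleton: after each task, store the parameters and an importance (precision) estimate, then penalize later training for moving important parameters. The quadratic penalty below is that generic skeleton in PyTorch, as used by EWC-style methods; BAdam's Bayesian adaptive-moment treatment refines how the precisions are formed and used.

```python
import torch

def prior_penalty(model, prior_means, prior_precisions):
    """Quadratic prior-based continual-learning penalty: pull each parameter
    toward its post-previous-task value, weighted by an estimated precision.
    Generic skeleton of the method family, not BAdam's exact update."""
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (prior_precisions[name] * (p - prior_means[name]).pow(2)).sum()
    return loss
```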
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.