CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
- URL: http://arxiv.org/abs/2505.06977v2
- Date: Wed, 14 May 2025 14:11:52 GMT
- Title: CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
- Authors: Wenju Sun, Qingyong Li, Yangli-ao Geng, Boyang Li
- Abstract summary: Multi-task model merging offers a promising paradigm for integrating multiple expert models into a unified model without additional training. We propose Conflict-Aware Task Merging (CAT Merging), a training-free framework that selectively trims conflict-prone components from the task vectors. Experiments on vision, language, and vision-language tasks demonstrate that CAT Merging effectively suppresses knowledge conflicts, achieving average accuracy improvements of up to 2.5%.
- Score: 10.386229962375548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task model merging offers a promising paradigm for integrating multiple expert models into a unified model without additional training. Existing state-of-the-art techniques, such as Task Arithmetic and its variants, merge models by accumulating task vectors -- the parameter differences between pretrained and finetuned models. However, task vector accumulation is often hindered by knowledge conflicts, leading to performance degradation. To address this challenge, we propose Conflict-Aware Task Merging (CAT Merging), a novel training-free framework that selectively trims conflict-prone components from the task vectors. CAT Merging introduces several parameter-specific strategies, including projection for linear weights and masking for scaling and shifting parameters in normalization layers. Extensive experiments on vision, language, and vision-language tasks demonstrate that CAT Merging effectively suppresses knowledge conflicts, achieving average accuracy improvements of up to 2.5% (ViT-B/32) and 2.0% (ViT-L/14) over state-of-the-art methods.
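The abstract describes two concrete mechanisms: Task Arithmetic-style accumulation of task vectors, and conflict-aware trimming of those vectors before accumulation. The paper's own implementation is not reproduced here; the following is a minimal PyTorch sketch of the general idea, in which the projection step (removing one task vector's component along another's direction) is a simplified parameter-space stand-in for CAT Merging's feature-based, parameter-specific projection and masking strategies, and all function names are illustrative.

```python
# Minimal sketch: task-vector merging with a simplified conflict-trimming step.
# NOT the authors' implementation; CAT Merging trims components using
# feature-level statistics, with projection for linear weights and masking
# for normalization parameters. Here the idea is approximated in weight space.
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """Task vector = finetuned parameters minus pretrained parameters."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def project_out(v: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    """Remove from v its component along u (a conflict-prone direction)."""
    u_flat, v_flat = u.flatten(), v.flatten()
    coef = torch.dot(v_flat, u_flat) / (torch.dot(u_flat, u_flat) + 1e-12)
    return (v_flat - coef * u_flat).view_as(v)

def cat_style_merge(pretrained: dict, task_vectors: list, scale: float = 0.3) -> dict:
    """Task Arithmetic-style accumulation after pairwise conflict trimming."""
    merged = {k: p.clone() for k, p in pretrained.items()}
    for i, tv in enumerate(task_vectors):
        trimmed = dict(tv)
        for j, other in enumerate(task_vectors):
            if i != j:
                # Trim the component of this task vector that conflicts
                # with another task's update direction.
                trimmed = {k: project_out(trimmed[k], other[k]) for k in trimmed}
        for k in merged:
            merged[k] += scale * trimmed[k]
    return merged
```

With the trimming loop removed, `cat_style_merge` reduces to plain Task Arithmetic; the trimming step is what suppresses the knowledge conflicts the abstract refers to.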
Related papers
- Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration [16.667053306761364]
Multi-task model merging aims to consolidate knowledge from multiple task-specific experts into a unified model. Existing methods approach this by minimizing differences between task-specific experts and the unified model. We propose Layer-wise Optimal Task Vector Merging, a technique that explicitly minimizes feature drift between task-specific experts and the unified model.
arXiv Detail & Related papers (2025-05-29T08:11:31Z) - Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [59.6658995479243]
We propose Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to avoid forgetting. Through theoretical analysis, we minimize the total loss increase across all tasks and derive an analytical solution for the optimal merging coefficient. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z) - CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging [11.2457468104475]
Model merging based on task vectors provides an efficient way to integrate multiple task-specific models into a multitask model without retraining. Recent works have endeavored to address conflicts between task vectors, one of the significant challenges faced by model merging, through sparsification. We propose CABS (Conflict-Aware and Balanced Sparsification), consisting of Conflict-Aware Sparsification (CA) and Balanced Sparsification (BS).
arXiv Detail & Related papers (2025-02-26T12:38:55Z) - Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts [13.356826891549856]
Multi-task model merging offers an efficient solution for integrating knowledge from multiple fine-tuned models. Despite the promising performance of Task Arithmetic (TA), conflicts can arise among the task vectors. We propose Task Arithmetic in Trust Region (TATR), which defines the trust region as dimensions in the model parameter space.
arXiv Detail & Related papers (2025-01-25T04:09:56Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z) - Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging. We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z) - Concrete Subspace Learning based Interference Elimination for Multi-task
Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data (a minimal sketch of this idea appears after this list).
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
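As referenced in the AdaMerging entry above, learning the merging coefficients rather than fixing them is itself a small optimization problem. The sketch below is an illustrative reconstruction, not the authors' code: it assumes a hypothetical `forward_with_params` helper that runs the network functionally with a supplied parameter dict, and uses prediction-entropy minimization on unlabeled inputs as the training-data-free objective (the surrogate objective used in the AdaMerging paper).

```python
# Illustrative sketch of layer-wise coefficient learning for model merging,
# in the spirit of AdaMerging. `forward_with_params` is a hypothetical helper.
import torch

def merged_params(pretrained: dict, task_vectors: list, lambdas: torch.Tensor) -> dict:
    """Layer-wise merge: theta_l = theta0_l + sum_k lambdas[k, l] * tau_{k, l}."""
    return {
        name: pretrained[name]
        + sum(lambdas[k, l] * tv[name] for k, tv in enumerate(task_vectors))
        for l, name in enumerate(pretrained)
    }

def adapt_coefficients(pretrained, task_vectors, unlabeled_loader,
                       forward_with_params, steps=100, lr=1e-3):
    """Tune merging coefficients by minimizing prediction entropy on
    unlabeled inputs -- no original training data required."""
    n_tasks, n_layers = len(task_vectors), len(pretrained)
    lambdas = torch.full((n_tasks, n_layers), 0.3, requires_grad=True)
    opt = torch.optim.Adam([lambdas], lr=lr)
    for _, x in zip(range(steps), unlabeled_loader):
        logits = forward_with_params(merged_params(pretrained, task_vectors, lambdas), x)
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return lambdas.detach()
```

Task-wise adaptation is the special case where each row of `lambdas` is shared across all layers.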