CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
- URL: http://arxiv.org/abs/2505.06977v2
- Date: Wed, 14 May 2025 14:11:52 GMT
- Title: CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
- Authors: Wenju Sun, Qingyong Li, Yangli-ao Geng, Boyang Li
- Abstract summary: Multi-task model merging offers a promising paradigm for integrating multiple expert models into a unified model without additional training. We propose Conflict-Aware Task Merging (CAT Merging), a training-free framework that selectively trims conflict-prone components from the task vectors. Experiments on vision, language, and vision-language tasks demonstrate that CAT Merging effectively suppresses knowledge conflicts, achieving average accuracy improvements of up to 2.5%.
- Score: 10.386229962375548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task model merging offers a promising paradigm for integrating multiple expert models into a unified model without additional training. Existing state-of-the-art techniques, such as Task Arithmetic and its variants, merge models by accumulating task vectors -- the parameter differences between pretrained and finetuned models. However, task vector accumulation is often hindered by knowledge conflicts, leading to performance degradation. To address this challenge, we propose Conflict-Aware Task Merging (CAT Merging), a novel training-free framework that selectively trims conflict-prone components from the task vectors. CAT Merging introduces several parameter-specific strategies, including projection for linear weights and masking for scaling and shifting parameters in normalization layers. Extensive experiments on vision, language, and vision-language tasks demonstrate that CAT Merging effectively suppresses knowledge conflicts, achieving average accuracy improvements of up to 2.5% (ViT-B/32) and 2.0% (ViT-L/14) over state-of-the-art methods.
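The abstract describes two concrete mechanisms: Task Arithmetic-style accumulation of task vectors, and conflict-aware trimming of those vectors before accumulation. The paper's own implementation is not reproduced here; the following is a minimal PyTorch sketch of the general idea, in which the projection step (removing one task vector's component along another's direction) is a simplified parameter-space stand-in for CAT Merging's feature-based, parameter-specific projection and masking strategies, and all function names are illustrative.

```python
# Minimal sketch: task-vector merging with a simplified conflict-trimming step.
# NOT the authors' implementation; CAT Merging trims components using
# feature-level statistics, with projection for linear weights and masking
# for normalization parameters. Here the idea is approximated in weight space.
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """Task vector = finetuned parameters minus pretrained parameters."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def project_out(v: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
    """Remove from v its component along u (a conflict-prone direction)."""
    u_flat, v_flat = u.flatten(), v.flatten()
    coef = torch.dot(v_flat, u_flat) / (torch.dot(u_flat, u_flat) + 1e-12)
    return (v_flat - coef * u_flat).view_as(v)

def cat_style_merge(pretrained: dict, task_vectors: list, scale: float = 0.3) -> dict:
    """Task Arithmetic-style accumulation after pairwise conflict trimming."""
    merged = {k: p.clone() for k, p in pretrained.items()}
    for i, tv in enumerate(task_vectors):
        trimmed = dict(tv)
        for j, other in enumerate(task_vectors):
            if i != j:
                # Trim the component of this task vector that conflicts
                # with another task's update direction.
                trimmed = {k: project_out(trimmed[k], other[k]) for k in trimmed}
        for k in merged:
            merged[k] += scale * trimmed[k]
    return merged
```

With the trimming loop removed, `cat_style_merge` reduces to plain Task Arithmetic; the trimming step is what suppresses the knowledge conflicts the abstract refers to.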
Related papers
- Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration [16.667053306761364]
Multi-task model merging aims to consolidate knowledge from multiple task-specific experts into a unified model. Existing methods approach this by minimizing differences between task-specific experts and the unified model. We propose Layer-wise Optimal Task Vector Merging, a technique that explicitly minimizes feature drift between task-specific experts and the unified model.
arXiv Detail & Related papers (2025-05-29T08:11:31Z) - Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [59.6658995479243]
We propose Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to avoid forgetting. Through theoretical analysis, we minimize the total loss increase across all tasks and derive an analytical solution for the optimal merging coefficient. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z) - CABS: Conflict-Aware and Balanced Sparsification for Enhancing Model Merging [11.2457468104475]
Model merging based on task vectors provides an efficient way to integrate multiple task-specific models into a multitask model without retraining. Recent works have endeavored to address conflicts between task vectors, one of the significant challenges faced by model merging, through sparsification. We propose CABS (Conflict-Aware and Balanced Sparsification), consisting of Conflict-Aware Sparsification (CA) and Balanced Sparsification (BS).
arXiv Detail & Related papers (2025-02-26T12:38:55Z) - Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts [13.356826891549856]
Multi-task model merging offers an efficient solution for integrating knowledge from multiple fine-tuned models. Despite the promising performance of Task Arithmetic (TA), conflicts can arise among the task vectors. We propose Task Arithmetic in Trust Region (TATR), which defines the trust region as dimensions in the model parameter space.
arXiv Detail & Related papers (2025-01-25T04:09:56Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z) - Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging. We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z) - Concrete Subspace Learning based Interference Elimination for Multi-task
Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data (a minimal sketch of this idea appears after this list).
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
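As referenced in the AdaMerging entry above, learning the merging coefficients rather than fixing them is itself a small optimization problem. The sketch below is an illustrative reconstruction, not the authors' code: it assumes a hypothetical `forward_with_params` helper that runs the network functionally with a supplied parameter dict, and uses prediction-entropy minimization on unlabeled inputs as the training-data-free objective (the surrogate objective used in the AdaMerging paper).

```python
# Illustrative sketch of layer-wise coefficient learning for model merging,
# in the spirit of AdaMerging. `forward_with_params` is a hypothetical helper.
import torch

def merged_params(pretrained: dict, task_vectors: list, lambdas: torch.Tensor) -> dict:
    """Layer-wise merge: theta_l = theta0_l + sum_k lambdas[k, l] * tau_{k, l}."""
    return {
        name: pretrained[name]
        + sum(lambdas[k, l] * tv[name] for k, tv in enumerate(task_vectors))
        for l, name in enumerate(pretrained)
    }

def adapt_coefficients(pretrained, task_vectors, unlabeled_loader,
                       forward_with_params, steps=100, lr=1e-3):
    """Tune merging coefficients by minimizing prediction entropy on
    unlabeled inputs -- no original training data required."""
    n_tasks, n_layers = len(task_vectors), len(pretrained)
    lambdas = torch.full((n_tasks, n_layers), 0.3, requires_grad=True)
    opt = torch.optim.Adam([lambdas], lr=lr)
    for _, x in zip(range(steps), unlabeled_loader):
        logits = forward_with_params(merged_params(pretrained, task_vectors, lambdas), x)
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return lambdas.detach()
```

Task-wise adaptation is the special case where each row of `lambdas` is shared across all layers.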