Multi-Task Model Merging via Adaptive Weight Disentanglement
- URL: http://arxiv.org/abs/2411.18729v1
- Date: Wed, 27 Nov 2024 20:08:55 GMT
- Title: Multi-Task Model Merging via Adaptive Weight Disentanglement
- Authors: Feng Xiong, Runxi Cheng, Wang Chen, Zhanqiu Zhang, Yiwen Guo, Chun Yuan, Ruifeng Xu
- Abstract summary: Model merging is a technique for integrating task-specific weights from various tasks into a unified multi-task model without retraining or additional data. Task Arithmetic (TA) has demonstrated that combining task vectors through arithmetic operations facilitates efficient capability transfer between different tasks. Despite the notable effectiveness of TA, interference among task vectors can adversely affect the performance of the merged model. We propose Adaptive Weight Disentanglement (AWD), which decomposes traditional task vectors into a redundant vector and several disentangled task vectors.
- Score: 69.7292615212444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model merging has gained increasing attention as an efficient and effective technique for integrating task-specific weights from various tasks into a unified multi-task model without retraining or additional data. As a representative approach, Task Arithmetic (TA) has demonstrated that combining task vectors through arithmetic operations facilitates efficient capability transfer between different tasks. In this framework, task vectors are obtained by subtracting the parameter values of a pre-trained model from those of individually fine-tuned models initialized from it. Despite the notable effectiveness of TA, interference among task vectors can adversely affect the performance of the merged model. In this paper, we relax the constraints of the Task Arithmetic Property and propose the Task Consistency Property, which can be regarded as being free from task interference. Through theoretical derivation, we show that such a property can be approximately achieved by seeking orthogonal task vectors. Guided by this insight, we propose Adaptive Weight Disentanglement (AWD), which decomposes traditional task vectors into a redundant vector and several disentangled task vectors. The primary optimization objective of AWD is to achieve orthogonality among the disentangled task vectors, thereby closely approximating the desired solution. Notably, these disentangled task vectors can be seamlessly integrated into existing merging methodologies. Experimental results demonstrate that our AWD consistently and significantly improves upon previous merging approaches, achieving state-of-the-art results. Our code is available at https://github.com/FarisXiong/AWD.git.
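To make the decomposition concrete, here is a minimal sketch of the idea as the abstract describes it, assuming a single redundant vector shared across tasks and a plain gradient-based objective on pairwise cosine similarities; the toy random task vectors and variable names are illustrative, not the authors' implementation.

```python
import torch

# Toy setup: flattened task vectors tau_i = theta_i - theta_0 for three tasks
# (random here purely for illustration).
torch.manual_seed(0)
dim, num_tasks = 512, 3
task_vectors = [torch.randn(dim) for _ in range(num_tasks)]

# Decompose tau_i = redundant + disentangled_i (one shared redundant vector assumed)
# and optimize the redundant vector so the disentangled parts become orthogonal.
redundant = torch.zeros(dim, requires_grad=True)
opt = torch.optim.Adam([redundant], lr=1e-2)

for step in range(500):
    disentangled = [tau - redundant for tau in task_vectors]
    # Sum of squared cosine similarities over all pairs of disentangled vectors.
    loss = sum(
        torch.nn.functional.cosine_similarity(disentangled[i], disentangled[j], dim=0) ** 2
        for i in range(num_tasks)
        for j in range(i + 1, num_tasks)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()

# The disentangled vectors can then replace the raw task vectors in any existing
# merging rule, e.g. Task Arithmetic: theta_0 + alpha * sum_i disentangled_i.
disentangled = [(tau - redundant).detach() for tau in task_vectors]
```

Once the pairwise similarities are driven toward zero, the disentangled vectors approximate the orthogonality condition motivated by the Task Consistency Property and can be plugged into existing merging methods.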
Related papers
- AdaRank: Adaptive Rank Pruning for Enhanced Model Merging [15.383220675351076]
Model merging has emerged as a promising approach for unifying independently fine-tuned models into an integrated framework.
We propose AdaRank, a novel model merging framework that adaptively selects the most beneficial singular directions of task vectors to merge multiple models.
AdaRank consistently achieves state-of-the-art performance with various backbones and numbers of tasks, narrowing the performance gap to individually fine-tuned models to nearly 1%.
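As a rough sketch of selecting singular directions, the snippet below prunes one layer's task vector (a 2-D weight delta) to its leading singular directions; the energy-based cut-off is a simple stand-in for AdaRank's adaptive selection rule, which the summary does not detail.

```python
import numpy as np

def prune_task_vector(delta: np.ndarray, energy: float = 0.9) -> np.ndarray:
    """Keep only the leading singular directions of a 2-D task vector (weight delta).

    `energy` is an illustrative stand-in for an adaptive selection rule: we retain
    the smallest rank whose singular values cover that fraction of the spectral energy.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return (u[:, :k] * s[:k]) @ vt[:k, :]

# Example: a random 64x64 weight delta reduced to its dominant directions.
delta = np.random.default_rng(0).normal(size=(64, 64))
pruned = prune_task_vector(delta, energy=0.9)
```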
arXiv Detail & Related papers (2025-03-28T06:49:06Z) - Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts [13.356826891549856]
Multi-task model merging offers an efficient solution for integrating knowledge from multiple fine-tuned models.
Despite the promising performance of Task Arithmetic (TA), conflicts can arise among the task vectors.
We propose Task Arithmetic in Trust Region (TATR), which defines a trust region over dimensions of the model parameter space and restricts merging to it.
arXiv Detail & Related papers (2025-01-25T04:09:56Z) - Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z) - Revisiting Weight Averaging for Model Merging [16.503826062785773]
Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training.
Weight averaging implicitly induces task vectors centered around the weight average itself.
Applying a low-rank approximation to these centered task vectors significantly improves merging performance.
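A minimal sketch of that recipe for a single layer, with a fixed truncation rank chosen purely for illustration: center the fine-tuned weights around their average, low-rank-truncate the centered task vectors, and add them back.

```python
import numpy as np

def low_rank(mat: np.ndarray, rank: int) -> np.ndarray:
    # Truncated SVD reconstruction.
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

rng = np.random.default_rng(0)
finetuned = [rng.normal(size=(32, 32)) for _ in range(4)]   # one layer, four tasks

center = np.mean(finetuned, axis=0)                         # the weight average
centered_tvs = [w - center for w in finetuned]              # implicitly induced task vectors
merged = center + sum(low_rank(tv, rank=4) for tv in centered_tvs)
```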
arXiv Detail & Related papers (2024-12-11T06:29:20Z) - ATM: Improving Model Merging by Alternating Tuning and Merging [16.12778778313037]
We motivate the effectiveness of task vectors by linking them to multi-task gradients.
In a single-epoch scenario, task vectors are mathematically equivalent to the gradients obtained via gradient descent in a multi-task setting.
We show that task vectors perform optimally when this equivalence is maintained, and that their effectiveness is largely driven by the first epoch's gradient.
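The equivalence is easiest to see in the special case of a single full-batch gradient step per task, which is the only case this sketch covers:

```latex
% One full-batch gradient step of fine-tuning on task i:
\theta_i = \theta_0 - \eta \,\nabla L_i(\theta_0)
\quad\Longrightarrow\quad
\tau_i = \theta_i - \theta_0 = -\eta \,\nabla L_i(\theta_0),
\qquad
\theta_0 + \sum_i \tau_i = \theta_0 - \eta \sum_i \nabla L_i(\theta_0).
```

Summing such task vectors therefore reproduces one multi-task gradient step from the pre-trained initialization; with multiple steps per epoch the relation holds only approximately.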
arXiv Detail & Related papers (2024-11-05T12:42:42Z) - Localizing Task Information for Improved Model Merging and Compression [61.16012721460561]
We show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.
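As a rough illustration of the mask-based view behind this, the sketch below approximates per-task relevance by task-vector magnitude and keeps only coordinates that matter to at least two tasks; both the magnitude threshold and the consensus count are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
task_vectors = [rng.normal(size=1000) for _ in range(4)]      # flattened tau_i (placeholders)

# Approximate per-task relevance: top 20% of coordinates by magnitude (assumption).
masks = []
for tau in task_vectors:
    thresh = np.quantile(np.abs(tau), 0.8)
    masks.append(np.abs(tau) >= thresh)

# Consensus: keep coordinates relevant to at least two tasks; zero out the rest.
consensus = np.sum(masks, axis=0) >= 2
merged_tv = consensus * sum(task_vectors)                     # then theta_0 + alpha * merged_tv
```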
arXiv Detail & Related papers (2024-05-13T14:54:37Z) - Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques.
Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters.
We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
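The linearization referred to here is a first-order Taylor expansion of the adapter output in its parameters; a sketch of the idea, with f denoting the adapter module and theta_0 its initialization:

```latex
% First-order (linearized) adapter: fine-tune and merge in the tangent space at \theta_0.
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_{\theta} f(x;\theta_0)^{\top}\,(\theta - \theta_0)
```

Task arithmetic is then applied to the adapter parameters of models fine-tuned under this linearized forward pass, while the backbone is left untouched.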
arXiv Detail & Related papers (2023-10-07T08:55:54Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
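The summary does not spell out the learning signal; a common data-free surrogate is to minimize the entropy of the merged model's predictions on unlabeled inputs, which is what this self-contained sketch assumes (the toy linear model, random task vectors, and random inputs are placeholders).

```python
import torch
from torch.func import functional_call

torch.manual_seed(0)
model = torch.nn.Linear(8, 5)                      # toy stand-in for the backbone
base = {k: v.detach().clone() for k, v in model.named_parameters()}
# Pretend task vectors from two fine-tuned models (random placeholders).
task_vectors = [{k: 0.1 * torch.randn_like(v) for k, v in base.items()} for _ in range(2)]
unlabeled = torch.randn(64, 8)                     # unlabeled inputs, no original training data

# Task-wise merging coefficients, learned instead of hand-tuned.
lam = torch.nn.Parameter(torch.full((len(task_vectors),), 0.3))
opt = torch.optim.Adam([lam], lr=1e-2)

def merged_params():
    # theta = theta_0 + sum_k lambda_k * tau_k
    return {k: base[k] + sum(l * tv[k] for l, tv in zip(lam, task_vectors)) for k in base}

for _ in range(100):
    logits = functional_call(model, merged_params(), (unlabeled,))
    probs = logits.softmax(dim=-1)
    # Prediction entropy of the merged model as an unsupervised surrogate objective.
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()
```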
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems.
We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task.
We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition.
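A minimal sketch of that recipe over model state dicts; the scaling coefficient alpha is the usual tunable knob and the toy tensors are placeholders.

```python
import torch

def task_vector(pretrained, finetuned):
    # tau = theta_ft - theta_pre, computed per parameter tensor.
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained, vectors, alpha=0.3):
    # Addition merges capabilities; a negated vector (alpha < 0) steers away from a task.
    merged = {k: v.clone() for k, v in pretrained.items()}
    for tau in vectors:
        for k in merged:
            merged[k] = merged[k] + alpha * tau[k]
    return merged

# Tiny usage example with placeholder "state dicts".
pre = {"w": torch.zeros(4, 4)}
ft_a = {"w": torch.ones(4, 4)}
merged = apply_task_vectors(pre, [task_vector(pre, ft_a)], alpha=0.5)
```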
arXiv Detail & Related papers (2022-12-08T05:50:53Z) - Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
Diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
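As a rough sketch of the setup, the snippet below freezes a pretrained backbone and learns a task-specific difference added to every parameter; an L1 penalty stands in for the relaxed L0 regularizer that diff pruning actually uses, which is a simplification.

```python
import torch
from torch.func import functional_call

torch.manual_seed(0)
backbone = torch.nn.Linear(16, 2)                  # toy stand-in for a pretrained model
for p in backbone.parameters():
    p.requires_grad_(False)                        # pretrained weights stay frozen

# Learn a (hopefully sparse) diff added on top of each frozen parameter.
diffs = {k: torch.zeros_like(v, requires_grad=True) for k, v in backbone.named_parameters()}
opt = torch.optim.Adam(diffs.values(), lr=1e-2)

x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))   # placeholder task data
for _ in range(200):
    params = {k: v + diffs[k] for k, v in backbone.named_parameters()}
    logits = functional_call(backbone, params, (x,))
    task_loss = torch.nn.functional.cross_entropy(logits, y)
    sparsity = sum(d.abs().sum() for d in diffs.values())  # L1 stand-in for relaxed L0
    loss = task_loss + 1e-3 * sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```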
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.