Task Singular Vectors: Reducing Task Interference in Model Merging
- URL: http://arxiv.org/abs/2412.00081v3
- Date: Fri, 04 Apr 2025 10:10:41 GMT
- Title: Task Singular Vectors: Reducing Task Interference in Model Merging
- Authors: Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, Emanuele Rodolà
- Abstract summary: Task Arithmetic has emerged as a simple yet effective method to merge models without additional training. We study task vectors at the layer level, focusing on task layer matrices and their singular value decomposition. We introduce TSV-Merge (TSV-M), a novel model merging approach that combines compression with interference reduction.
- Score: 19.4876941464776
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Task Arithmetic has emerged as a simple yet effective method to merge models without additional training. However, by treating entire networks as flat parameter vectors, it overlooks key structural information and is susceptible to task interference. In this paper, we study task vectors at the layer level, focusing on task layer matrices and their singular value decomposition. In particular, we concentrate on the resulting singular vectors, which we refer to as Task Singular Vectors (TSV). Recognizing that layer task matrices are often low-rank, we propose TSV-Compress (TSV-C), a simple procedure that compresses them to 10% of their original size while retaining 99% of accuracy. We further leverage this low-rank space to define a new measure of task interference based on the interaction of singular vectors from different tasks. Building on these findings, we introduce TSV-Merge (TSV-M), a novel model merging approach that combines compression with interference reduction, significantly outperforming existing methods.
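A minimal sketch of the per-layer pipeline the abstract describes: form a layer-wise task matrix, compress it with a truncated SVD, and score interference through the interaction of singular vectors from two tasks. The keep ratio, the toy data, and the specific overlap score used below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def layer_task_matrix(w_finetuned, w_pretrained):
    """Per-layer task vector kept as a matrix rather than a flat parameter vector."""
    return w_finetuned - w_pretrained

def compress_tsv(delta, keep_ratio=0.1):
    """TSV-C-style compression: keep only the top singular components of a layer task matrix."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    k = max(1, int(keep_ratio * len(S)))           # keep roughly 10% of the singular components
    return U[:, :k], S[:k], Vt[:k, :]

def interference(U_a, Vt_a, U_b, Vt_b):
    """Illustrative interference score: overlap between the singular subspaces of two tasks (assumption)."""
    left = np.linalg.norm(U_a.T @ U_b)             # alignment of left singular vectors
    right = np.linalg.norm(Vt_a @ Vt_b.T)          # alignment of right singular vectors
    return float(left * right)

# toy example: two "tasks" fine-tuned from the same layer weight w0
rng = np.random.default_rng(0)
w0 = rng.standard_normal((64, 64))
delta_a = layer_task_matrix(w0 + 0.01 * rng.standard_normal((64, 64)), w0)
delta_b = layer_task_matrix(w0 + 0.01 * rng.standard_normal((64, 64)), w0)
Ua, Sa, Vta = compress_tsv(delta_a)
Ub, Sb, Vtb = compress_tsv(delta_b)
print("interference:", interference(Ua, Vta, Ub, Vtb))
```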
Related papers
- Efficient Model Editing with Task-Localized Sparse Fine-tuning [14.792099973449794]
We propose TaLoS, which builds sparse task vectors with minimal interference without requiring explicit linearization.
We find that pre-trained models contain a subset of parameters with consistently low gradient sensitivity across tasks.
Our experiments prove that TaLoS improves training and inference efficiency while outperforming current methods in task addition and negation.
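A hedged sketch of the idea summarized above: identify parameters with consistently low gradient sensitivity across tasks and restrict the task vector to that sparse subset. The sensitivity statistic (worst-case absolute gradient) and the keep fraction are assumptions made for illustration.

```python
import numpy as np

def low_sensitivity_mask(per_task_grads, keep_frac=0.2):
    """Boolean mask of parameters whose gradient sensitivity is low across all tasks."""
    sensitivity = np.max(np.abs(np.stack(per_task_grads)), axis=0)  # worst case over tasks
    cutoff = np.quantile(sensitivity, keep_frac)                    # keep the lowest fraction
    return sensitivity <= cutoff

def sparse_task_vector(delta, mask):
    """Task vector with updates outside the low-sensitivity mask zeroed out."""
    return np.where(mask, delta, 0.0)

rng = np.random.default_rng(0)
grads = [rng.standard_normal(1000) for _ in range(3)]   # toy per-task gradients
mask = low_sensitivity_mask(grads)
tv = sparse_task_vector(rng.standard_normal(1000), mask)
print("nonzero fraction of the sparse task vector:", mask.mean())
```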
arXiv Detail & Related papers (2025-04-03T14:20:06Z)
- Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts [13.356826891549856]
Multi-task model merging offers an efficient solution for integrating knowledge from multiple fine-tuned models.
Despite the promising performance of Task Arithmetic (TA), conflicts can arise among the task vectors.
We propose Task Arithmetic in Trust Region (TATR), which defines the trust region as a set of dimensions in the model parameter space and restricts the merging update to those dimensions to mitigate knowledge conflicts.
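An illustrative sketch of restricting the merging update to a trust region of parameter dimensions. Approximating the trust region by unanimous sign agreement among task vectors is an assumption made here for brevity; it is not TATR's actual criterion.

```python
import numpy as np

def trust_region_merge(task_vectors, alpha=0.3):
    """Merge task vectors, but only update dimensions inside an (assumed) trust region."""
    stacked = np.stack(task_vectors)
    agree = np.abs(np.sign(stacked).sum(axis=0)) == len(task_vectors)  # unanimous sign agreement
    merged = alpha * stacked.sum(axis=0)
    return np.where(agree, merged, 0.0)          # zero out dimensions outside the trust region

rng = np.random.default_rng(1)
task_vectors = [rng.standard_normal(10) for _ in range(3)]
print(trust_region_merge(task_vectors))
```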
arXiv Detail & Related papers (2025-01-25T04:09:56Z)
- Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
Model merging is a technique for integrating task-specific weights from various tasks into a unified multi-task model without retraining or additional data. Task Arithmetic (TA) has demonstrated that combining task vectors through arithmetic operations facilitates efficient capability transfer between different tasks. Despite the notable effectiveness of TA, interference among task vectors can adversely affect the performance of the merged model. We propose Adaptive Weight Disentanglement (AWD), which decomposes traditional task vectors into a redundant vector and several disentangled task vectors.
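A rough sketch of the decomposition idea: split each task vector into a shared redundant component plus a task-specific residual. Using the mean as the redundant component is an assumption; AWD learns the decomposition adaptively.

```python
import numpy as np

def disentangle(task_vectors):
    """Split task vectors into one shared component and task-specific residuals (illustrative)."""
    redundant = np.mean(np.stack(task_vectors), axis=0)       # shared "redundant" component
    disentangled = [tv - redundant for tv in task_vectors]    # task-specific residuals
    return redundant, disentangled

rng = np.random.default_rng(2)
task_vectors = [rng.standard_normal(8) for _ in range(4)]
shared, residuals = disentangle(task_vectors)
print(shared.shape, len(residuals))
```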
arXiv Detail & Related papers (2024-11-27T20:08:55Z)
- ATM: Improving Model Merging by Alternating Tuning and Merging [16.12778778313037]
We motivate the effectiveness of task vectors by linking them to multi-task gradients.
In a single-epoch scenario, task vectors are mathematically equivalent to the gradients obtained via gradient descent in a multi-task setting.
We show that task vectors perform optimally when this equivalence is maintained, and that their effectiveness is largely driven by the first epoch's gradient.
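A toy sketch of the alternating loop on quadratic "tasks", showing the structure of tuning each task briefly from the current merged model and then merging the resulting task vectors. The learning rate, step counts, and losses are assumptions.

```python
import numpy as np

def tune(theta, target, steps=1, lr=0.5):
    """One short tuning phase on a toy quadratic loss ||theta - target||^2."""
    for _ in range(steps):
        theta = theta - lr * 2 * (theta - target)
    return theta

targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]     # two toy tasks
theta = np.zeros(2)                                         # "pretrained" weights
for _ in range(5):                                          # alternate tuning and merging
    task_vectors = [tune(theta, t) - theta for t in targets]
    theta = theta + np.mean(task_vectors, axis=0)           # merge = average the task vectors
print(theta)                                                # approaches [0.5, 0.5]
```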
arXiv Detail & Related papers (2024-11-05T12:42:42Z)
- Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only the combination coefficients being learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives.
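A small sketch of learning one coefficient per parameter block of each task vector, so that scaling is anisotropic at the block level while only the coefficients are trainable. The toy objective and block granularity below are assumptions for illustration.

```python
import torch

pretrained = {"layer1": torch.zeros(4), "layer2": torch.zeros(4)}
task_vectors = [
    {"layer1": torch.ones(4), "layer2": -torch.ones(4)},
    {"layer1": -torch.ones(4), "layer2": torch.ones(4)},
]
# one learnable coefficient per (task, block): the only trainable parameters
coeffs = torch.nn.Parameter(torch.zeros(len(task_vectors), len(pretrained)))
opt = torch.optim.Adam([coeffs], lr=0.1)

def merged_weights():
    out = {}
    for j, (name, w0) in enumerate(pretrained.items()):
        out[name] = w0 + sum(coeffs[i, j] * tv[name] for i, tv in enumerate(task_vectors))
    return out

for _ in range(100):                      # toy objective: pull both blocks toward all-ones
    w = merged_weights()
    loss = (w["layer1"] - 1).pow(2).mean() + (w["layer2"] - 1).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(coeffs.detach())
```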
arXiv Detail & Related papers (2024-07-03T07:54:08Z)
- Localizing Task Information for Improved Model Merging and Compression [61.16012721460561]
We show that the information required to solve each task is still preserved after merging, as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates weights important to at most a single task and improves the general performance of existing model merging approaches.
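A hedged sketch of a consensus-style filter: each task supplies a binary importance mask, and merged updates are kept only where enough tasks agree. The magnitude-based masks and the threshold are illustrative assumptions about the procedure summarized above.

```python
import numpy as np

def consensus_merge(task_vectors, masks, min_tasks=2, alpha=0.4):
    """Keep merged updates only where at least `min_tasks` importance masks agree."""
    votes = np.stack(masks).sum(axis=0)               # how many tasks deem each weight relevant
    consensus = votes >= min_tasks
    merged = alpha * np.stack(task_vectors).sum(axis=0)
    return np.where(consensus, merged, 0.0)

rng = np.random.default_rng(3)
task_vectors = [rng.standard_normal(12) for _ in range(3)]
masks = [np.abs(tv) > 0.5 for tv in task_vectors]      # toy importance: large-magnitude entries
print(consensus_merge(task_vectors, masks))
```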
arXiv Detail & Related papers (2024-05-13T14:54:37Z)
- Task Indicating Transformer for Task-conditional Dense Predictions [16.92067246179703]
We introduce a novel task-conditional framework called Task Indicating Transformer (TIT) to tackle this challenge.
Our approach designs a Mix Task Adapter module within the transformer block, which incorporates a Task Indicating Matrix through matrix decomposition.
We also propose a Task Gate Decoder module that harnesses a Task Indicating Vector and gating mechanism to facilitate adaptive multi-scale feature refinement.
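A very loose sketch of a task-conditional adapter whose weight comes from a matrix decomposition with a learnable per-task factor, in the spirit of the Task Indicating Matrix described above. Dimensions, the residual form, and the exact factorization are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TaskIndicatingAdapter(nn.Module):
    def __init__(self, dim, rank, num_tasks):
        super().__init__()
        self.left = nn.Parameter(torch.randn(dim, rank) * 0.02)            # shared factor
        self.task_factor = nn.Parameter(torch.randn(num_tasks, rank, rank) * 0.02)
        self.right = nn.Parameter(torch.randn(rank, dim) * 0.02)           # shared factor

    def forward(self, x, task_id):
        # adapter weight = shared left factor @ task-specific core @ shared right factor
        w = self.left @ self.task_factor[task_id] @ self.right
        return x + x @ w                                                    # residual adapter

x = torch.randn(2, 16)
adapter = TaskIndicatingAdapter(dim=16, rank=4, num_tasks=3)
print(adapter(x, task_id=1).shape)
```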
arXiv Detail & Related papers (2024-03-01T07:06:57Z)
- Low-Rank Multitask Learning based on Tensorized SVMs and LSSVMs [65.42104819071444]
Multitask learning (MTL) leverages task-relatedness to enhance performance.
We employ high-order tensors, with each mode corresponding to a task index, to naturally represent tasks referenced by multiple indices.
We propose a general framework of low-rank MTL methods with tensorized support vector machines (SVMs) and least squares support vector machines (LSSVMs).
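A toy sketch of the tensor view described above: models for tasks indexed by two indices share a weight tensor with a low-rank CP structure. The factor shapes, the CP rank, and the linear scoring are illustrative assumptions rather than the paper's exact model.

```python
import numpy as np

def cp_weight_tensor(A, B, C):
    """W[i, j, d] = sum_r A[i, r] * B[j, r] * C[d, r]  (rank-R CP structure over task indices)."""
    return np.einsum("ir,jr,dr->ijd", A, B, C)

rng = np.random.default_rng(4)
I, J, D, R = 3, 4, 8, 2                      # task-index sizes, feature dimension, CP rank
W = cp_weight_tensor(rng.standard_normal((I, R)),
                     rng.standard_normal((J, R)),
                     rng.standard_normal((D, R)))
x = rng.standard_normal(D)
print(W[1, 2] @ x)                           # linear score of task (1, 2) on input x
```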
arXiv Detail & Related papers (2023-08-30T14:28:26Z)
- Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing with Non-Learnable Primitives [19.90788777476128]
Multi-task learning (MTL) seeks to learn a single model to accomplish multiple tasks by leveraging shared information among the tasks.
Existing MTL models have been known to suffer from negative interference among tasks.
We propose ETR-NLP to mitigate task interference through a synergistic combination of non-learnable primitives and explicit task routing.
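A loose sketch combining a fixed (non-learnable) primitive transformation with explicit routing into shared and task-specific branches. The choice of primitive (a fixed random projection) and the routing scheme are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ETRBlock(nn.Module):
    def __init__(self, dim, num_tasks):
        super().__init__()
        primitive = torch.randn(dim, dim) / dim ** 0.5
        self.register_buffer("primitive", primitive)           # non-learnable primitive
        self.shared = nn.Linear(dim, dim)                      # shared branch
        self.task_specific = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_tasks)])

    def forward(self, x, task_id):
        z = torch.relu(x @ self.primitive)                      # fixed transformation
        return self.shared(z) + self.task_specific[task_id](z)  # explicit per-task routing

x = torch.randn(2, 16)
block = ETRBlock(dim=16, num_tasks=3)
print(block(x, task_id=0).shape)
```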
arXiv Detail & Related papers (2023-08-03T22:34:16Z)
- ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches for the varying task weights.
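A toy sketch of the fork-and-merge loop: fork the current weights into a target-only branch and a joint branch, then search the merging weight that minimizes a target-task validation loss. The quadratic losses, the weight grid, and the step sizes are assumptions.

```python
import numpy as np

def train_branch(theta, tasks, lr=0.3, steps=2):
    """Short training phase on a weighted sum of toy quadratic task losses."""
    for _ in range(steps):
        grad = sum(w * 2 * (theta - t) for t, w in tasks)
        theta = theta - lr * grad
    return theta

target, aux = np.array([1.0, 0.0]), np.array([0.8, 0.5])
theta = np.zeros(2)
for _ in range(3):                                              # periodically fork ...
    b_target = train_branch(theta, [(target, 1.0)])             # target-task-only branch
    b_joint = train_branch(theta, [(target, 1.0), (aux, 1.0)])  # joint-training branch
    # ... then search the merging weight that minimizes target-task validation loss
    val = lambda lam: np.sum(((1 - lam) * b_target + lam * b_joint - target) ** 2)
    best = min(np.linspace(0.0, 1.0, 11), key=val)
    theta = (1 - best) * b_target + best * b_joint
print(theta)
```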
arXiv Detail & Related papers (2023-01-30T02:27:02Z)
- Sample Efficient Linear Meta-Learning by Alternating Minimization [74.40553081646995]
We study a simple alternating minimization method (MLLAM) which alternately learns the low-dimensional subspace and the regressors.
We show that for a constant subspace dimension, MLLAM obtains nearly-optimal estimation error, despite requiring only $\Omega(\log d)$ samples per task.
We propose a novel task subset selection scheme that ensures the same strong statistical guarantee as MLLAM.
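A compact sketch of the alternating minimization described above: with the subspace fixed, fit each task's low-dimensional regressor; with the regressors fixed, re-fit the shared subspace. The synthetic data, problem sizes, and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d, k, n_tasks, n = 20, 2, 10, 15                        # ambient dim, subspace dim, tasks, samples/task
U_true, _ = np.linalg.qr(rng.standard_normal((d, k)))
tasks = []
for _ in range(n_tasks):
    w = U_true @ rng.standard_normal(k)                 # true regressor lies in a shared subspace
    X = rng.standard_normal((n, d))
    tasks.append((X, X @ w + 0.01 * rng.standard_normal(n)))

U, _ = np.linalg.qr(rng.standard_normal((d, k)))        # random initial subspace
for _ in range(10):
    # step 1: with U fixed, fit each task's low-dimensional regressor by least squares
    V = [np.linalg.lstsq(X @ U, y, rcond=None)[0] for X, y in tasks]
    # step 2: with the regressors fixed, re-fit U by least squares (vec trick), then re-orthonormalize
    A = np.vstack([np.kron(v[None, :], X) for (X, y), v in zip(tasks, V)])
    b = np.concatenate([y for _, y in tasks])
    U = np.linalg.lstsq(A, b, rcond=None)[0].reshape(k, d).T
    U, _ = np.linalg.qr(U)
print("subspace alignment (max = sqrt(k)):", np.linalg.norm(U_true.T @ U))
```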
arXiv Detail & Related papers (2021-05-18T06:46:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.