Localizing Task Information for Improved Model Merging and Compression
- URL: http://arxiv.org/abs/2405.07813v1
- Date: Mon, 13 May 2024 14:54:37 GMT
- Title: Localizing Task Information for Improved Model Merging and Compression
- Authors: Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez, François Fleuret, Pascal Frossard
- Abstract summary: We show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.
- Score: 61.16012721460561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints to one multi-task model, but their applicability is reduced by significant performance loss. Previous works have linked these drops to interference in the weight space and erasure of important task-specific features. Instead, in this work we show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights. We propose TALL-masks, a method to identify these task supports given a collection of task vectors and show that one can retrieve >99% of the single task accuracy by applying our masks to the multi-task vector, effectively compressing the individual checkpoints. We study the statistics of intersections among constructed masks and reveal the existence of selfish and catastrophic weights, i.e., parameters that are important exclusively to one task and irrelevant to all tasks but detrimental to multi-task fusion. For this reason, we propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches. Our experiments in vision and NLP benchmarks with up to 20 tasks, show that Consensus Merging consistently improves existing approaches. Furthermore, our proposed compression scheme reduces storage from 57Gb to 8.2Gb while retaining 99.7% of original performance.
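The masking and consensus steps described in the abstract can be illustrated with a small NumPy sketch. The abstract does not state the exact mask criterion, so the rule used here (a weight belongs to a task's support when the magnitude of that task's update is at least `lam` times the magnitude of the combined update of all other tasks) is an assumption, as are the function names, the threshold `lam`, and the vote count `k`. It is a toy illustration on flat vectors, not the authors' implementation.

```python
import numpy as np

def task_masks(taus, lam=1.0):
    """Return one boolean mask per task plus the multi-task vector.

    taus: list of 1-D arrays, each a task vector (fine-tuned minus pre-trained).
    Assumed criterion: |tau_t| >= lam * |tau_mtl - tau_t|.
    """
    tau_mtl = np.sum(taus, axis=0)  # multi-task vector: sum of task vectors
    masks = [np.abs(tau) >= lam * np.abs(tau_mtl - tau) for tau in taus]
    return masks, tau_mtl

def consensus_merge(taus, lam=1.0, k=2):
    """Keep only weights selected by at least k task masks (assumed rule).

    Weights chosen by a single task ("selfish") or by no task
    ("catastrophic") are zeroed out of the multi-task vector.
    """
    masks, tau_mtl = task_masks(taus, lam)
    votes = np.sum(masks, axis=0)       # number of tasks selecting each weight
    merged = (votes >= k) * tau_mtl     # zero out weights without consensus
    return merged, votes

# Toy example with three hypothetical task vectors over five weights.
tau_a = np.array([1.0, 0.0, 1.0, 0.0, 0.5])
tau_b = np.array([0.0, 1.0, 1.0, 0.0, 0.5])
tau_c = np.array([0.0, 0.0, 0.0, 0.25, 0.5])

merged, votes = consensus_merge([tau_a, tau_b, tau_c], lam=1.0, k=2)
print(votes)   # per-weight vote counts
print(merged)  # multi-task vector restricted to consensus weights
```

In this toy setting, weights 0, 1, and 3 are selected by exactly one task, weight 4 by none, and only weight 2 survives the consensus filter. For the compression use case, a per-task model would instead be recovered by applying that task's own mask to the multi-task vector and adding the result to the pre-trained weights.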
Related papers
- Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts [13.356826891549856]
Multi-task model merging offers an efficient solution for integrating knowledge from multiple fine-tuned models.
Despite the promising performance of Task Arithmetic (TA), conflicts can arise among the task vectors.
We propose Task Arithmetic in Trust Region (TATR), which defines the trust region as dimensions in the model parameter space.
arXiv Detail & Related papers (2025-01-25T04:09:56Z)
- Tint Your Models Task-wise for Improved Multi-task Model Merging [17.496018757317824]
We propose Model Tinting, a test-time approach that introduces a single task-specific layer for each task as trainable adjustments.
Our method jointly trains merging coefficients and task-specific layers, which effectively reduces task conflicts with minimal additional costs.
Our method achieves state-of-the-art performance across both computer vision and natural language processing tasks.
arXiv Detail & Related papers (2024-12-26T07:42:06Z)
- Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging.
We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models [80.23791222509644]
Inconsistent AI models are considered brittle and untrustworthy by human users.
We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks.
We propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets.
arXiv Detail & Related papers (2023-03-28T16:57:12Z)
- ImpressLearn: Continual Learning via Combined Task Impressions [0.0]
This work proposes a new method to sequentially train a deep neural network on multiple tasks without suffering catastrophic forgetting.
We show that simply learning a linear combination of a small number of task-specific masks on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on new tasks.
arXiv Detail & Related papers (2022-10-05T02:28:25Z)
- DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.