ZipIt! Merging Models from Different Tasks without Training
- URL: http://arxiv.org/abs/2305.03053v3
- Date: Wed, 13 Mar 2024 02:04:06 GMT
- Title: ZipIt! Merging Models from Different Tasks without Training
- Authors: George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman
- Abstract summary: "ZipIt!" is a general method for merging two arbitrary models of the same architecture.
We find that these two changes combined account for 20-60% improvement over prior work.
- Score: 20.2479633507354
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typical deep visual recognition models are capable of performing the one task
they were trained on. In this paper, we tackle the extremely difficult problem
of combining distinct models with different initializations, each solving a
separate task, into one multi-task model without any additional training. Prior
work in model merging permutes one model to the space of the other then
averages them together. While this works for models trained on the same task,
we find that this fails to account for the differences in models trained on
disjoint tasks. Thus, we introduce "ZipIt!", a general method for merging two
arbitrary models of the same architecture that incorporates two simple
strategies. First, in order to account for features that aren't shared between
models, we expand the model merging problem to allow for merging features
within each model by defining a general "zip" operation. Second, we add support
for partially zipping the models up until a specified layer, naturally creating
a multi-head model. We find that these two changes combined account for 20-60%
improvement over prior work, making it more feasible to merge models trained on
disjoint tasks without retraining.
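The abstract describes the "zip" operation only at a high level. Below is a minimal sketch, in NumPy, of what feature-level zipping could look like for a single linear layer: correlated output features are paired greedily, and, unlike permutation-based merging, a pair may come from within the same model as well as across models. The function name `zip_layer`, the greedy matching heuristic, and the plain averaging of matched pairs are illustrative assumptions, not the paper's exact algorithm (which also propagates each merge to the inputs of the following layer).

```python
# Illustrative sketch of a "zip"-style merge for one linear layer.
# Assumptions (not from the paper): greedy correlation matching, plain
# averaging of matched rows, NumPy only.
import numpy as np

def zip_layer(W_a, W_b, feats_a, feats_b):
    """Merge two linear layers by pairing correlated output features.

    W_a, W_b:         (d_out, d_in) weight matrices of the two models.
    feats_a, feats_b: (n_samples, d_out) activations of this layer on some
                      probe data, used only to estimate feature correlations.
    Returns a merged (d_out, d_in) weight matrix.
    """
    feats = np.concatenate([feats_a, feats_b], axis=1)    # (n, 2*d_out) candidate features
    weights = np.concatenate([W_a, W_b], axis=0)          # (2*d_out, d_in) candidate rows
    d_out = W_a.shape[0]

    # Correlation between every pair of candidate features. Pairs may come
    # from the same model, which is the key difference from permutation-only merging.
    corr = np.corrcoef(feats, rowvar=False)               # (2*d_out, 2*d_out)
    np.fill_diagonal(corr, -np.inf)                       # forbid self-matches

    merged_rows, used = [], set()
    # Visit candidate pairs from most to least correlated, greedily matching
    # unused features until d_out merged features exist.
    for idx in np.argsort(-corr, axis=None):
        if len(merged_rows) == d_out:
            break
        i, j = np.unravel_index(idx, corr.shape)
        if i in used or j in used:
            continue
        used.update((int(i), int(j)))
        merged_rows.append(0.5 * (weights[i] + weights[j]))  # average the matched pair

    return np.stack(merged_rows)                           # (d_out, d_in)

# Example: merge two random 4->8 linear layers using random probe inputs.
rng = np.random.default_rng(0)
W_a, W_b = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
X = rng.normal(size=(64, 4))
merged = zip_layer(W_a, W_b, X @ W_a.T, X @ W_b.T)         # (8, 4)
```

Partial zipping, the second ingredient, would apply such a merge only up to a chosen depth and leave the remaining layers of both models intact as separate task-specific heads.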
Related papers
- What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
- PLeaS -- Merging Models with Permutations and Least Squares [43.17620198572947]
We propose a new two-step algorithm to merge models-termed PLeaS.
PLeaS partially matches nodes in each layer by maximizing alignment.
It computes the weights of the merged model as a layer-wise Least Squares solution.
arXiv Detail & Related papers (2024-07-02T17:24:04Z)
- Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging [21.918559935122786]
Model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training.
Traditional model merging methods often show significant performance gaps compared to fine-tuned models.
We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance.
We propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
arXiv Detail & Related papers (2024-06-17T02:31:55Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Merging by Matching Models in Task Parameter Subspaces [87.8712523378141]
Model merging aims to cheaply combine individual task-specific models into a single multitask model.
We formalize how this approach to model merging can be seen as solving a linear system of equations.
We show that using the conjugate gradient method can outperform closed-form solutions; see the sketch after this list.
arXiv Detail & Related papers (2023-12-07T14:59:15Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- UnIVAL: Unified Model for Image, Video, Audio and Language Tasks [105.77733287326308]
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we propose a novel study on multimodal model merging via weight generalization.
arXiv Detail & Related papers (2023-07-30T09:48:36Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- GAN Cocktail: mixing GANs without dataset access [18.664733153082146]
We tackle the problem of model merging, given two constraints that often come up in the real world.
In the first stage, we transform the weights of all the models to the same parameter space by a technique we term model rooting.
In the second stage, we merge the rooted models by averaging their weights and fine-tuning them for each specific domain, using only data generated by the original trained models.
arXiv Detail & Related papers (2021-06-07T17:59:04Z)
- Lifelong Learning with Searchable Extension Units [21.17631355880764]
We propose a new lifelong learning framework named Searchable Extension Units (SEU).
It breaks down the need for a predefined original model and searches for specific extension units for different tasks.
Our approach can obtain a much more compact model without catastrophic forgetting.
arXiv Detail & Related papers (2020-03-19T03:45:51Z)
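The "Merging by Matching Models in Task Parameter Subspaces" entry above frames merging as solving a linear system and notes that the conjugate gradient method can outperform closed-form solutions. The sketch below illustrates that framing for a single linear layer under a least-squares objective weighted by per-task input Gram matrices; the helper name `merge_layer_cg`, the ridge term, and the per-column CG solves are assumptions made for illustration, not the paper's implementation.

```python
# Sketch: merge one linear layer by solving (sum_i G_i) W = sum_i G_i W_i
# with conjugate gradient (CG), where G_i is the Gram matrix of layer inputs
# on task i. The exact objective and solver details are assumptions.
import numpy as np
from scipy.sparse.linalg import cg

def merge_layer_cg(weights, grams, ridge=1e-6):
    """Merge per-task weights of one linear layer.

    weights: list of (d_in, d_out) matrices W_i, one per task-specific model.
    grams:   list of (d_in, d_in) Gram matrices G_i = X_i^T X_i of the layer's
             inputs on task i's data.
    """
    d_in = weights[0].shape[0]
    A = sum(grams) + ridge * np.eye(d_in)            # small ridge keeps the system positive definite
    B = sum(G @ W for G, W in zip(grams, weights))   # (d_in, d_out) right-hand sides

    cols = []
    for j in range(B.shape[1]):
        x, _ = cg(A, B[:, j])                        # one CG solve per output column
        cols.append(x)
    return np.stack(cols, axis=1)                    # merged (d_in, d_out) weight matrix
```

Solving the system iteratively avoids forming an explicit inverse of the accumulated Gram matrix, which is the practical appeal of the conjugate-gradient variant mentioned in that entry.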
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.