ZipIt! Merging Models from Different Tasks without Training
- URL: http://arxiv.org/abs/2305.03053v3
- Date: Wed, 13 Mar 2024 02:04:06 GMT
- Title: ZipIt! Merging Models from Different Tasks without Training
- Authors: George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman
- Abstract summary: "ZipIt!" is a general method for merging two arbitrary models of the same architecture.
Combined, its two changes (within-model feature merging and partial zipping) account for a 20-60% improvement over prior work.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typical deep visual recognition models are capable of performing the one task
they were trained on. In this paper, we tackle the extremely difficult problem
of combining distinct models with different initializations, each solving a
separate task, into one multi-task model without any additional training. Prior
work in model merging permutes one model into the space of the other and then
averages them together. While this works for models trained on the same task,
we find that this fails to account for the differences in models trained on
disjoint tasks. Thus, we introduce "ZipIt!", a general method for merging two
arbitrary models of the same architecture that incorporates two simple
strategies. First, in order to account for features that aren't shared between
models, we expand the model merging problem to allow for merging features
within each model by defining a general "zip" operation. Second, we add support
for partially zipping the models up until a specified layer, naturally creating
a multi-head model. We find that these two changes combined account for 20-60%
improvement over prior work, making it more feasible to merge models trained on
disjoint tasks without retraining.
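To make the "zip" operation concrete, here is a minimal hypothetical sketch of one zip step on a single layer's features. The function name, the greedy matcher, and the 50/50 averaging are our illustrative assumptions, not the authors' released code; the point it shows is that, unlike permutation-based merging, feature pairs may also form within a single model.

```python
# Hypothetical sketch of one "zip" step on a single layer.
# Names and the greedy matcher are assumptions, not ZipIt!'s code.
import numpy as np

def zip_features(feats_a, feats_b):
    """feats_a, feats_b: (d, n) activations of the same layer in
    each model over n shared inputs. Returns a (d, 2d) merge
    matrix M so that zipped activations = M @ concat(feats).
    """
    d = feats_a.shape[0]
    feats = np.concatenate([feats_a, feats_b], axis=0)   # (2d, n)
    corr = np.corrcoef(feats)                            # (2d, 2d) feature similarity
    np.fill_diagonal(corr, -np.inf)                      # forbid self-matches

    merge = np.zeros((d, 2 * d))
    for row in range(d):
        # Most correlated remaining pair; it may lie within one model,
        # which is what lets features unshared across models survive.
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        merge[row, i] = merge[row, j] = 0.5              # average the pair
        corr[[i, j], :] = -np.inf                        # retire both features
        corr[:, [i, j]] = -np.inf
    return merge
```

Partial zipping would apply such a step only up to a chosen layer and leave the remaining layers separate, so the result is naturally a multi-head model.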
Related papers
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- How to Merge Your Multimodal Models Over Time? [73.11304741033761]
We propose a unified framework called TIME, which defines temporal model merging across three axes.
We study temporal model merging across model sizes, compute budgets, and learning horizons on the FoMo-in-Flux benchmark.
arXiv Detail & Related papers (2024-12-09T18:01:13Z)
- What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
- PLeaS -- Merging Models with Permutations and Least Squares [43.17620198572947]
We propose a new two-step algorithm to merge models, termed PLeaS.
PLeaS partially matches nodes in each layer by maximizing alignment.
It computes the weights of the merged model as a layer-wise Least Squares solution.
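The least-squares step can be read as one stacked regression per layer; below is a hedged sketch under that reading (variable names are ours, and the permutation-matching step is omitted):

```python
# Hedged sketch of a layer-wise least-squares merge: find a single
# weight matrix that best reproduces both models' outputs on their
# own layer inputs. Names are illustrative, not PLeaS's code.
import numpy as np

def merge_layer_lstsq(x_a, w_a, x_b, w_b):
    """x_*: (n, d_in) layer inputs; w_*: (d_in, d_out) weights.
    Solves min_W ||x_a W - x_a w_a||^2 + ||x_b W - x_b w_b||^2."""
    X = np.vstack([x_a, x_b])                  # stack both regression problems
    Y = np.vstack([x_a @ w_a, x_b @ w_b])      # target outputs of each model
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W
```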
arXiv Detail & Related papers (2024-07-02T17:24:04Z)
- Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging [21.918559935122786]
Model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training.
Traditional model merging methods often show significant performance gaps compared to fine-tuned models.
We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance.
We propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
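As a toy illustration of stage (2), an input-conditioned router could gate compressed task-specific deltas into the shared weights. Everything below (the names, the softmax router) is an assumption for illustration, not the paper's API:

```python
# Toy sketch of input-conditioned merging: a router scores the
# current input and mixes task-specific weight deltas into the
# shared model. All names here are illustrative assumptions.
import numpy as np

def dynamic_merge(shared_w, task_deltas, router_logits):
    """shared_w: merged shared weights; task_deltas: compressed
    task-specific weight differences; router_logits: per-task
    scores computed from the current input."""
    gates = np.exp(router_logits - router_logits.max())
    gates /= gates.sum()                        # softmax over tasks
    merged = shared_w.copy()
    for g, delta in zip(gates, task_deltas):
        merged = merged + g * delta             # input-conditioned mixture
    return merged
```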
arXiv Detail & Related papers (2024-06-17T02:31:55Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring neither data nor additional training, while showing impressive performance.
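The summary names the three moves without detailing them; one plausible reading of an elect / mask / rescale pattern over task vectors is sketched below (a hypothetical reconstruction, not the paper's published procedure):

```python
# Speculative sketch of an elect / mask / rescale pattern over
# task vectors (w_t - w_pre). Not EMR-Merging's actual code.
import numpy as np

def elect_mask_rescale(pre, task_weights):
    taus = [w - pre for w in task_weights]          # per-task updates
    sign = np.sign(sum(taus))                       # elect a majority sign
    # keep, per parameter, the largest magnitude agreeing with that sign
    agree = [np.where(np.sign(t) == sign, np.abs(t), 0.0) for t in taus]
    tau_uni = sign * np.maximum.reduce(agree)       # unified task vector
    masks = [np.sign(t) == sign for t in taus]      # per-task binary masks
    scales = [np.abs(t).sum() / (np.abs(m * tau_uni).sum() + 1e-12)
              for t, m in zip(taus, masks)]         # per-task rescalers
    return tau_uni, masks, scales

def task_model(pre, tau_uni, mask, scale):
    return pre + scale * mask * tau_uni             # reconstruct task weights
```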
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- UnIVAL: Unified Model for Image, Video, Audio and Language Tasks [105.77733287326308]
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we propose a novel study on multimodal model merging via weight generalization.
arXiv Detail & Related papers (2023-07-30T09:48:36Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Often, however, fine-tuned models are readily available while their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
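One concrete way a parameter-space merge can stay dataless at merge time is a regression-style closed form per linear layer, using only each model's weights plus a cached Gram matrix of its layer inputs; the sketch below is our illustration under that assumption (the function name and signature are not the paper's):

```python
# Hedged sketch of a regression-mean style parameter-space merge.
# Function name and signature are illustrative assumptions.
import numpy as np

def fuse_linear_layers(weights, grams):
    """weights: per-model (d_in, d_out) layer weights W_i.
    grams: per-model (d_in, d_in) Gram matrices G_i = X_i^T X_i,
    cached once per model so no data is exchanged when merging.
    Returns W minimizing sum_i ||X_i W - X_i W_i||^2 in closed
    form: W = (sum_i G_i)^{-1} (sum_i G_i W_i).
    """
    lhs = sum(grams)
    rhs = sum(g @ w for g, w in zip(grams, weights))
    return np.linalg.solve(lhs, rhs)
```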
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Lifelong Learning with Searchable Extension Units [21.17631355880764]
We propose a new lifelong learning framework named Searchable Extension Units (SEU).
It eliminates the need for a predefined original model and searches for specific extension units for different tasks.
Our approach can obtain a much more compact model without catastrophic forgetting.
arXiv Detail & Related papers (2020-03-19T03:45:51Z)