ZipIt! Merging Models from Different Tasks without Training
- URL: http://arxiv.org/abs/2305.03053v3
- Date: Wed, 13 Mar 2024 02:04:06 GMT
- Title: ZipIt! Merging Models from Different Tasks without Training
- Authors: George Stoica, Daniel Bolya, Jakob Bjorner, Pratik Ramesh, Taylor Hearn, Judy Hoffman
- Abstract summary: "ZipIt!" is a general method for merging two arbitrary models of the same architecture.
Combined, its two changes (within-model feature merging and partial zipping) account for a 20-60% improvement over prior work.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Typical deep visual recognition models are capable of performing the one task
they were trained on. In this paper, we tackle the extremely difficult problem
of combining distinct models with different initializations, each solving a
separate task, into one multi-task model without any additional training. Prior
work in model merging permutes one model into the space of the other and then
averages them together. While this works for models trained on the same task,
we find that this fails to account for the differences in models trained on
disjoint tasks. Thus, we introduce "ZipIt!", a general method for merging two
arbitrary models of the same architecture that incorporates two simple
strategies. First, in order to account for features that aren't shared between
models, we expand the model merging problem to allow for merging features
within each model by defining a general "zip" operation. Second, we add support
for partially zipping the models up until a specified layer, naturally creating
a multi-head model. We find that these two changes combined account for 20-60%
improvement over prior work, making it more feasible to merge models trained on
disjoint tasks without retraining.
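To make the "zip" operation concrete, here is a minimal hypothetical sketch of one zip step on a single layer's features. The function name, the greedy matcher, and the 50/50 averaging are our illustrative assumptions, not the authors' released code; the point it shows is that, unlike permutation-based merging, feature pairs may also form within a single model.

```python
# Hypothetical sketch of one "zip" step on a single layer.
# Names and the greedy matcher are assumptions, not ZipIt!'s code.
import numpy as np

def zip_features(feats_a, feats_b):
    """feats_a, feats_b: (d, n) activations of the same layer in
    each model over n shared inputs. Returns a (d, 2d) merge
    matrix M so that zipped activations = M @ concat(feats).
    """
    d = feats_a.shape[0]
    feats = np.concatenate([feats_a, feats_b], axis=0)   # (2d, n)
    corr = np.corrcoef(feats)                            # (2d, 2d) feature similarity
    np.fill_diagonal(corr, -np.inf)                      # forbid self-matches

    merge = np.zeros((d, 2 * d))
    for row in range(d):
        # Most correlated remaining pair; it may lie within one model,
        # which is what lets features unshared across models survive.
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        merge[row, i] = merge[row, j] = 0.5              # average the pair
        corr[[i, j], :] = -np.inf                        # retire both features
        corr[:, [i, j]] = -np.inf
    return merge
```

Partial zipping would apply such a step only up to a chosen layer and leave the remaining layers separate, so the result is naturally a multi-head model.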
Related papers
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- How to Merge Your Multimodal Models Over Time? [73.11304741033761]
We propose a unified framework called TIME, which defines temporal model merging across three axes.
We study temporal model merging across model sizes, compute budgets, and learning horizons on the FoMo-in-Flux benchmark.
arXiv Detail & Related papers (2024-12-09T18:01:13Z)
- What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
- PLeaS -- Merging Models with Permutations and Least Squares [43.17620198572947]
We propose a new two-step algorithm to merge models, termed PLeaS.
PLeaS partially matches nodes in each layer by maximizing alignment.
It computes the weights of the merged model as a layer-wise Least Squares solution.
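The least-squares step can be read as one stacked regression per layer; below is a hedged sketch under that reading (variable names are ours, and the permutation-matching step is omitted):

```python
# Hedged sketch of a layer-wise least-squares merge: find a single
# weight matrix that best reproduces both models' outputs on their
# own layer inputs. Names are illustrative, not PLeaS's code.
import numpy as np

def merge_layer_lstsq(x_a, w_a, x_b, w_b):
    """x_*: (n, d_in) layer inputs; w_*: (d_in, d_out) weights.
    Solves min_W ||x_a W - x_a w_a||^2 + ||x_b W - x_b w_b||^2."""
    X = np.vstack([x_a, x_b])                  # stack both regression problems
    Y = np.vstack([x_a @ w_a, x_b @ w_b])      # target outputs of each model
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W
```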
arXiv Detail & Related papers (2024-07-02T17:24:04Z)
- Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging [21.918559935122786]
Model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training.
Traditional model merging methods often show significant performance gaps compared to fine-tuned models.
We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance.
We propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
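As a toy illustration of stage (2), an input-conditioned router could gate compressed task-specific deltas into the shared weights. Everything below (the names, the softmax router) is an assumption for illustration, not the paper's API:

```python
# Toy sketch of input-conditioned merging: a router scores the
# current input and mixes task-specific weight deltas into the
# shared model. All names here are illustrative assumptions.
import numpy as np

def dynamic_merge(shared_w, task_deltas, router_logits):
    """shared_w: merged shared weights; task_deltas: compressed
    task-specific weight differences; router_logits: per-task
    scores computed from the current input."""
    gates = np.exp(router_logits - router_logits.max())
    gates /= gates.sum()                        # softmax over tasks
    merged = shared_w.copy()
    for g, delta in zip(gates, task_deltas):
        merged = merged + g * delta             # input-conditioned mixture
    return merged
```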
arXiv Detail & Related papers (2024-06-17T02:31:55Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring neither data nor additional training, while showing impressive performance.
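The summary names the three moves without detailing them; one plausible reading of an elect / mask / rescale pattern over task vectors is sketched below (a hypothetical reconstruction, not the paper's published procedure):

```python
# Speculative sketch of an elect / mask / rescale pattern over
# task vectors (w_t - w_pre). Not EMR-Merging's actual code.
import numpy as np

def elect_mask_rescale(pre, task_weights):
    taus = [w - pre for w in task_weights]          # per-task updates
    sign = np.sign(sum(taus))                       # elect a majority sign
    # keep, per parameter, the largest magnitude agreeing with that sign
    agree = [np.where(np.sign(t) == sign, np.abs(t), 0.0) for t in taus]
    tau_uni = sign * np.maximum.reduce(agree)       # unified task vector
    masks = [np.sign(t) == sign for t in taus]      # per-task binary masks
    scales = [np.abs(t).sum() / (np.abs(m * tau_uni).sum() + 1e-12)
              for t, m in zip(taus, masks)]         # per-task rescalers
    return tau_uni, masks, scales

def task_model(pre, tau_uni, mask, scale):
    return pre + scale * mask * tau_uni             # reconstruct task weights
```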
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- UnIVAL: Unified Model for Image, Video, Audio and Language Tasks [105.77733287326308]
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we propose a novel study on multimodal model merging via weight generalization.
arXiv Detail & Related papers (2023-07-30T09:48:36Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Often, however, fine-tuned models are readily available while their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
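One concrete way a parameter-space merge can stay dataless at merge time is a regression-style closed form per linear layer, using only each model's weights plus a cached Gram matrix of its layer inputs; the sketch below is our illustration under that assumption (the function name and signature are not the paper's):

```python
# Hedged sketch of a regression-mean style parameter-space merge.
# Function name and signature are illustrative assumptions.
import numpy as np

def fuse_linear_layers(weights, grams):
    """weights: per-model (d_in, d_out) layer weights W_i.
    grams: per-model (d_in, d_in) Gram matrices G_i = X_i^T X_i,
    cached once per model so no data is exchanged when merging.
    Returns W minimizing sum_i ||X_i W - X_i W_i||^2 in closed
    form: W = (sum_i G_i)^{-1} (sum_i G_i W_i).
    """
    lhs = sum(grams)
    rhs = sum(g @ w for g, w in zip(grams, weights))
    return np.linalg.solve(lhs, rhs)
```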
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Lifelong Learning with Searchable Extension Units [21.17631355880764]
We propose a new lifelong learning framework named Searchable Extension Units (SEU).
It eliminates the need for a predefined original model and searches for specific extension units for different tasks.
Our approach can obtain a much more compact model without catastrophic forgetting.
arXiv Detail & Related papers (2020-03-19T03:45:51Z)