TIES-Merging: Resolving Interference When Merging Models
- URL: http://arxiv.org/abs/2306.01708v2
- Date: Fri, 27 Oct 2023 01:09:31 GMT
- Title: TIES-Merging: Resolving Interference When Merging Models
- Authors: Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
- Abstract summary: Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency.
Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training.
Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models.
We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign.
- Score: 95.59265307318752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning - i.e., further fine-tuning a pre-trained model on a
downstream task - can confer significant advantages, including improved
downstream performance, faster convergence, and better sample efficiency. These
advantages have led to a proliferation of task-specific fine-tuned models,
which typically can only perform a single task and do not benefit from one
another. Recently, model merging techniques have emerged as a solution to
combine multiple task-specific models into a single multitask model without
performing additional training. However, existing merging methods often ignore
the interference between parameters of different models, resulting in large
performance drops when merging multiple models. In this paper, we demonstrate
that prior merging techniques inadvertently lose valuable information due to
two major sources of interference: (a) interference due to redundant parameter
values and (b) disagreement on the sign of a given parameter's values across
models. To address this, we propose our method, TRIM, ELECT SIGN & MERGE
(TIES-Merging), which introduces three novel steps when merging models: (1)
resetting parameters that only changed a small amount during fine-tuning, (2)
resolving sign conflicts, and (3) merging only the parameters that are in
alignment with the final agreed-upon sign. We find that TIES-Merging
outperforms several existing methods in diverse settings covering a range of
modalities, domains, number of tasks, model sizes, architectures, and
fine-tuning settings. We further analyze the impact of different types of
interference on model parameters, and highlight the importance of resolving
sign interference. Our code is available at
https://github.com/prateeky2806/ties-merging
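To make the three steps concrete, below is a minimal PyTorch sketch of the procedure as described in the abstract, operating on flattened parameter vectors. The function name `ties_merge` and the hyperparameters `k` (fraction of task-vector entries kept in the trim step) and `lam` (scaling of the merged task vector) are illustrative assumptions, not the reference implementation from the linked repository.

```python
import torch

def ties_merge(pretrained, finetuned_models, k=0.2, lam=1.0):
    """Sketch of TIES-Merging on flattened parameter vectors.

    pretrained: 1-D tensor of base-model parameters.
    finetuned_models: list of 1-D tensors, one per task-specific model.
    k: fraction of largest-magnitude task-vector entries to keep (trim step).
    lam: scaling factor applied to the merged task vector.
    """
    # Task vectors: difference between each fine-tuned model and the base model.
    task_vectors = [ft - pretrained for ft in finetuned_models]

    # Step 1 (Trim): reset entries that changed only a small amount,
    # i.e. keep only the top-k fraction by magnitude and zero the rest.
    trimmed = []
    for tv in task_vectors:
        n_keep = max(1, int(k * tv.numel()))
        threshold = tv.abs().kthvalue(tv.numel() - n_keep + 1).values
        trimmed.append(torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv)))

    stacked = torch.stack(trimmed)  # shape: (num_models, num_params)

    # Step 2 (Elect Sign): per parameter, pick the sign with the larger total magnitude.
    elected_sign = torch.sign(stacked.sum(dim=0))

    # Step 3 (Disjoint Merge): average only the entries whose sign agrees
    # with the elected sign; entries with conflicting signs are excluded.
    agrees = torch.sign(stacked) == elected_sign
    selected = stacked * agrees
    counts = agrees.sum(dim=0).clamp(min=1)  # avoid division by zero
    merged_tv = selected.sum(dim=0) / counts

    return pretrained + lam * merged_tv
```

A hypothetical usage would flatten each checkpoint's parameters into a single vector (e.g. with `torch.nn.utils.parameters_to_vector`), call `merged = ties_merge(base_flat, [model_a_flat, model_b_flat])`, and then load the merged vector back into the base architecture.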
Related papers
- It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization [16.54335356612006]
The goal of model merging is to combine multiple models, each excelling in different tasks, into a single model that outperforms any of the individual source models.
Existing methods rely heavily on human intuition and customized strategies.
We propose the MM-MO method, which automates the search for optimal merging configurations using multi-objective optimization algorithms.
arXiv Detail & Related papers (2024-06-29T16:34:23Z)
- Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging [21.918559935122786]
Model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training.
Traditional model merging methods often show significant performance gaps compared to fine-tuned models.
We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance.
We propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
arXiv Detail & Related papers (2024-06-17T02:31:55Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- DPPA: Pruning Method for Large Language Model to Model Merging [39.13317231533299]
We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models.
We show that our method retains a mere 20% of domain-specific parameters yet delivers performance comparable to other methodologies.
Our method displays outstanding performance post-pruning, leading to an improvement of nearly 20% in model-merging performance.
arXiv Detail & Related papers (2024-03-05T09:12:49Z)
- Merging by Matching Models in Task Parameter Subspaces [87.8712523378141]
Model merging aims to cheaply combine individual task-specific models into a single multitask model.
We formalize how this approach to model merging can be seen as solving a linear system of equations.
We show that using the conjugate gradient method can outperform closed-form solutions.
arXiv Detail & Related papers (2023-12-07T14:59:15Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Understanding Parameter Sharing in Transformers [53.75988363281843]
Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model depth.
We show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity.
Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
arXiv Detail & Related papers (2023-06-15T10:48:59Z)
- Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose Aurora, a graceful prompt framework for cross-modal transfer, to overcome the challenges of parameter-efficient tuning for large multimodal models.
Considering the redundancy in existing architectures, we first utilize mode approximation to generate 0.1M trainable parameters for multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z)