TIES-Merging: Resolving Interference When Merging Models
- URL: http://arxiv.org/abs/2306.01708v2
- Date: Fri, 27 Oct 2023 01:09:31 GMT
- Title: TIES-Merging: Resolving Interference When Merging Models
- Authors: Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
- Abstract summary: Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency.
Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training.
Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models.
We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign.
- Score: 95.59265307318752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning - i.e., further fine-tuning a pre-trained model on a
downstream task - can confer significant advantages, including improved
downstream performance, faster convergence, and better sample efficiency. These
advantages have led to a proliferation of task-specific fine-tuned models,
which typically can only perform a single task and do not benefit from one
another. Recently, model merging techniques have emerged as a solution to
combine multiple task-specific models into a single multitask model without
performing additional training. However, existing merging methods often ignore
the interference between parameters of different models, resulting in large
performance drops when merging multiple models. In this paper, we demonstrate
that prior merging techniques inadvertently lose valuable information due to
two major sources of interference: (a) interference due to redundant parameter
values and (b) disagreement on the sign of a given parameter's values across
models. To address this, we propose our method, TRIM, ELECT SIGN & MERGE
(TIES-Merging), which introduces three novel steps when merging models: (1)
resetting parameters that only changed a small amount during fine-tuning, (2)
resolving sign conflicts, and (3) merging only the parameters that are in
alignment with the final agreed-upon sign. We find that TIES-Merging
outperforms several existing methods in diverse settings covering a range of
modalities, domains, number of tasks, model sizes, architectures, and
fine-tuning settings. We further analyze the impact of different types of
interference on model parameters, and highlight the importance of resolving
sign interference. Our code is available at
https://github.com/prateeky2806/ties-merging
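The three numbered steps in the abstract (trim, elect sign, merge only the sign-aligned entries) can be summarized as a short procedure over model state dicts. Below is a minimal PyTorch sketch of that procedure; the function name, the `density` and `lam` defaults, and other details are illustrative assumptions rather than the authors' reference implementation, which lives in the repository linked above.

```python
# Minimal sketch of the trim / elect-sign / disjoint-merge procedure described
# in the abstract. Names and defaults are illustrative, not the official code.
import torch

def ties_merge(pretrained, finetuned, density=0.2, lam=1.0):
    """Merge fine-tuned state dicts that share the base `pretrained` state dict.

    density: fraction of largest-magnitude task-vector entries kept per task (trim).
    lam:     scaling applied to the merged task vector before adding it back.
    """
    merged = {}
    for name, base in pretrained.items():
        # Task vectors: parameter deltas introduced by fine-tuning each model.
        deltas = torch.stack([ft[name].float() - base.float() for ft in finetuned])

        # (1) Trim: reset entries that changed only a small amount, keeping the
        # top-`density` fraction of each task vector by magnitude.
        k = max(1, int(density * base.numel()))
        for i in range(deltas.shape[0]):
            flat = deltas[i].reshape(-1)
            threshold = flat.abs().topk(k).values.min()
            flat[flat.abs() < threshold] = 0.0

        # (2) Elect sign: per parameter, keep the sign with the larger total mass.
        elected_sign = torch.sign(deltas.sum(dim=0))

        # (3) Disjoint merge: average only the entries that agree with that sign.
        agree = (torch.sign(deltas) == elected_sign) & (deltas != 0)
        summed = (deltas * agree).sum(dim=0)
        counts = agree.sum(dim=0).clamp(min=1)

        merged[name] = (base.float() + lam * (summed / counts)).to(base.dtype)
    return merged
```

The sketch operates one parameter tensor at a time, so it applies to the multitask settings the abstract mentions as long as all checkpoints share the pre-trained architecture.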
Related papers
- Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging.
PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Activated Parameter Locating via Causal Intervention for Model Merging [26.98015572633289]
Model merging combines multiple models into a single model, achieving strong generalization without requiring additional training.
Prior work has demonstrated that dropping a portion of delta parameters can alleviate conflicts while maintaining performance.
We propose an Activated Parameter Locating (APL) method that utilizes causal intervention to estimate parameter importance, enabling more precise parameter drops and better conflict mitigation.
arXiv Detail & Related papers (2024-08-18T14:00:00Z)
- Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging [21.918559935122786]
Model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training.
Traditional model merging methods often show significant performance gaps compared to fine-tuned models.
We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance.
We propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
arXiv Detail & Related papers (2024-06-17T02:31:55Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data or additional training, while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- DPPA: Pruning Method for Large Language Model to Model Merging [39.13317231533299]
We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models.
We show that our method retains a mere 20% of domain-specific parameters yet delivers performance comparable to other methods.
Our method displays outstanding performance post-pruning, leading to an improvement of nearly 20% in model merging performance.
arXiv Detail & Related papers (2024-03-05T09:12:49Z)
- Merging by Matching Models in Task Parameter Subspaces [87.8712523378141]
Model merging aims to cheaply combine individual task-specific models into a single multitask model.
We formalize how this approach to model merging can be seen as solving a linear system of equations.
We show that using the conjugate gradient method can outperform closed-form solutions (a minimal sketch of this linear-system view appears after this list).
arXiv Detail & Related papers (2023-12-07T14:59:15Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose Aurora, a graceful prompt framework for cross-modal transfer, to address the challenge of parameter-efficient tuning for large-scale multimodal foundation models.
Considering the redundancy in existing architectures, we first use mode approximation to generate 0.1M trainable parameters for multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z)
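As a companion to the "Merging by Matching Models in Task Parameter Subspaces" entry above, the sketch below illustrates the linear-system view of merging it mentions: if each task i supplies parameters theta_i and a symmetric positive (semi-)definite weighting A_i (for example a Fisher approximation), the merged parameters satisfy (sum_i A_i) theta = sum_i A_i theta_i, which can be solved iteratively with conjugate gradients instead of a closed form. The names, the choice of dense weighting matrices, and the warm start are illustrative assumptions, not that paper's implementation.

```python
# Illustrative sketch: model merging posed as a linear system solved with
# conjugate gradients. thetas[i] are flattened parameter vectors; As[i] are
# symmetric positive (semi-)definite weighting matrices, one per task.
import torch

def cg_merge(thetas, As, iters=200, tol=1e-10):
    """Solve (sum_i A_i) theta = sum_i A_i theta_i by conjugate gradients."""
    A = torch.stack(As).sum(dim=0)                           # combined weighting
    b = sum(torch.mv(Ai, ti) for Ai, ti in zip(As, thetas))  # right-hand side

    x = torch.stack(thetas).mean(dim=0)   # warm start: plain parameter averaging
    r = b - torch.mv(A, x)                # initial residual
    if r.norm() < tol:
        return x
    p = r.clone()
    rs = torch.dot(r, r)
    for _ in range(iters):
        Ap = torch.mv(A, p)
        alpha = rs / torch.dot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = torch.dot(r, r)
        if rs_new.sqrt() < tol:           # converged
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Tiny usage example: two 3-parameter "models" with diagonal weightings.
t1 = torch.tensor([1.0, 0.0, 2.0])
t2 = torch.tensor([0.0, 1.0, 2.0])
A1 = torch.diag(torch.tensor([2.0, 1.0, 1.0]))
A2 = torch.diag(torch.tensor([1.0, 2.0, 1.0]))
print(cg_merge([t1, t2], [A1, A2]))  # weighted compromise between t1 and t2
```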