Related papers: Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

URL: http://arxiv.org/abs/2406.15479v2
Date: Mon, 14 Oct 2024 04:14:26 GMT
Title: Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
Authors: Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng,
Abstract summary: Model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. Traditional model merging methods often show significant performance gaps compared to fine-tuned models. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. We propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
Score: 21.918559935122786
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these issues. Additionally, a one-size-fits-all model lacks flexibility for diverse test data, leading to performance degradation. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input. This approach narrows the performance gap between merged and fine-tuned models and improves adaptability to heterogeneous data. Extensive experiments on $20$ datasets for both language and vision tasks demonstrate the effectiveness of our method, showing an average improvement of $28.34\%$ in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on the generative tasks. Our implementation is available in \url{https://github.com/LZY-the-boys/Twin-Merging}

Related papers

SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging [60.83635006372403]
textttSE-Merging is a self-enhanced model merging framework.<n>We show that textttSE-Merging achieves dynamic model merging without additional training.
arXiv Detail & Related papers (2025-06-22T18:38:41Z)
Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging [38.12136955174922]
Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage.<n>Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training.<n>Existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation.
arXiv Detail & Related papers (2025-05-28T23:28:12Z)
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces [17.69597528370121]
Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement.
arXiv Detail & Related papers (2025-02-07T14:22:56Z)
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging [33.23758947497205]
Advanced embedding models are typically developed using large-scale multi-task data and joint training across multiple tasks. To overcome these challenges, we explore model merging-a technique that combines independently trained models to mitigate gradient conflicts and balance data distribution. We introduce a novel method, Self Positioning, which efficiently searches for optimal model combinations within the space of task vectors using gradient descent.
arXiv Detail & Related papers (2024-10-19T08:39:21Z)
What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model. Previous studies have primarily focused on merging a few small models. This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks [3.776249047528669]
We leverage the abundance of freely trained models to introduce a cost-free approach to model merging. It aims to maintain the distinctiveness of the task-specific final layers while unifying the initial layers. This approach ensures parameter consistency across all layers, essential for boosting performance.
arXiv Detail & Related papers (2024-09-24T07:19:30Z)
EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods. EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks into a single unified model can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable. We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from common extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks. We propose the CONtinuous relaxation dis (Concrete) subspace learning method to identify a common lowdimensional subspace and utilize its shared information track interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging) It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
TIES-Merging: Resolving Interference When Merging Models [95.59265307318752]
Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training. Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign.
arXiv Detail & Related papers (2023-06-02T17:31:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.