Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
- URL: http://arxiv.org/abs/2502.15434v1
- Date: Fri, 21 Feb 2025 13:01:26 GMT
- Title: Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
- Authors: Yue Zhou, Yi Chang, Yuan Wu
- Abstract summary: We propose Mixup Model Merge, an innovative approach inspired by the Mixup data augmentation technique. M$^3$ is a simple yet effective model merging method that significantly enhances the performance of the merged model.
- Score: 15.47711837051754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model merging integrates the parameters of multiple models into a unified model, combining their diverse capabilities. Existing model merging methods are often constrained by fixed parameter merging ratios. In this study, we propose Mixup Model Merge (M$^3$), an innovative approach inspired by the Mixup data augmentation technique. This method merges the parameters of two large language models (LLMs) by randomly generating linear interpolation ratios, allowing for a more flexible and comprehensive exploration of the parameter space. Extensive experiments demonstrate the superiority of our proposed M$^3$ method in merging fine-tuned LLMs: (1) it significantly improves performance across multiple tasks, (2) it enhances LLMs' out-of-distribution (OOD) robustness and adversarial robustness, (3) it achieves superior results when combined with sparsification techniques such as DARE, and (4) it offers a simple yet efficient solution that does not require additional computational resources. In conclusion, M$^3$ is a simple yet effective model merging method that significantly enhances the performance of the merged model by randomly generating contribution ratios for two fine-tuned LLMs. The code is available at https://github.com/MLGroupJLU/MixupModelMerge.
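As a concrete illustration, here is a minimal sketch of the core idea in PyTorch. Sampling the ratio from Beta(alpha, alpha) is an assumption borrowed from Mixup; the paper's exact sampling scheme may differ.

```python
# Minimal sketch of Mixup Model Merge (M^3): merge two fine-tuned LLMs with a
# randomly sampled interpolation ratio. Drawing lambda from Beta(alpha, alpha)
# is an assumption borrowed from Mixup, not necessarily the paper's choice.
import torch

def mixup_model_merge(state_dict_a, state_dict_b, alpha=1.0):
    """Return lam * A + (1 - lam) * B for a randomly drawn lam."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    merged = {
        name: lam * param_a + (1.0 - lam) * state_dict_b[name]
        for name, param_a in state_dict_a.items()
    }
    return merged, lam

# Usage (hypothetical models): draw several candidate ratios and keep the
# merge that scores best on a validation task.
# merged_sd, lam = mixup_model_merge(model_a.state_dict(), model_b.state_dict())
# merged_model.load_state_dict(merged_sd)
```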
Related papers
- Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically either scale parameters model-wise or incorporate parameter importance parameter-wise.
We unify these strategies into a more general merging framework and introduce Dynamic Fisher-weighted Merging (DF-Merge).
We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
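For intuition, the parameter-wise end of this spectrum can be sketched as diagonal-Fisher-weighted averaging. The helper names below are illustrative, and DF-Merge's Bayesian optimization over merging coefficients is not shown.

```python
# Sketch of parameter-wise Fisher-weighted averaging, the kind of scheme
# DF-Merge unifies with model-wise scaling. Illustrative, not the paper's code.
import torch

def estimate_diag_fisher(model, batches, loss_fn):
    """Approximate the diagonal Fisher with averaged squared gradients
    over a small list of (x, y) batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(batches), 1) for n, f in fisher.items()}

def fisher_weighted_merge(sd_a, sd_b, fisher_a, fisher_b, eps=1e-8):
    """Average each parameter, weighted by its estimated importance."""
    return {
        n: (fisher_a[n] * sd_a[n] + fisher_b[n] * sd_b[n])
           / (fisher_a[n] + fisher_b[n] + eps)
        for n in sd_a
    }
```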
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
- AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization [86.8133939108057]
We propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs.
Our method tackles the challenges in three steps: mapping, merging and searching.
As the first model merging method capable of merging heterogeneous MLLMs without labeled data, AdaMMS outperforms previous model merging methods on various vision-language benchmarks.
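The searching step can be pictured as unsupervised coefficient selection. The sketch below scores candidate interpolation coefficients by prediction entropy on unlabeled inputs, an assumed stand-in for the paper's actual criterion, and assumes the model returns logits.

```python
# Hedged sketch of label-free coefficient search: try candidate merging
# coefficients and keep the one with the most confident (lowest-entropy)
# predictions. The selection criterion is an assumption, not AdaMMS's exact one.
import torch

def linear_merge(sd_a, sd_b, coef):
    return {k: coef * sd_a[k] + (1.0 - coef) * sd_b[k] for k in sd_a}

@torch.no_grad()
def search_coefficient(model, sd_a, sd_b, unlabeled_batches,
                       candidates=(0.3, 0.4, 0.5, 0.6, 0.7)):
    best_coef, best_entropy = None, float("inf")
    for coef in candidates:
        model.load_state_dict(linear_merge(sd_a, sd_b, coef))
        entropy = 0.0
        for x in unlabeled_batches:
            probs = torch.softmax(model(x), dim=-1)
            entropy += -(probs * probs.clamp_min(1e-12).log()).sum().item()
        if entropy < best_entropy:
            best_coef, best_entropy = coef, entropy
    return best_coef
```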
arXiv Detail & Related papers (2025-03-31T05:13:02Z)
- Model Assembly Learning with Heterogeneous Layer Weight Merging [57.8462476398611]
We introduce Model Assembly Learning (MAL), a novel paradigm for model merging.
MAL integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities.
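A hedged sketch of what layer-wise assembly might look like, assuming each model exposes a `.layers` list of interchangeable blocks and a small validation callback; the greedy loop is an illustrative assumption, not the paper's algorithm.

```python
# Hypothetical sketch of model assembly: graft candidate layers from zoo
# models into a base model and keep grafts that improve validation score.
import copy

def assemble(base_model, zoo_models, evaluate):
    """Greedily replace base layers with zoo layers when validation improves."""
    best_model = base_model
    best_score = evaluate(base_model)
    for i in range(len(best_model.layers)):
        for donor in zoo_models:
            candidate = copy.deepcopy(best_model)
            candidate.layers[i] = copy.deepcopy(donor.layers[i])
            score = evaluate(candidate)
            if score > best_score:
                best_model, best_score = candidate, score
    return best_model
```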
arXiv Detail & Related papers (2025-03-27T16:21:53Z)
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks.
By utilizing data subsets during the evaluation process, we address the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
- Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging [35.53877806259048]
This paper establishes the first comprehensive benchmark for model merging in large language models (LLMs). Our analysis reveals three pivotal insights: (i) previously overlooked collaborative/conflicting relationships among 3H dimensions, (ii) the consistent superiority of model merging over data mixture approaches in balancing alignment trade-offs, and (iii) the critical role of parameter-level conflict resolution through redundant component pruning and outlier mitigation. We propose R-TSVM, a Reweighting-enhanced Task Singular Vector Merging method that incorporates outlier-aware parameter weighting and sparsity-adaptive rank selection strategies adapted to the heavy-tailed parameter distributions of LLMs.
arXiv Detail & Related papers (2025-02-08T11:56:58Z)
- Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging [30.38047100067552]
Reasoning capabilities represent a critical frontier for large language models. One way to efficiently supplement such capabilities is model merging. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies.
arXiv Detail & Related papers (2025-02-06T12:47:25Z)
- InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion [35.98702433016698]
InfiFusion is an efficient training pipeline designed to integrate domain-specialized Large Language Models (LLMs) into a single pivot model. We propose two fusion strategies: Pairwise Fusion (InfiFusion$_p$) and Unified Fusion (InfiFusion$_u$). InfiFusion outperforms state-of-the-art models, such as Qwen-2.5-14B-Instruct and Phi-4, across 11 widely applied benchmarks.
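Fusion into the pivot can be pictured as distillation. The sketch below assumes HuggingFace-style models sharing a tokenizer; the paper's pipeline, including its handling of heterogeneous vocabularies, is more involved.

```python
# Hedged sketch of distilling one source LLM into a pivot model. Loss choice,
# temperature, and the shared-vocabulary assumption are illustrative.
import torch
import torch.nn.functional as F

def distill_step(pivot, source, input_ids, optimizer, temperature=2.0):
    """One KD step: match the pivot's token distribution to the source's."""
    with torch.no_grad():
        teacher_logits = source(input_ids).logits
    student_logits = pivot(input_ids).logits
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```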
arXiv Detail & Related papers (2025-01-06T06:29:55Z)
- Non-Uniform Parameter-Wise Model Merging [17.989809995141044]
We introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods.
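A minimal sketch of the idea, assuming floating-point weights; the sigmoid gating and training setup are illustrative assumptions, not the authors' implementation.

```python
# Sketch of per-parameter learned interpolation in the spirit of NP Merge:
# a sigmoid-gated ratio for every weight entry, trained by gradient descent.
import torch

class NPMergeSketch(torch.nn.Module):
    def __init__(self, sd_a, sd_b):
        super().__init__()
        self.sd_a, self.sd_b = sd_a, sd_b
        # One learnable logit per weight entry ('.' is illegal in ParameterDict keys).
        self.logits = torch.nn.ParameterDict({
            k.replace(".", "__"): torch.nn.Parameter(torch.zeros_like(v))
            for k, v in sd_a.items()
        })

    def merged_state_dict(self):
        # sigmoid(0) = 0.5, so training starts from the uniform average.
        out = {}
        for k in self.sd_a:
            ratio = torch.sigmoid(self.logits[k.replace(".", "__")])
            out[k] = ratio * self.sd_a[k] + (1 - ratio) * self.sd_b[k]
        return out

# To fit the ratios, run the target architecture functionally with the merged
# weights, e.g. torch.func.functional_call(model, sketch.merged_state_dict(), (x,)),
# and backpropagate a task loss into sketch.logits.
```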
arXiv Detail & Related papers (2024-12-20T00:05:14Z)
- Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation [0.9084344604313794]
This paper explores model merging techniques across a spectrum of complexity.
We introduce Differentiable Adaptive Merging (DAM), an efficient, adaptive merging approach.
Our findings reveal that even simple averaging methods, like Model Soups, perform competitively when model similarity is high.
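For reference, a uniform model soup is just an element-wise average of state dicts, as in this minimal sketch.

```python
# Minimal sketch of a uniform "model soup": average the weights of several
# fine-tuned models element-wise. Works best when the models are similar,
# e.g. all fine-tuned from the same initialization.
import torch

def uniform_soup(state_dicts):
    """Element-wise mean of a list of state dicts with identical keys."""
    return {
        k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
        for k in state_dicts[0]
    }
```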
arXiv Detail & Related papers (2024-10-10T20:58:29Z)
- Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic scaling guideline for Large Language Models. We benchmark existing scaling techniques, especially selective merging and variants of mixture. We then formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo. Our methodology involves the clustering of mergeable models, optimal merging strategy selection, and the integration of clusters.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
- Model Merging and Safety Alignment: One Bad Model Spoils the Bunch [70.614652904151]
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model.
Current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models.
We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment.
arXiv Detail & Related papers (2024-06-20T17:59:58Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
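Inferring from the method's name, an elect/mask/rescale pipeline over task vectors (fine-tuned minus pretrained weights) might look like the following sketch; the paper's exact procedure may differ.

```python
# Rough, name-inferred sketch of Elect / Mask / Rescale over task vectors.
import torch

def emr_style_merge(task_vectors):
    """Elect a unified task vector; derive a mask and rescaler per task."""
    unified = {}
    per_task = [{"mask": {}, "scale": {}} for _ in task_vectors]
    for k in task_vectors[0]:
        stacked = torch.stack([tv[k] for tv in task_vectors])
        # Elect: dominant sign per entry, carrying the largest agreeing magnitude.
        sign = torch.sign(stacked.sum(dim=0))
        agree = torch.sign(stacked) == sign
        unified[k] = (stacked.abs() * agree).amax(dim=0) * sign
        for i, tv in enumerate(task_vectors):
            # Mask: keep entries whose sign matches the elected direction.
            mask = (torch.sign(tv[k]) == sign).float()
            # Rescale: match the original task vector's average magnitude.
            denom = (unified[k] * mask).abs().mean().clamp_min(1e-12)
            per_task[i]["mask"][k] = mask
            per_task[i]["scale"][k] = (tv[k].abs().mean() / denom).item()
    return unified, per_task
```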
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
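In the spirit of AdaMerging, merging coefficients can be learned by minimizing prediction entropy on unlabeled inputs. Everything below (the initialization, the task-wise variant, the functional forward pass) is an illustrative sketch, not the authors' code.

```python
# Sketch of learning task-wise merging coefficients without training data:
# minimize prediction entropy of the merged model on unlabeled batches.
import torch

def adamerging_sketch(model, base_sd, task_vectors, unlabeled_batches, steps=100):
    # One learnable coefficient per task (layer-wise would use one per layer).
    lambdas = torch.nn.Parameter(torch.full((len(task_vectors),), 0.3))
    opt = torch.optim.Adam([lambdas], lr=1e-3)
    for step in range(steps):
        merged = {
            k: base_sd[k] + sum(lambdas[i] * tv[k]
                                for i, tv in enumerate(task_vectors))
            for k in base_sd
        }
        x = unlabeled_batches[step % len(unlabeled_batches)]
        # Functional forward pass so gradients flow into the coefficients.
        logits = torch.func.functional_call(model, merged, (x,))
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return lambdas.detach()
```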
arXiv Detail & Related papers (2023-10-04T04:26:33Z)