Surrogate Benchmarks for Model Merging Optimization
- URL: http://arxiv.org/abs/2509.02555v1
- Date: Tue, 02 Sep 2025 17:51:03 GMT
- Title: Surrogate Benchmarks for Model Merging Optimization
- Authors: Rio Akizuki, Yuya Kudo, Nozomu Yoshinari, Yoichi Hirose, Toshiyuki Nishimoto, Kento Uchida, Shinichi Shirakawa
- Abstract summary: We develop surrogate benchmarks for optimization of the merging hyperparameters. Our benchmarks can predict the performance of merged models well and simulate optimization algorithm behaviors.
- Score: 2.579878570919875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model merging techniques aim to integrate the abilities of multiple models into a single model. Most model merging techniques have hyperparameters, and their setting affects the performance of the merged model. Because several existing works show that tuning hyperparameters in model merging can enhance the merging outcome, developing hyperparameter optimization algorithms for model merging is a promising direction. However, this optimization process is computationally expensive, particularly when merging LLMs. In this work, we develop surrogate benchmarks for optimization of the merging hyperparameters to enable algorithm development and performance comparison at low cost. We define two search spaces and collect data samples to construct surrogate models that predict the performance of a merged model from a hyperparameter configuration. We demonstrate that our benchmarks can predict the performance of merged models well and simulate optimization algorithm behaviors.
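
The pipeline described here — define a search space over merge coefficients, collect (configuration, score) pairs from real merges, then fit a predictor that stands in for the expensive objective — can be sketched as follows. This is a minimal illustration, not the authors' code: the two-coefficient space, the `evaluate_merge` stub, and the random-forest regressor are all assumptions.

```python
# Minimal sketch of a surrogate benchmark for merge-hyperparameter search.
# The true objective (actually merging and evaluating models) is replaced
# by a cheap regressor fitted on previously collected (config, score) pairs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def evaluate_merge(cfg):
    """Stand-in for the expensive step: merge models with coefficients
    `cfg` and benchmark the result. Here it is a synthetic function."""
    return -np.sum((cfg - 0.3) ** 2) + 0.01 * rng.normal()

# 1) Collect data samples from the (expensive) true objective.
X = rng.uniform(0.0, 1.0, size=(200, 2))          # merge coefficients
y = np.array([evaluate_merge(c) for c in X])      # benchmark scores

# 2) Fit the surrogate that predicts merged-model performance.
surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# 3) Run any optimizer against the surrogate at near-zero cost.
candidates = rng.uniform(0.0, 1.0, size=(10_000, 2))
best = candidates[np.argmax(surrogate.predict(candidates))]
print("surrogate optimum near", best)             # should approach (0.3, 0.3)
```

Once fitted, the surrogate can be queried thousands of times per second, which is what makes algorithm comparison cheap.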
Related papers
- Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation [49.98025799046136]
We introduce Merge-And-GuidE (MAGE), a two-stage framework that leverages model merging for guided decoding. In Stage 1, MAGE resolves a compatibility problem between the guidance and base models. In Stage 2, we merge explicit and implicit value models into a unified guidance proxy, which then steers the decoding of the base model from Stage 1.
arXiv Detail & Related papers (2025-10-04T11:10:07Z)
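
Stage 2's steering step is in the spirit of standard guided decoding, where guidance scores are added to the base model's next-token logits. A toy sketch under that assumption (the tensors and the weight `beta` are illustrative; this is not necessarily MAGE's exact combination rule):

```python
import torch

def guided_next_token(base_logits, guide_logits, beta=1.0):
    """Shift the base model's next-token choice toward tokens the
    guidance proxy scores highly (a generic guided-decoding step)."""
    return torch.argmax(base_logits + beta * guide_logits, dim=-1)

vocab = 32
base = torch.randn(1, vocab)    # base model next-token logits
guide = torch.randn(1, vocab)   # merged value-model guidance scores
print(guided_next_token(base, guide, beta=0.5))
```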
- Why Do More Experts Fail? A Theoretical Analysis of Model Merging [51.18155031364046]
Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model. Recent model merging methods have shown promising results, but struggle to maintain performance gains as the number of merged models increases. We show that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged.
arXiv Detail & Related papers (2025-05-27T14:10:46Z)
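
The flavor of this constraint can be seen with a toy experiment: averaging more and more random task vectors in a fixed-dimensional parameter space leaves the merge less and less aligned with any single expert. This illustration is ours, not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256                                     # parameter dimension
for n_models in (2, 8, 32, 128):
    tasks = rng.normal(size=(n_models, d))  # one "task vector" per expert
    tasks /= np.linalg.norm(tasks, axis=1, keepdims=True)
    merged = tasks.mean(axis=0)
    # cosine similarity between the merge and the first expert's direction
    cos = merged @ tasks[0] / np.linalg.norm(merged)
    print(f"{n_models:4d} experts -> alignment with task 0: {cos:.3f}")
```

For near-orthogonal task vectors the alignment decays roughly as 1/sqrt(n), so each added expert dilutes every other.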
- Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically involve scaling the parameters model-wise or integrating parameter importance parameter-wise. We unify these strategies into a more general merging framework and introduce Dynamic Fisher-weighted Merging (DF-Merge). We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
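
Fisher-weighted merging in general averages parameters in proportion to their estimated (diagonal) Fisher importance; DF-Merge additionally tunes per-model coefficients via Bayesian optimization. A minimal sketch with fixed coefficients (the tensors and coefficient values are illustrative, and the Bayesian search is omitted):

```python
import torch

def fisher_weighted_merge(params, fishers, coeffs):
    """Merge parameter tensors weighted by diagonal Fisher importance and
    a per-model coefficient: theta* = sum_i c_i F_i theta_i / sum_i c_i F_i.
    In DF-Merge the coefficients c_i would come from Bayesian optimization;
    here they are fixed for illustration."""
    num = sum(c * f * p for p, f, c in zip(params, fishers, coeffs))
    den = sum(c * f for f, c in zip(fishers, coeffs))
    return num / den.clamp_min(1e-12)

thetas = [torch.randn(4, 4) for _ in range(3)]   # expert weights
fishers = [torch.rand(4, 4) for _ in range(3)]   # diagonal Fisher estimates
merged = fisher_weighted_merge(thetas, fishers, coeffs=[0.5, 0.3, 0.2])
```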
- AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization [86.8133939108057]
We propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles the challenges in three steps: mapping, merging, and searching. As the first model merging method capable of merging heterogeneous MLLMs without labeled data, AdaMMS outperforms previous model merging methods on various vision-language benchmarks.
arXiv Detail & Related papers (2025-03-31T05:13:02Z)
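
Coefficient search without labeled data can be done by scoring candidate merge coefficients with an unsupervised criterion on unlabeled inputs. The sketch below uses mean prediction confidence as that criterion; the criterion, the linear interpolation, and the toy `predict` function are all assumptions, not AdaMMS's exact procedure:

```python
import numpy as np

def interpolate(theta_a, theta_b, alpha):
    """Linear interpolation of two (already aligned) parameter vectors."""
    return (1 - alpha) * theta_a + alpha * theta_b

def unsupervised_score(theta, inputs, predict):
    """Label-free criterion: mean max predicted probability (one
    common confidence heuristic, assumed here for illustration)."""
    probs = predict(theta, inputs)          # (n_inputs, n_classes)
    return probs.max(axis=1).mean()

def search_alpha(theta_a, theta_b, inputs, predict):
    grid = np.linspace(0.0, 1.0, 11)
    scores = [unsupervised_score(interpolate(theta_a, theta_b, a),
                                 inputs, predict) for a in grid]
    return grid[int(np.argmax(scores))]

# toy usage with a fake linear "model": softmax(X @ theta) over 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
def predict(theta, inputs):
    logits = inputs @ theta.reshape(5, 3)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
alpha = search_alpha(rng.normal(size=15), rng.normal(size=15), X, predict)
```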
- Model Assembly Learning with Heterogeneous Layer Weight Merging [57.8462476398611]
We introduce Model Assembly Learning (MAL), a novel paradigm for model merging. MAL integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities.
arXiv Detail & Related papers (2025-03-27T16:21:53Z)
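
Assembling a model layer by layer from a zoo reduces, mechanically, to choosing which zoo model supplies each layer's weights. The sketch below shows only that assembly step; how MAL learns the choices is not reproduced here, and all names are illustrative:

```python
import torch

def assemble(base_state, zoo_states, layer_choice):
    """Build a state dict by overriding chosen layers of the base model
    with weights from zoo models. `layer_choice` maps a parameter-name
    prefix to an index into `zoo_states` (illustrative, not MAL's learner)."""
    out = dict(base_state)
    for name in base_state:
        for prefix, idx in layer_choice.items():
            if name.startswith(prefix):
                out[name] = zoo_states[idx][name].clone()
    return out

# toy zoo with identical architectures
base = {"layer0.w": torch.zeros(2, 2), "layer1.w": torch.zeros(2, 2)}
zoo = [{"layer0.w": torch.ones(2, 2), "layer1.w": torch.ones(2, 2)},
       {"layer0.w": torch.full((2, 2), 2.0), "layer1.w": torch.full((2, 2), 2.0)}]
merged = assemble(base, zoo, layer_choice={"layer0.": 0, "layer1.": 1})
```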
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we address the bottleneck in the reward feedback phase, accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
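
The reported speedup comes from scoring candidate merges on small data subsets rather than the full evaluation set during reward feedback. That idea in isolation (the metric, subset size, and toy data are illustrative):

```python
import random

def subset_reward(merged_model, eval_set, metric, k=128, seed=0):
    """Cheap reward signal for the merging agent: evaluate the candidate
    merge on a random subset of size k rather than the full eval set."""
    rng = random.Random(seed)
    sample = rng.sample(eval_set, min(k, len(eval_set)))
    return sum(metric(merged_model, ex) for ex in sample) / len(sample)

# toy usage: a fake eval set and a fake per-example metric
data = list(range(1000))
reward = subset_reward("candidate-merge", data, metric=lambda m, ex: ex % 2, k=64)
```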
- Extrapolation Merging: Keep Improving With Extrapolation and Merging [14.786100203787194]
Large Language Models (LLMs) require instruction fine-tuning to perform different downstream tasks. Model merging aims to enhance performance by combining the parameters of different models. We propose Extrapolation Merging, a paradigm that can continue improving model performance without requiring extra computational resources or data.
arXiv Detail & Related papers (2025-03-05T14:28:22Z)
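
Model extrapolation is commonly written as continuing past the fine-tuned checkpoint along the fine-tuning direction. A minimal sketch under that reading (the coefficient value and the final averaging step are assumptions, not the paper's exact recipe):

```python
import torch

def extrapolate(theta_base, theta_ft, alpha=1.5):
    """Move along the fine-tuning direction; alpha > 1 extrapolates
    beyond the fine-tuned checkpoint instead of interpolating."""
    return theta_base + alpha * (theta_ft - theta_base)

base, ft = torch.zeros(3), torch.ones(3)
theta_ext = extrapolate(base, ft)        # tensor([1.5, 1.5, 1.5])
merged = 0.5 * (theta_ext + ft)          # then merge, e.g. by simple averaging
```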
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
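
One way to merge sequentially with a projection is to add each incoming task vector only in the component orthogonal to the directions already merged, limiting interference with earlier models. A sketch of that general idea on flattened parameter vectors (not necessarily the paper's exact operator):

```python
import torch

def merge_sequentially(theta_base, task_vectors):
    """Process models one at a time; each incoming task vector is projected
    to be orthogonal to the span of previously merged directions."""
    merged, basis = theta_base.clone(), []
    for tv in task_vectors:           # tv = theta_finetuned - theta_base
        v = tv.clone()
        for b in basis:               # Gram-Schmidt against earlier vectors
            v -= (v @ b) * b
        if v.norm() > 1e-8:
            basis.append(v / v.norm())
        merged += v
    return merged

base = torch.zeros(8)
tvs = [torch.randn(8) for _ in range(4)]   # models arrive one at a time
theta = merge_sequentially(base, tvs)
```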
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Non-Uniform Parameter-Wise Model Merging [17.989809995141044]
We introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods.
arXiv Detail & Related papers (2024-12-20T00:05:14Z)
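
Learning per-parameter contributions by gradient descent can be written as a sigmoid-gated interpolation whose gate logits are optimized. The sketch below uses a synthetic regression target in place of a real task loss; shapes and hyperparameters are illustrative:

```python
import torch

theta_a, theta_b = torch.randn(16), torch.randn(16)   # two models' weights
gate_logits = torch.zeros(16, requires_grad=True)     # one coeff per parameter
target = 0.7 * theta_a + 0.3 * theta_b                # toy stand-in objective

opt = torch.optim.Adam([gate_logits], lr=0.1)
for _ in range(200):
    m = torch.sigmoid(gate_logits)                    # per-parameter weight in (0, 1)
    merged = m * theta_a + (1 - m) * theta_b
    loss = ((merged - target) ** 2).mean()            # NP Merge would use a task loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.sigmoid(gate_logits).mean())              # drifts toward ~0.7
```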
arXiv Detail & Related papers (2024-12-20T00:05:14Z) - Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging [11.186194228460273]
We propose a preference-aware model merging problem in which the performance of the merged model on each base model's task is treated as an objective. We show that the proposed model merging produces diverse trade-off models and achieves higher test accuracy compared to state-of-the-art merging baselines.
arXiv Detail & Related papers (2024-08-22T03:41:14Z)
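
With each base task's performance treated as an objective, sweeping a preference vector through a scalarized merge traces out trade-off models. A generic task-arithmetic sketch of that setup (the objectives and parameterization are toy assumptions, not the paper's):

```python
import numpy as np

def merged_params(theta_base, task_vectors, weights):
    """Task-arithmetic merge: add preference-weighted task vectors."""
    return theta_base + sum(w * tv for w, tv in zip(weights, task_vectors))

def pareto_sweep(theta_base, task_vectors, objectives, n_points=5):
    """One merged model per preference vector; each objective scores the
    merge on one base task (the callables here are illustrative)."""
    front = []
    for w1 in np.linspace(0.0, 1.0, n_points):
        theta = merged_params(theta_base, task_vectors, [w1, 1.0 - w1])
        front.append(((w1, 1.0 - w1), [f(theta) for f in objectives]))
    return front

base = np.zeros(4)
tvs = [np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])]
objs = [lambda t: -abs(t[0] - 1.0), lambda t: -abs(t[1] - 1.0)]
for pref, scores in pareto_sweep(base, tvs, objs):
    print(pref, scores)
```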
arXiv Detail & Related papers (2024-08-22T03:41:14Z) - It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization [16.54335356612006]
The goal of model merging is to combine multiple models, each excelling in different tasks, into a single model that outperforms any of the individual source models. Existing methods rely heavily on human knowledge or intuition, and it is difficult to find a good merging configuration within a limited number of evaluations.
arXiv Detail & Related papers (2024-06-29T16:34:23Z)
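
Framing configuration search as multi-objective optimization can be illustrated with a crude evolutionary loop that mutates merge-coefficient vectors and retains non-dominated candidates. Real systems use established MOO algorithms; the objectives below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(cfg):
    """Stand-in for benchmarking a merge on two tasks (both maximized)."""
    return np.array([-np.sum((cfg - 0.2) ** 2), -np.sum((cfg - 0.8) ** 2)])

def dominated(f, others):
    return any((g >= f).all() and (g > f).any() for g in others)

pop = rng.uniform(0, 1, size=(16, 3))              # merge-coefficient vectors
for _ in range(50):
    children = np.clip(pop + 0.1 * rng.normal(size=pop.shape), 0, 1)
    union = np.vstack([pop, children])
    fits = [evaluate(c) for c in union]
    keep = [i for i, f in enumerate(fits)
            if not dominated(f, [g for j, g in enumerate(fits) if j != i])]
    pop = union[keep][:16]                         # crude non-dominated truncation
print(len(pop), "non-dominated merge configurations")
```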
- Understanding Parameter Sharing in Transformers [53.75988363281843]
Previous work on Transformers has focused on sharing parameters across different layers, which can improve the performance of models with limited parameters by increasing model depth.
We show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity.
Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
arXiv Detail & Related papers (2023-06-15T10:48:59Z)
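
Cross-layer parameter sharing reuses one layer's weights at several depths, so effective depth grows while the parameter count stays that of a single layer. A minimal PyTorch illustration of the generic technique (hyperparameters arbitrary; not the paper's specific model):

```python
import torch
import torch.nn as nn

class SharedDepthEncoder(nn.Module):
    """Applies the same TransformerEncoderLayer `depth` times, so the model
    gains depth while the parameter count stays that of a single layer."""
    def __init__(self, d_model=64, nhead=4, depth=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):   # weight sharing across "layers"
            x = self.layer(x)
        return x

x = torch.randn(2, 10, 64)            # (batch, seq, d_model)
print(SharedDepthEncoder()(x).shape)  # torch.Size([2, 10, 64])
```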
This list is automatically generated from the titles and abstracts of the papers in this site.