Competition and Attraction Improve Model Fusion
- URL: http://arxiv.org/abs/2508.16204v1
- Date: Fri, 22 Aug 2025 08:24:02 GMT
- Title: Competition and Attraction Improve Model Fusion
- Authors: João Abrantes, Robert Tjarko Lange, Yujin Tang
- Abstract summary: Model merging is a powerful technique for integrating the specialized knowledge of multiple machine learning models into a single model. We propose Model Merging of Natural Niches (M2N2), an evolutionary algorithm with three key features. M2N2 scales to merge specialized language and image generation models, achieving state-of-the-art performance.
- Score: 17.83054848742515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model merging is a powerful technique for integrating the specialized knowledge of multiple machine learning models into a single model. However, existing methods require manually partitioning model parameters into fixed groups for merging, which restricts the exploration of potential combinations and limits performance. To overcome these limitations, we propose Model Merging of Natural Niches (M2N2), an evolutionary algorithm with three key features: (1) dynamic adjustment of merging boundaries to progressively explore a broader range of parameter combinations; (2) a diversity preservation mechanism inspired by the competition for resources in nature, to maintain a population of diverse, high-performing models that are particularly well-suited for merging; and (3) a heuristic-based attraction metric to identify the most promising pairs of models for fusion. Our experimental results demonstrate, for the first time, that model merging can be used to evolve models entirely from scratch. Specifically, we apply M2N2 to evolve MNIST classifiers from scratch and achieve performance comparable to CMA-ES, while being computationally more efficient. Furthermore, M2N2 scales to merge specialized language and image generation models, achieving state-of-the-art performance. Notably, it preserves crucial model capabilities beyond those explicitly optimized by the fitness function, highlighting its robustness and versatility. Our code is available at https://github.com/SakanaAI/natural_niches
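The three features named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (their code is at the repository linked above); the function names, the flat-vector view of model parameters, the linear blend, and the exact weighting formulas are all assumptions chosen only to make the ideas concrete.

```python
import numpy as np

def merge_at_split(theta_a, theta_b, split, mix):
    """Blend two flat parameter vectors around a movable boundary (hypothetical helper).

    Before index `split`, parent A gets weight `mix`; from `split` onward the
    weights swap. Sampling `split` and `mix` anew for each merge is one way to
    avoid fixed, hand-chosen parameter groups.
    """
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    head = mix * theta_a[:split] + (1.0 - mix) * theta_b[:split]
    tail = (1.0 - mix) * theta_a[split:] + mix * theta_b[split:]
    return np.concatenate([head, tail])

def shared_fitness(scores, eps=1e-8):
    """Fitness under competition for limited resources (assumed formula).

    scores[i, j] is the non-negative score of model i on example j. Each
    example is a resource split among models in proportion to their scores,
    so a model earns more on examples the rest of the population handles poorly.
    """
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=0, keepdims=True) + eps
    return (scores / totals).sum(axis=1)

def attraction(scores, i, eps=1e-8):
    """Attraction of every model as a merge partner for model i (assumed heuristic).

    A partner is attractive when it beats model i on examples where the
    population's total score is low.
    """
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=0, keepdims=True) + eps
    gain = np.maximum(scores - scores[i], 0.0)
    return (gain / totals).sum(axis=1)

# Toy population: 2 models scored on 2 examples.
scores = np.array([[1.0, 0.0],
                   [1.0, 1.0]])
partner = int(np.argmax(attraction(scores, 0)))  # best partner for model 0
child = merge_at_split(np.ones(4), np.zeros(4), split=2, mix=0.75)
```

In this toy setup the second model is the only one that solves the second example, so it earns a larger shared fitness and is the preferred partner for the first model.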
Related papers
- Towards Reversible Model Merging For Low-rank Weights [5.100622189286672]
Model merging aims to combine multiple fine-tuned models into a single set of weights that performs well across all source tasks. We show that applying conventional merging methods to low-rank weights leads to severe performance degradation in the merged model. We propose a fundamentally different approach: instead of collapsing all adapters into one set of weights, we construct a compact basis. This reframes merging as generating a reconstruction-capable model space rather than producing a single merged model.
arXiv Detail & Related papers (2025-10-15T23:22:38Z)
- Black-box Model Merging for Language-Model-as-a-Service with Massive Model Repositories [21.899117703417517]
We propose a derivative-free optimization framework based on the evolutionary algorithm (Evo-Merging). Our method consists of two key components: (1) sparsity-based denoising, designed to identify and filter out irrelevant or redundant information across models, and (2) sign-aware scaling, which dynamically computes optimal combination weights for the relevant models based on their performance. Our approach achieves state-of-the-art results on a range of tasks, significantly outperforming existing strong baselines.
arXiv Detail & Related papers (2025-09-16T10:55:50Z)
- SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging [60.83635006372403]
SE-Merging is a self-enhanced model merging framework. We show that SE-Merging achieves dynamic model merging without additional training.
arXiv Detail & Related papers (2025-06-22T18:38:41Z)
- Navigating the Accuracy-Size Trade-Off with Flexible Model Merging [15.497612580389479]
We propose FlexMerge, a novel data-free model merging framework. It flexibly generates merged models of varying sizes, spanning the full spectrum from a single merged model to retaining all fine-tuned models. Using FlexMerge, we systematically characterize the accuracy-size trade-off of different algorithms.
arXiv Detail & Related papers (2025-05-29T07:50:32Z)
- AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization [86.8133939108057]
We propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles the challenges in three steps: mapping, merging and searching. As the first model merging method capable of merging heterogeneous MLLMs without labeled data, AdaMMS outperforms previous model merging methods on various vision-language benchmarks.
arXiv Detail & Related papers (2025-03-31T05:13:02Z)
- Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation [15.47711837051754]
Model merging aims to integrate multiple task-specific models into a unified model that inherits the capabilities of the task-specific models. Existing model merging methods often lack consideration of the varying contribution ratios of different task-specific models to the final merged model. We propose Mixup Model Merge (M3), a simple yet effective method inspired by the randomized linear strategy from the Mixup data augmentation technique.
arXiv Detail & Related papers (2025-02-21T13:01:26Z)
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Exploring Model Kinship for Merging Large Language Models [52.01652098827454]
We introduce model kinship, the degree of similarity or relatedness between Large Language Models. We find that there is a certain relationship between model kinship and the performance gains after model merging. We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
- FusionBench: A Comprehensive Benchmark of Deep Model Fusion [78.80920533793595]
Deep model fusion is a technique that unifies the predictions or parameters of several deep neural networks into a single model.
FusionBench is the first comprehensive benchmark dedicated to deep model fusion.
arXiv Detail & Related papers (2024-06-05T13:54:28Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Training-Free Pretrained Model Merging [38.16269074353077]
We propose an innovative model merging framework, coined as merging under dual-space constraints (MuDSC).
In order to enhance usability, we have also incorporated adaptations for group structure, including Multi-Head Attention and Group Normalization.
arXiv Detail & Related papers (2024-03-04T06:19:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.