Exploring Model Kinship for Merging Large Language Models
- URL: http://arxiv.org/abs/2410.12613v3
- Date: Tue, 23 Sep 2025 17:26:13 GMT
- Title: Exploring Model Kinship for Merging Large Language Models
- Authors: Yedi Hu, Yunzhi Yao, Ningyu Zhang, Huajun Chen, Shumin Deng
- Abstract summary: We study model evolution through iterative merging, drawing an analogy to biological evolution. We show that model kinship is closely linked to the performance improvements achieved by merging. We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship.
- Score: 73.98345036483299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model merging has emerged as a key technique for enhancing the capabilities and efficiency of Large Language Models (LLMs). The open-source community has driven model evolution by iteratively merging existing models, yet a principled understanding of the gains and underlying factors in model merging remains limited. In this work, we study model evolution through iterative merging, drawing an analogy to biological evolution, and introduce the concept of model kinship, the degree of similarity or relatedness between LLMs. Through comprehensive empirical analysis, we show that model kinship is closely linked to the performance improvements achieved by merging, providing a useful criterion for selecting candidate models. Building on this insight, we propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can improve benchmark performance. Specifically, we discover that incorporating model kinship as a guiding criterion enables continuous merging while mitigating performance degradation caused by local optima, thereby facilitating more effective model evolution. Code is available at https://github.com/zjunlp/ModelKinship.
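To make the kinship idea concrete, here is a minimal sketch that computes kinship as the cosine similarity between two models' weight deltas (their differences from a shared base model) — one of the similarity metrics the paper considers — and uses it in a simplified top-k greedy selection step. The helper names, the uniform-averaging merge, and the rule of merging the lowest-kinship pair among the top-k models are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def task_vector(state, base_state):
    # Flattened weight delta between a fine-tuned checkpoint and the shared base.
    return torch.cat([(state[k] - base_state[k]).flatten() for k in base_state])

def model_kinship(state_a, state_b, base_state):
    # Cosine similarity of weight deltas; Pearson correlation or Euclidean
    # distance are alternative kinship metrics.
    return F.cosine_similarity(task_vector(state_a, base_state),
                               task_vector(state_b, base_state), dim=0).item()

def merge_average(state_a, state_b):
    # Simplest merge operator: uniform parameter averaging (illustrative).
    return {k: (state_a[k] + state_b[k]) / 2 for k in state_a}

def topk_greedy_step(pool, scores, base_state, k=3):
    """One illustrative generation: among the top-k models by validation score,
    merge the pair with the *lowest* kinship to keep exploring rather than
    converging to a local optimum (a hypothetical reading of the strategy)."""
    top = sorted(pool, key=scores.get, reverse=True)[:k]
    pairs = [(a, b) for i, a in enumerate(top) for b in top[i + 1:]]
    a, b = min(pairs, key=lambda p: model_kinship(pool[p[0]], pool[p[1]], base_state))
    return f"{a}+{b}", merge_average(pool[a], pool[b])
```

Repeating this step — evaluating the child model, adding it back to the pool, and selecting again — yields the kind of iterative, evolution-style merging the paper studies.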
Related papers
- Revisiting Model Interpolation for Efficient Reasoning [27.32667995137936]
We revisit the simplest merging method, which interpolates two sets of weights directly. We observe that the merged model follows a three-stage evolutionary paradigm with distinct behaviors along the reasoning trajectory (see the sketch after this entry).
arXiv Detail & Related papers (2025-10-13T03:30:01Z)
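For reference, the direct interpolation revisited above fits in a few lines; the function name and the fixed lambda are illustrative:

```python
def interpolate_weights(state_a, state_b, lam=0.5):
    # state_a, state_b: torch state_dicts (name -> Tensor) of two checkpoints
    # sharing the same architecture (ideally the same base model).
    # theta_merged = lam * theta_a + (1 - lam) * theta_b, applied parameter-wise.
    return {k: lam * state_a[k] + (1 - lam) * state_b[k] for k in state_a}
```

Sweeping lam from 0 to 1 and evaluating at each point is the usual way such studies trace how behavior evolves along the interpolation path.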
- Competition and Attraction Improve Model Fusion [17.83054848742515]
Model merging is a powerful technique for integrating the specialized knowledge of multiple machine learning models into a single model. We propose Model Merging of Natural Niches (M2N2), an evolutionary algorithm with three key features. M2N2 scales to merge specialized language and image generation models, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-08-22T08:24:02Z)
- Why Do More Experts Fail? A Theoretical Analysis of Model Merging [51.18155031364046]
Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model. Recent model merging methods have shown promising results, but struggle to maintain performance gains as the number of merged models increases. We show that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged.
arXiv Detail & Related papers (2025-05-27T14:10:46Z)
- Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging [103.98582374569789]
Model merging aims to combine multiple expert models into a single model, thereby reducing storage and serving costs. Previous studies have primarily focused on merging visual classification models or Large Language Models (LLMs) for code and math tasks. We introduce the model merging benchmark for MLLMs, which includes multiple tasks such as VQA, Geometry, Chart, OCR, and Grounding, providing both LoRA and full fine-tuning models.
arXiv Detail & Related papers (2025-05-26T12:23:14Z)
- Multi-Level Collaboration in Model Merging [56.31088116526825]
This paper explores the intrinsic connections between model merging and model ensembling.
We find that even when previous restrictions are not met, model merging can still attain performance nearly identical, or even superior, to that of ensembling.
arXiv Detail & Related papers (2025-03-03T07:45:04Z)
- In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models [5.871732354387235]
We propose in-model merging (InMerge), a novel approach that enhances the model's robustness.
We demonstrate the feasibility and effectiveness of this technique for different CNN architectures.
The proposed InMerge-trained model surpasses the typically-trained model by a substantial margin.
arXiv Detail & Related papers (2025-02-27T20:52:55Z)
- Training-free Heterogeneous Model Merging [40.681362819808136]
We propose an innovative model merging framework designed for heterogeneous models.
We show that the merging of structurally heterogeneous models can achieve performance levels comparable to those of homogeneous merging.
Our code is publicly available at https://github.com/zju-vipa/training_free_heterogeneous_model_merging.
arXiv Detail & Related papers (2024-12-29T04:49:11Z)
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity among models and trains them collaboratively.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic scaling guideline for Large Language Models.
Our work begins by benchmarking existing LLM scaling techniques, especially selective merging and mixture variants.
Our methodology involves clustering mergeable models, selecting an optimal merging strategy, and integrating the clusters through a model mixture.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
- What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
- Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities [4.389938747401259]
This work explores the effects of fine-tuning strategies on Large Language Models (LLMs) in domains such as materials science and engineering.
We find that the merging of multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models.
arXiv Detail & Related papers (2024-09-05T11:49:53Z)
- Knowledge Fusion By Evolving Weights of Language Models [5.354527640064584]
This paper examines the approach of integrating multiple models into a unified model.
We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms.
arXiv Detail & Related papers (2024-06-18T02:12:34Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free: it requires no data or additional training, yet still performs impressively (see the sketch after this entry).
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
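A compact sketch of the elect-mask-rescale idea on flattened task vectors; this is a simplified reading of EMR-Merging under stated assumptions (the helper names and the exact rescaling rule are illustrative), not a faithful reimplementation:

```python
import torch

def emr_merge(task_vectors):
    # task_vectors: list of 1-D tensors tau_i = theta_i - theta_pre (flattened).
    stacked = torch.stack(task_vectors)              # (num_tasks, num_params)
    sign = torch.sign(stacked.sum(dim=0))            # "elect" a per-parameter sign
    agree = (stacked * sign).clamp(min=0)            # keep only sign-consistent parts
    tau_uni = sign * agree.max(dim=0).values         # unified vector: max agreeing magnitude
    masks = (stacked * tau_uni) > 0                  # per-task masks where signs align
    # Per-task rescaler matches each masked unified vector's total magnitude
    # to that of the original task vector (illustrative choice).
    scales = [t.abs().sum() / (m * tau_uni).abs().sum().clamp(min=1e-8)
              for t, m in zip(task_vectors, masks)]
    return tau_uni, masks, scales

# Inference for task i (sketch): theta_i ~= theta_pre + scales[i] * masks[i] * tau_uni
```

Only one unified vector plus lightweight boolean masks and scalars need to be stored, which is what makes the approach tuning-free and data-free at merge time.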
- Comparing Foundation Models using Data Kernels [13.099029073152257]
We present a methodology for directly comparing the embedding space geometry of foundation models (a sketch follows this entry).
Our methodology is grounded in random graph theory and enables valid hypothesis testing of embedding similarity.
We show how our framework can induce a manifold of models equipped with a distance function that correlates strongly with several downstream metrics.
arXiv Detail & Related papers (2023-05-09T02:01:07Z)
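To illustrate the kind of geometric comparison described above, the sketch below scores agreement between two models' embedding spaces on a shared probe set using linear CKA — a standard similarity index substituted here for the paper's random-graph hypothesis test; the setup and names are assumptions:

```python
import torch

def linear_cka(emb_a: torch.Tensor, emb_b: torch.Tensor) -> float:
    # emb_a: (n, d_a), emb_b: (n, d_b) -- embeddings of the same n probe inputs.
    # Linear CKA compares the two Gram ("data kernel") matrices of the inputs.
    a = emb_a - emb_a.mean(dim=0, keepdim=True)   # center each feature dimension
    b = emb_b - emb_b.mean(dim=0, keepdim=True)
    hsic = (a.T @ b).norm() ** 2                  # ||A^T B||_F^2
    return (hsic / ((a.T @ a).norm() * (b.T @ b).norm())).item()
```

A pairwise matrix of such scores over many checkpoints induces a model-to-model similarity structure analogous to the manifold of models with a distance function that the entry above describes.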
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)