A Systematic Study of Model Merging Techniques in Large Language Models
- URL: http://arxiv.org/abs/2511.21437v1
- Date: Wed, 26 Nov 2025 14:28:11 GMT
- Title: A Systematic Study of Model Merging Techniques in Large Language Models
- Authors: Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata
- Abstract summary: Model merging combines multiple fine-tuned checkpoints into a single model without additional training. We present a large-scale, systematic evaluation of six state-of-the-art merging methods. Results show that the oldest and simplest method, Task Arithmetic, is the only approach that reliably yields performance gains on LLMs.
- Score: 43.5967188676583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize to LLMs. We present a large-scale, systematic evaluation of six state-of-the-art merging methods, including recent subspace methods, across four open-weight LLMs, twelve fine-tuned checkpoints per base model, and sixteen standard LLM benchmarks. Evaluating through standardized benchmarks, we measure both the probability that a merged model outperforms the base model and relative gains over the best individual checkpoint. Our results show that the oldest and simplest method, Task Arithmetic, is the only approach that reliably yields performance gains on LLMs. Other interference-aware and subspace merging methods typically result in significant performance drops. Our findings indicate that current merging techniques do not directly transfer to modern LLMs. This motivates the design of LLM-specific merging algorithms and merging-aware fine-tuning methods. Code will be released upon acceptance of this paper.
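Task Arithmetic, singled out in the abstract as the only reliably beneficial method, merges checkpoints by adding a scaled sum of "task vectors" (fine-tuned weights minus base weights) back onto the base model. A minimal sketch of that idea, using plain Python dicts of flat weight lists; the function names and the scaling coefficient `lam` are illustrative choices, not values from this paper:

```python
def task_vector(base, finetuned):
    """Per-parameter difference: fine-tuned minus base weights."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}

def task_arithmetic_merge(base, checkpoints, lam=0.3):
    """Add the scaled sum of all task vectors back onto the base weights."""
    vectors = [task_vector(base, ckpt) for ckpt in checkpoints]
    merged = {}
    for k in base:
        summed = [sum(v[k][i] for v in vectors) for i in range(len(base[k]))]
        merged[k] = [b + lam * s for b, s in zip(base[k], summed)]
    return merged
```

In practice `base` and each checkpoint would be a model state dict of tensors rather than lists, and `lam` is typically tuned on held-out data; the arithmetic is otherwise the same.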
Related papers
- Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches [0.0]
We explore strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a pre-trained causal LLM and fine-tuning on the task, and (2) instruction-tuning the LLM in a prompt->response format for classification.
arXiv Detail & Related papers (2025-12-14T13:02:06Z)
- Merge and Bound: Direct Manipulations on Weights for Class Incremental Learning [39.77371020337677]
We present a novel training approach, named Merge-and-Bound (M&B), for Class Incremental Learning (CIL). Our algorithm involves two types of weight merging: inter-task weight merging and intra-task weight merging. We extensively evaluate our algorithm on standard CIL benchmarks and demonstrate superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-11-26T15:24:53Z)
- OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging [124.91183814854126]
Model merging seeks to combine multiple expert models into a single model. We introduce a benchmark for model merging research that clearly divides the tasks for MLLM training and evaluation. We find that model merging offers a promising way for building improved MLLMs without requiring training data.
arXiv Detail & Related papers (2025-05-26T12:23:14Z)
- NAN: A Training-Free Solution to Coefficient Estimation in Model Merging [61.36020737229637]
We show that the optimal merging weights should scale with the amount of task-specific information encoded in each model. We propose NAN, a simple yet effective method that estimates model merging coefficients via the inverse of parameter norm. NAN is training-free, plug-and-play, and applicable to a wide range of merging strategies.
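The inverse-norm rule in the NAN summary above can be sketched as follows, under simplifying assumptions not taken from the paper: each checkpoint is a dict of flat weight lists, the coefficient is the inverse L2 norm over all parameters, and the merge is a plain convex combination:

```python
import math

def nan_coefficients(checkpoints):
    """One merging coefficient per checkpoint: inverse L2 parameter norm,
    normalized so the coefficients sum to 1."""
    norms = [math.sqrt(sum(x * x for vec in ckpt.values() for x in vec))
             for ckpt in checkpoints]
    inv = [1.0 / n for n in norms]  # larger-norm models get smaller weight
    total = sum(inv)
    return [w / total for w in inv]

def weighted_merge(checkpoints, coeffs):
    """Convex combination of checkpoints using the estimated coefficients."""
    return {k: [sum(c * ckpt[k][i] for c, ckpt in zip(coeffs, checkpoints))
                for i in range(len(checkpoints[0][k]))]
            for k in checkpoints[0]}
```

Because the coefficients need no gradient updates or held-out data, this kind of estimator is "training-free" in the sense the summary describes.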
arXiv Detail & Related papers (2025-05-22T02:46:08Z)
- Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs [51.09983600916971]
Recent research indicates that models demonstrating linearity enhance the performance of task arithmetic. We argue that this linearity already exists within the model's submodules. We propose an innovative model merging strategy that independently merges these submodules.
arXiv Detail & Related papers (2025-04-15T06:23:24Z)
- Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Model scaling guideline. We benchmark existing scaling techniques, especially selective merging and variants of mixture. We then formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo. Our methodology involves the clustering of mergeable models, optimal merging strategy selection, and the integration of clusters.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
- Model Merging and Safety Alignment: One Bad Model Spoils the Bunch [70.614652904151]
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model.
Current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models.
We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment.
arXiv Detail & Related papers (2024-06-20T17:59:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.