Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
- URL: http://arxiv.org/abs/2505.23209v1
- Date: Thu, 29 May 2025 07:50:32 GMT
- Title: Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
- Authors: Akash Dhasade, Divyansh Jhunjhunwala, Milos Vujasinovic, Gauri Joshi, Anne-Marie Kermarrec,
- Abstract summary: We propose FlexMerge, a novel data-free model merging framework.<n>We show that even modestly larger merged models can provide substantial accuracy improvements over a single model.<n>By offering fine-grained control over fused model size, FlexMerge provides a flexible, data-free, and high-performance solution.
- Score: 16.936134010292232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model merging has emerged as an efficient method to combine multiple single-task fine-tuned models. The merged model can enjoy multi-task capabilities without expensive training. While promising, merging into a single model often suffers from an accuracy gap with respect to individual fine-tuned models. On the other hand, deploying all individual fine-tuned models incurs high costs. We propose FlexMerge, a novel data-free model merging framework to flexibly generate merged models of varying sizes, spanning the spectrum from a single merged model to retaining all individual fine-tuned models. FlexMerge treats fine-tuned models as collections of sequential blocks and progressively merges them using any existing data-free merging method, halting at a desired size. We systematically explore the accuracy-size trade-off exhibited by different merging algorithms in combination with FlexMerge. Extensive experiments on vision and NLP benchmarks, with up to 30 tasks, reveal that even modestly larger merged models can provide substantial accuracy improvements over a single model. By offering fine-grained control over fused model size, FlexMerge provides a flexible, data-free, and high-performance solution for diverse deployment scenarios.
Related papers
- Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging [103.98582374569789]
Model merging aims to combine multiple expert models into a single model, thereby reducing storage and serving costs.<n>Previous studies have primarily focused on merging visual classification models or Large Language Models (LLMs) for code and math tasks.<n>We introduce the model merging benchmark for MLLMs, which includes multiple tasks such as VQA, Geometry, Chart, OCR, and Grounding, providing both LoRA and full fine-tuning models.
arXiv Detail & Related papers (2025-05-26T12:23:14Z) - AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization [86.8133939108057]
We propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs.<n>Our method tackles the challenges in three steps: mapping, merging and searching.<n>As the first model merging method capable of merging heterogeneous MLLMs without labeled data, AdaMMS outperforms previous model merging methods on various vision-language benchmarks.
arXiv Detail & Related papers (2025-03-31T05:13:02Z) - FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization [16.420834802431536]
We introduce Frank-Wolfe Merging (FW-Merging) as a constrained optimization problem.<n>FW-Merging surpasses the data-free merging method by 32.8% and outperforms the data-informed Adamerging by 8.39% when merging 20 ViT models.<n>Our experiments show that FW-Merging scales across diverse model sources, remaining stable with 16 irrelevant models and improving by 15.3% with 16 relevant models on 20 CV tasks, while maintaining constant memory overhead.
arXiv Detail & Related papers (2025-03-16T21:07:05Z) - Why Train Everything? Tint a Single Layer for Multi-task Model Merging [17.496018757317824]
Model merging integrates independently fine-tuned models into a single multi-task model, offering a flexible alternative to joint training.<n>Many existing model merging methods introduce additional task-specific components, increasing complexity and requiring extra modifications.<n>We propose Model Tinting, a lightweight yet highly effective approach that improves model merging by updating just a single layer.
arXiv Detail & Related papers (2024-12-26T07:42:06Z) - What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z) - Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging [11.186194228460273]
We propose a preference-aware model merging problem in which the performance of the merged model on each base model's task is treated as an objective.<n>We show that the proposed model merging produces diverse trade-off models and achieves higher test accuracy compared to state-of-the-art merging baselines.
arXiv Detail & Related papers (2024-08-22T03:41:14Z) - PLeaS -- Merging Models with Permutations and Least Squares [43.17620198572947]
We propose a new two-step algorithm to merge models -- termed PLeaS -- which relaxes constraints.<n>PLeaS partially matches nodes in each layer by maximizing alignment.<n>We also demonstrate how our method can be extended to address a challenging scenario where no data is available from the fine-tuning domains.
arXiv Detail & Related papers (2024-07-02T17:24:04Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [47.432215933099016]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.<n>This creates a barrier to fusing knowledge across individual models to yield a better single model.<n>We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
arXiv Detail & Related papers (2020-06-12T14:49:47Z) - When Ensembling Smaller Models is More Efficient than Single Large
Models [52.38997176317532]
We show that ensembles can outperform single models with both higher accuracy and requiring fewer total FLOPs to compute.
This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.