Related papers: Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging

URL: http://arxiv.org/abs/2502.04030v1
Date: Thu, 06 Feb 2025 12:47:25 GMT
Title: Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging
Authors: Guinan Su, Jonas Geiping
Abstract summary: Reasoning capabilities represent a critical frontier for large language models. One way to efficiently supplement capabilities is model merging. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies.
Score: 30.38047100067552
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. One way to efficiently supplement capabilities is model merging, which offers a promising alternative by combining multiple models without retraining. However, current merging approaches rely on manually designed strategies for merging hyperparameters, limiting the exploration of potential model combinations and requiring significant human effort. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies while reducing costs through multi-fidelity approximations. We support both single- and multi-objective optimization and introduce two novel search spaces: layerwise fusion (LFS) and depth-wise integration (DIS). Evaluating across a number of benchmarks, we find that the search autonomously finds 1) merges that further boost single-objective performance, even on tasks the model has already been finetuned on, and 2) merges that optimize multi-objective frontiers across tasks. Effective merges are found with limited compute, e.g., in fewer than 500 search steps.
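
Illustrative sketch: the abstract describes searching over fine-grained merging hyperparameters without retraining. The Python snippet below is a minimal, hypothetical illustration of that general idea, not the authors' framework. It assumes two models represented as dictionaries of per-layer weight lists, merges them with one interpolation coefficient per layer, and uses plain random search in place of the paper's multi-fidelity optimizer. The names merge_layerwise, random_search, toy_score, and the toy data are all assumptions introduced for this example.

```python
import random


def merge_layerwise(model_a, model_b, alphas):
    """Interpolate two models layer by layer: alpha * a + (1 - alpha) * b."""
    merged = {}
    for name, weights_a in model_a.items():
        alpha = alphas[name]
        weights_b = model_b[name]
        merged[name] = [
            alpha * wa + (1.0 - alpha) * wb
            for wa, wb in zip(weights_a, weights_b)
        ]
    return merged


def random_search(model_a, model_b, score_fn, steps=50, seed=0):
    """Sample per-layer coefficients at random and keep the best-scoring merge.

    Random search is a stand-in here; the paper uses a multi-fidelity search.
    """
    rng = random.Random(seed)
    best_alphas, best_score = None, float("-inf")
    for _ in range(steps):
        alphas = {name: rng.random() for name in model_a}
        score = score_fn(merge_layerwise(model_a, model_b, alphas))
        if score > best_score:
            best_alphas, best_score = alphas, score
    return best_alphas, best_score


# Toy stand-ins for two fine-tuned models with identical layer names and shapes.
model_a = {"layer0": [1.0, 0.0], "layer1": [0.0, 2.0]}
model_b = {"layer0": [0.0, 1.0], "layer1": [2.0, 0.0]}

# Hypothetical objective: negative squared distance to a made-up target.
# A real search would score the merged model on a benchmark instead.
target = {"layer0": [0.75, 0.25], "layer1": [0.5, 1.5]}


def toy_score(merged):
    return -sum(
        (w - t) ** 2
        for name in merged
        for w, t in zip(merged[name], target[name])
    )


best_alphas, best_score = random_search(model_a, model_b, toy_score)
print(best_alphas, best_score)
```

In this sketch the objective is cheap, so every candidate is scored in full. A multi-fidelity variant would first score candidates on a small evaluation subset and spend the full evaluation budget only on the most promising ones.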

Related papers

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization [86.8133939108057]
We propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles the challenges in three steps: mapping, merging and searching. As the first model merging method capable of merging heterogeneous MLLMs without labeled data, AdaMMS outperforms previous model merging methods on various vision-language benchmarks.
arXiv Detail & Related papers (2025-03-31T05:13:02Z)
Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation [15.47711837051754]
We propose Mixup Model Merge (M$^3$), an innovative approach inspired by the Mixup data augmentation technique. M$^3$ is a simple yet effective model merging method that significantly enhances the performance of the merged model.
arXiv Detail & Related papers (2025-02-21T13:01:26Z)
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approach. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation [0.9084344604313794]
This paper explores model merging techniques across a spectrum of complexity. We introduce Differentiable Adaptive Merging (DAM), an efficient, adaptive merging approach. Our findings reveal that even simple averaging methods, like Model Soups, perform competitively when model similarity is high.
arXiv Detail & Related papers (2024-10-10T20:58:29Z)
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models [28.993221775758702]
Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. This paper marks a significant advance toward more flexible and comprehensive model merging techniques. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies.
arXiv Detail & Related papers (2024-09-27T16:31:31Z)
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient empowerment technique in the machine learning community. There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z)
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization [16.54335356612006]
The goal of model merging is to combine multiple models, each excelling in different tasks, into a single model that outperforms any of the individual source models. Existing methods rely heavily on human knowledge or intuition. It is difficult to find a strong model merging configuration within a limited number of evaluations.
arXiv Detail & Related papers (2024-06-29T16:34:23Z)
An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution. We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture. We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.