PSO-Merging: Merging Models Based on Particle Swarm Optimization
- URL: http://arxiv.org/abs/2508.19839v1
- Date: Wed, 27 Aug 2025 12:52:36 GMT
- Title: PSO-Merging: Merging Models Based on Particle Swarm Optimization
- Authors: Kehao Zhang, Shaolei Zhang, Yang Feng
- Abstract summary: We introduce PSO-Merging, a novel data-driven merging method based on Particle Swarm Optimization (PSO). In this approach, we initialize the particle swarm with a pre-trained model, expert models, and sparsified expert models. We then perform multiple iterations, with the final global best particle serving as the merged model.
- Score: 36.641774346671504
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Model merging has emerged as an efficient strategy for constructing multitask models by integrating the strengths of multiple available expert models, thereby reducing the need to fine-tune a pre-trained model for all the tasks from scratch. Existing data-independent methods struggle with performance limitations due to the lack of data-driven guidance. Data-driven approaches also face key challenges: gradient-based methods are computationally expensive, limiting their practicality for merging large expert models, whereas existing gradient-free methods often fail to achieve satisfactory results within a limited number of optimization steps. To address these limitations, this paper introduces PSO-Merging, a novel data-driven merging method based on Particle Swarm Optimization (PSO). In this approach, we initialize the particle swarm with a pre-trained model, expert models, and sparsified expert models. We then perform multiple iterations, with the final global best particle serving as the merged model. Experimental results on different language models show that PSO-Merging generally outperforms baseline merging methods, offering a more efficient and scalable solution for model merging.
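As a rough illustration of the procedure described in the abstract, the following is a minimal sketch of PSO-based merging, assuming each model has been flattened into a NumPy parameter vector and that `evaluate` returns a validation score (higher is better). The function name, hyperparameters, and the magnitude-based sparsification rule are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def pso_merge(pretrained, experts, evaluate, iters=30, w=0.7, c1=1.5, c2=1.5, sparsity=0.9):
    """Sketch of PSO-style model merging over flat parameter vectors (illustrative only)."""
    # Sparsify each expert by keeping only the largest-magnitude deltas from the pre-trained model.
    sparsified = []
    for e in experts:
        delta = e - pretrained
        keep = np.abs(delta) >= np.quantile(np.abs(delta), sparsity)
        sparsified.append(pretrained + delta * keep)

    # Initialize particles with the pre-trained model, expert models, and sparsified expert models.
    particles = np.stack([pretrained, *experts, *sparsified])
    velocities = np.zeros_like(particles)
    personal_best = particles.copy()
    personal_score = np.array([evaluate(p) for p in particles])
    g = personal_best[personal_score.argmax()]  # global best particle

    for _ in range(iters):
        # Standard PSO velocity and position updates.
        r1 = np.random.rand(*particles.shape)
        r2 = np.random.rand(*particles.shape)
        velocities = w * velocities + c1 * r1 * (personal_best - particles) + c2 * r2 * (g - particles)
        particles = particles + velocities

        # Re-score particles and update personal and global bests.
        scores = np.array([evaluate(p) for p in particles])
        improved = scores > personal_score
        personal_best[improved] = particles[improved]
        personal_score[improved] = scores[improved]
        g = personal_best[personal_score.argmax()]

    return g  # the final global best particle serves as the merged model
```

Returning the global best particle, rather than the final swarm state, mirrors the abstract's description of the final global best particle serving as the merged model.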
Related papers
- The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models [54.51795784459866]
We propose a theoretical framework of performance scaling for multi-model collaboration. We show that multi-model systems follow a power-law scaling with respect to the total parameter count. Ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family.
arXiv Detail & Related papers (2025-12-29T09:55:12Z)
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
- Black-box Model Merging for Language-Model-as-a-Service with Massive Model Repositories [21.899117703417517]
We propose a derivative-free optimization framework based on the evolutionary algorithm (Evo-Merging). Our method consists of two key components: (1) sparsity-based denoising, designed to identify and filter out irrelevant or redundant information across models, and (2) sign-aware scaling, which dynamically computes optimal combination weights for the relevant models based on their performance. Our approach achieves state-of-the-art results on a range of tasks, significantly outperforming existing strong baselines.
arXiv Detail & Related papers (2025-09-16T10:55:50Z)
- NAN: A Training-Free Solution to Coefficient Estimation in Model Merging [61.36020737229637]
We show that the optimal merging weights should scale with the amount of task-specific information encoded in each model. We propose NAN, a simple yet effective method that estimates model merging coefficients via the inverse of parameter norm. NAN is training-free, plug-and-play, and applicable to a wide range of merging strategies.
arXiv Detail & Related papers (2025-05-22T02:46:08Z)
- Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically involve scaling the parameters model-wise or integrating parameter importance parameter-wise. We unify these strategies into a more general merging framework, and introduce Dynamic Fisher-weighted Merging (DF-Merge). We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
- Non-Uniform Parameter-Wise Model Merging [17.989809995141044]
We introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods.
arXiv Detail & Related papers (2024-12-20T00:05:14Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [47.432215933099016]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)