Related papers: Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

Related papers

AutoMerge: Search-Based Model Merging Framework for Effective Model Reuse [8.950520457150178]
Recently, model merging has emerged in the domain of large language models (LLMs) as a training-free approach. No prior work has systematically investigated whether such an approach can be effectively applied to other deep learning models. We present the first systematic study that evaluates five model merging techniques on three distinct model architectures.
arXiv Detail & Related papers (2026-01-30T09:27:01Z)
A Systematic Study of Model Merging Techniques in Large Language Models [43.5967188676583]
Model merging combines multiple fine-tuned checkpoints into a single model without additional training. We present a large-scale, systematic evaluation of six state-of-the-art merging methods. Results show that the oldest and simplest method, Task Arithmetic, is the only approach that reliably yields performance gains on LLMs.
arXiv Detail & Related papers (2025-11-26T14:28:11Z)
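Task Arithmetic is usually written as adding scaled task vectors (fine-tuned weights minus pre-trained weights) back to the pre-trained weights: θ_merged = θ_pre + λ Σ_i (θ_i − θ_pre). The minimal sketch below assumes each model's parameters are flattened into a plain Python list and uses a single hypothetical scaling coefficient `lam`. It illustrates the merging rule only and is not the study's evaluation code.

```python
from typing import List


def task_vector(finetuned: List[float], base: List[float]) -> List[float]:
    """Task vector: fine-tuned weights minus pre-trained (base) weights."""
    return [f - b for f, b in zip(finetuned, base)]


def task_arithmetic(
    base: List[float],
    finetuned_models: List[List[float]],
    lam: float = 0.3,  # hypothetical shared scaling coefficient
) -> List[float]:
    """Merged weights = base + lam * (sum of all task vectors)."""
    merged = list(base)
    for ft in finetuned_models:
        for i, delta in enumerate(task_vector(ft, base)):
            merged[i] += lam * delta
    return merged


if __name__ == "__main__":
    base = [0.0, 1.0, 2.0]
    experts = [[1.0, 1.0, 2.0], [0.0, 3.0, 2.0]]  # toy "fine-tuned" models
    print(task_arithmetic(base, experts, lam=0.5))  # [0.5, 2.0, 2.0]
```

In practice `lam` is typically tuned on held-out data; a single global value is the simplest choice and the one this sketch assumes.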
Merge and Guide: Unifying Model Merging and Guided Decoding for Controllable Multi-Objective Generation [49.98025799046136]
We introduce Merge-And-GuidE (MAGE), a two-stage framework that leverages model merging for guided decoding. In Stage 1, MAGE resolves a compatibility problem between the guidance and base models. In Stage 2, we merge explicit and implicit value models into a unified guidance proxy, which then steers the decoding of the base model from Stage 1.
arXiv Detail & Related papers (2025-10-04T11:10:07Z)
Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging [38.12136955174922]
Fine-tuning large language models (LLMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. Existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), resulting in significant performance degradation.
arXiv Detail & Related papers (2025-05-28T23:28:12Z)
SeMe: Training-Free Language Model Merging via Semantic Alignment [32.178931149612644]
SeMe is a novel, data-free, and training-free approach that leverages latent semantic alignment to merge LMs at a fine-grained, layer-wise level. We demonstrate that SeMe outperforms existing methods in both performance and efficiency while eliminating reliance on external data. Our work establishes a new paradigm for knowledge-aware model merging, paving the way for more scalable and interpretable model composition.
arXiv Detail & Related papers (2025-05-26T15:45:56Z)
Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically either scale the parameters model-wise or integrate parameter importance parameter-wise. We unify these strategies into a more general merging framework and introduce Dynamic Fisher-weighted Merging (DF-Merge). We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
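The DF-Merge abstract describes combining model-wise scaling with parameter-wise importance. One common way to express parameter-wise importance is a diagonal Fisher estimate, as in Fisher-weighted averaging. The sketch below assumes the per-parameter rule θ_j = Σ_i λ_i F_ij θ_ij / Σ_i λ_i F_ij, with hand-set coefficients `lambdas` standing in for the values DF-Merge searches with Bayesian optimization. The exact DF-Merge formulation may differ.

```python
from typing import List


def fisher_weighted_merge(
    models: List[List[float]],   # flattened parameters of each fine-tuned model
    fishers: List[List[float]],  # diagonal Fisher estimates, same layout as models
    lambdas: List[float],        # model-wise scaling coefficients (hand-set here)
    eps: float = 1e-8,           # avoids division by zero when all weights vanish
) -> List[float]:
    """Per-parameter weighted average: sum_i lam_i*F_ij*theta_ij / sum_i lam_i*F_ij."""
    merged = []
    for j in range(len(models[0])):
        num = sum(lam * f[j] * m[j] for lam, f, m in zip(lambdas, fishers, models))
        den = sum(lam * f[j] for lam, f in zip(lambdas, fishers))
        merged.append(num / (den + eps))
    return merged


if __name__ == "__main__":
    models = [[1.0, 0.0], [3.0, 4.0]]
    fishers = [[1.0, 1.0], [1.0, 3.0]]  # toy importance scores
    merged = fisher_weighted_merge(models, fishers, lambdas=[1.0, 1.0])
    print([round(x, 6) for x in merged])  # [2.0, 3.0]
```

The second parameter lands closer to the second model's value because that model's Fisher score is larger there. An outer search loop (Bayesian optimization in DF-Merge) would propose `lambdas` and score each merged model on validation data; that loop is omitted here.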
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization [86.8133939108057]
We propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles the challenges in three steps: mapping, merging and searching. As the first model merging method capable of merging heterogeneous MLLMs without labeled data, AdaMMS outperforms previous model merging methods on various vision-language benchmarks.
arXiv Detail & Related papers (2025-03-31T05:13:02Z)
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach [0.0]
LEWIS (Layer Wise Sparsity) is a guided model-merging framework. It guides existing merging methods by preserving essential layer-wise task-specific knowledge. Experiments demonstrate the effectiveness of LEWIS, with performance improvements for code instruction-following and math-solving models.
arXiv Detail & Related papers (2025-03-05T20:09:59Z)
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation [15.47711837051754]
We propose Mixup Model Merge (M³), an innovative approach inspired by the Mixup data augmentation technique. M³ is a simple yet effective model merging method that significantly enhances the performance of the merged model.
arXiv Detail & Related papers (2025-02-21T13:01:26Z)
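By analogy with Mixup, which draws its interpolation ratio from a Beta(α, α) distribution, randomized linear interpolation between two models can be sketched as sampling a ratio λ and mixing their parameters. The Beta prior, the default `alpha`, and interpolating raw parameters (rather than, say, task vectors) are assumptions of this illustration, not details confirmed by the abstract.

```python
import random
from typing import List, Optional, Tuple


def mixup_merge(
    theta_a: List[float],
    theta_b: List[float],
    alpha: float = 0.5,  # Beta(alpha, alpha) shape; hypothetical default
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Linearly interpolate two parameter lists with a randomly drawn ratio."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # interpolation ratio in [0, 1]
    merged = [lam * a + (1.0 - lam) * b for a, b in zip(theta_a, theta_b)]
    return merged, lam


if __name__ == "__main__":
    merged, lam = mixup_merge([0.0, 2.0], [1.0, 0.0], rng=random.Random(0))
    print(f"lambda={lam:.3f}, merged={merged}")
```

Passing a seeded `random.Random` makes the draw reproducible. Because the ratio is random, one could draw several candidates and keep the merge that scores best on held-out data; that selection step is a hypothetical addition.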
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging is an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs [64.83462841029089]
We introduce an efficient merging-based alignment method called MergeAlign that interpolates the domain and alignment vectors, creating safer domain-specific models. We apply MergeAlign to Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks.
arXiv Detail & Related papers (2024-11-11T09:32:20Z)
The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse [25.002218722102505]
Model merging aims to efficiently combine the weights of multiple expert models, each trained on a specific task, into a single multi-task model. This work explores the more challenging scenario of "non-local" merging. Standard merging techniques often fail to generalize effectively in this non-local setting. We propose a multi-task technique to re-scale and shift the output activations of the merged model for each task, aligning its output statistics with those of the corresponding task-specific expert models.
arXiv Detail & Related papers (2024-10-16T17:41:59Z)
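The per-task correction described above can be illustrated as an affine map that matches the mean and standard deviation of the merged model's outputs to those of the expert for the same task. A scale above 1 counteracts the variance collapse named in the title. This sketch assumes scalar activations collected on the same inputs for both models and simple mean/standard-deviation matching; the paper's exact statistics and where the correction is applied may differ.

```python
import statistics
from typing import List, Tuple


def fit_affine_correction(
    merged_acts: List[float], expert_acts: List[float]
) -> Tuple[float, float]:
    """Return (scale, shift) so scale*merged + shift has the expert's mean and std."""
    mu_m, sd_m = statistics.fmean(merged_acts), statistics.pstdev(merged_acts)
    mu_e, sd_e = statistics.fmean(expert_acts), statistics.pstdev(expert_acts)
    scale = sd_e / sd_m if sd_m > 0 else 1.0  # rescale to undo collapsed variance
    shift = mu_e - scale * mu_m               # then re-center on the expert's mean
    return scale, shift


def apply_correction(acts: List[float], scale: float, shift: float) -> List[float]:
    return [scale * a + shift for a in acts]


if __name__ == "__main__":
    # Hypothetical activations for one task, collected on the same inputs.
    merged_acts = [0.0, 1.0, 2.0]     # merged model: shrunken spread
    expert_acts = [10.0, 12.0, 14.0]  # task-specific expert
    scale, shift = fit_affine_correction(merged_acts, expert_acts)
    print(apply_correction(merged_acts, scale, shift))  # close to [10.0, 12.0, 14.0]
```

Each task keeps its own `(scale, shift)` pair, so the merged weights stay shared while the output statistics are restored per task.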
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic scaling guideline for large language models. Our work starts with a benchmarking of existing LLM scaling techniques, especially selective merging and variants of mixture. Our methodology involves clustering mergeable models, selecting an optimal merging strategy, and integrating the clusters through a model mixture.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging. PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition [3.4923338594757674]
Large language models (LLMs) can be used to train a model capable of extracting various types of entities. In this paper, we use the open-source LLM LLaMA2 as the backbone model and design specific instructions to distinguish between different types of entities and datasets. Our model VANER, trained with a small subset of parameters, significantly outperforms previous LLM-based models and is the first LLM-based model to surpass the majority of conventional state-of-the-art BioNER systems.
arXiv Detail & Related papers (2024-04-27T09:00:39Z)
Training-Free Pretrained Model Merging [38.16269074353077]
We propose an innovative model merging framework called merging under dual-space constraints (MuDSC). To enhance usability, we have also incorporated adaptations for group structure, including Multi-Head Attention and Group Normalization.
arXiv Detail & Related papers (2024-03-04T06:19:27Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). It aims to autonomously learn the coefficients for model merging, either task-wise or layer-wise, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging shows a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
TIES-Merging: Resolving Interference When Merging Models [95.59265307318752]
Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training. Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign.
arXiv Detail & Related papers (2023-06-02T17:31:32Z)
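The three TIES-Merging steps listed above (trim small changes, elect a sign per parameter, merge only the agreeing values) can be sketched over flattened parameter lists as follows. The keep fraction `keep_frac` and scaling coefficient `lam` are hypothetical knobs chosen for illustration; the reference implementation works on full model state dicts, and its defaults may differ.

```python
from typing import List


def trim(tv: List[float], keep_frac: float) -> List[float]:
    """Step 1 (trim): keep the top keep_frac fraction of entries by magnitude,
    zero the rest. Entries tied with the threshold are all kept."""
    k = max(1, int(round(keep_frac * len(tv))))
    threshold = sorted((abs(x) for x in tv), reverse=True)[k - 1]
    return [x if abs(x) >= threshold else 0.0 for x in tv]


def ties_merge(
    base: List[float],
    finetuned: List[List[float]],
    keep_frac: float = 0.2,  # fraction of each task vector to keep
    lam: float = 1.0,        # scaling applied to the merged task vector
) -> List[float]:
    # Task vectors (fine-tuned minus base), each trimmed independently.
    tvs = [trim([f - b for f, b in zip(ft, base)], keep_frac) for ft in finetuned]
    merged = []
    for j, b in enumerate(base):
        column = [tv[j] for tv in tvs]
        total = sum(column)
        # Step 2 (elect sign): sign of the summed trimmed values for this parameter.
        sign = 1.0 if total > 0 else (-1.0 if total < 0 else 0.0)
        # Step 3 (disjoint merge): average only nonzero entries matching that sign.
        agreeing = [x for x in column if x * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + lam * delta)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    experts = [[0.5, -0.1, 0.3, 0.0], [-0.2, 0.4, -0.6, 0.05]]
    print(ties_merge(base, experts, keep_frac=0.5))  # [0.5, 0.4, -0.6, 0.0]
```

In the third position the two experts disagree (0.3 vs. -0.6); the negative sign wins the election, so the positive value is dropped instead of being averaged in, which is the interference TIES-Merging is designed to avoid.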
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Fine-tuned models are often readily available while their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.