Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
- URL: http://arxiv.org/abs/2504.14662v1
- Date: Sun, 20 Apr 2025 15:57:12 GMT
- Title: Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
- Authors: Yeoreum Lee, Jinwook Jung, Sungyong Baik
- Abstract summary: Large-scale deep learning models trained under the pretraining-finetuning paradigm have led to a surge of task-specific models fine-tuned from a common pre-trained model. Research efforts have been made on merging these large models into a single multi-task model, particularly with simple arithmetic on parameters. Such a merging methodology faces a central challenge: interference between model parameters fine-tuned on different tasks. We propose to fine-tune pre-trained models via sharpness-aware minimization.
- Score: 6.110846759317336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale deep learning models trained under the pretraining-finetuning paradigm have led to a surge of task-specific models fine-tuned from a common pre-trained model. Recently, several research efforts have been made to merge these large models into a single multi-task model, particularly through simple arithmetic on their parameters. Such merging methods face a central challenge: interference between model parameters fine-tuned on different tasks. A few recent works have designed new fine-tuning schemes that reduce parameter interference, but at the cost of the performance of each task-specific fine-tuned model, which in turn limits the performance of the merged model. To improve the performance of a merged model, we note that a fine-tuning scheme should aim for (1) smaller parameter interference and (2) better performance of each fine-tuned model on its corresponding task. In this work, we design a new fine-tuning objective function that works towards these two goals. In the process, we find this objective function to be strikingly similar to the sharpness-aware minimization (SAM) objective, which seeks flat minima to achieve better generalization. Drawing on this observation, we propose to fine-tune pre-trained models via sharpness-aware minimization. Experimental and theoretical results showcase the effectiveness and orthogonality of our proposed approach, which improves performance when combined with various merging and fine-tuning methods. Our code is available at https://github.com/baiklab/SAFT-Merge.
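The two ingredients the abstract refers to, simple arithmetic on parameters for merging and sharpness-aware minimization (SAM) for fine-tuning, can be illustrated with a short sketch. The code below is a minimal illustration in PyTorch and is not taken from the authors' repository; the function names `merge_by_task_arithmetic` and `sam_step`, the scaling coefficient, and the perturbation radius `rho` are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code): task-arithmetic merging of
# fine-tuned checkpoints, plus one SAM update of the kind the paper proposes
# to use during fine-tuning. Function names and hyperparameters are illustrative.
import copy
import torch


def merge_by_task_arithmetic(pretrained, finetuned_models, scaling=0.3):
    """Merge task-specific models into one multi-task model:
    theta_merged = theta_pre + scaling * sum_t (theta_t - theta_pre)."""
    merged = copy.deepcopy(pretrained)
    pre_state = pretrained.state_dict()
    merged_state = merged.state_dict()
    for name in merged_state:
        task_vectors = [m.state_dict()[name] - pre_state[name] for m in finetuned_models]
        merged_state[name] = pre_state[name] + scaling * sum(task_vectors)
    merged.load_state_dict(merged_state)
    return merged


def sam_step(model, loss_fn, inputs, targets, optimizer, rho=0.05):
    """One SAM update: perturb weights toward the approximate worst-case direction
    within an L2 ball of radius rho, then update using the gradient taken there."""
    # First forward/backward pass: gradient at the current weights.
    loss_fn(model(inputs), targets).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12

    # Ascend to the (approximate) worst-case point within the rho-ball.
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)

    # Second pass: gradient at the perturbed weights drives the actual update.
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)  # restore the original weights before the optimizer step
    optimizer.step()
    optimizer.zero_grad()
```

Per the abstract, the objective that encourages both low parameter interference and strong per-task performance turns out to be strikingly similar to the SAM objective, so a standard SAM loop of this kind can be used during fine-tuning before applying any of the merging methods listed below.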
Related papers
- Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation [17.39117429338763]
We propose CoPA-Merging, a training-free parameter-efficient merging method with complementary parameter adaptation.
We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to demonstrate the outstanding performance and generalizability of our method.
arXiv Detail & Related papers (2025-02-24T13:52:05Z)
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find that existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- Parameter-Efficient Interventions for Enhanced Model Merging [0.7373617024876725]
Model merging combines knowledge from task-specific models into a unified multi-task model to avoid joint training on all task data.
We propose IntervMerge, a novel approach to multi-task model merging that effectively mitigates representation bias across the model.
We show that IntervMerge consistently outperforms state-of-the-art approaches while using fewer parameters.
arXiv Detail & Related papers (2024-12-22T13:58:12Z)
- Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging.
PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, numbers of tasks, fine-tuning forms, and large language models.
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data or additional training, while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Understanding Parameter Sharing in Transformers [53.75988363281843]
Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model depth.
We show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity.
Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
arXiv Detail & Related papers (2023-06-15T10:48:59Z)
- TIES-Merging: Resolving Interference When Merging Models [95.59265307318752]
Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency.
Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training.
Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models.
We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign (see the sketch below).
arXiv Detail & Related papers (2023-06-02T17:31:32Z)
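The three steps quoted in the TIES-Merging entry above operate on task vectors (fine-tuned weights minus pre-trained weights). Below is a minimal per-tensor sketch of those steps, assuming float tensors of identical shape; the function name `ties_merge`, the `keep_ratio` value, and the final scaling coefficient are illustrative assumptions rather than the authors' reference implementation.

```python
# Illustrative per-tensor sketch of the three TIES-Merging steps described above:
# trim small updates, elect a per-parameter sign, and average only the values
# that agree with the elected sign. Not the authors' reference implementation.
import torch


def ties_merge(task_vectors, keep_ratio=0.2):
    """task_vectors: list of tensors (fine-tuned weights minus pre-trained weights),
    all with the same shape. Returns one merged task vector."""
    trimmed = []
    for tv in task_vectors:
        # 1) Trim: keep only the largest-magnitude fraction of each task vector,
        #    resetting parameters that changed little during fine-tuning.
        flat = tv.abs().flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        threshold = flat.topk(k).values.min()
        trimmed.append(torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv)))

    stacked = torch.stack(trimmed)  # shape: (num_tasks, *param_shape)

    # 2) Elect sign: per parameter, keep the sign with the larger total magnitude.
    elected_sign = torch.sign(stacked.sum(dim=0))

    # 3) Disjoint merge: average only the entries whose sign matches the elected one.
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    counts = agree.sum(dim=0).clamp(min=1)
    return (stacked * agree).sum(dim=0) / counts


# Usage (per tensor): theta_merged = theta_pretrained + scaling * ties_merge(task_vectors)
```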