DPPA: Pruning Method for Large Language Model to Model Merging
- URL: http://arxiv.org/abs/2403.02799v1
- Date: Tue, 5 Mar 2024 09:12:49 GMT
- Title: DPPA: Pruning Method for Large Language Model to Model Merging
- Authors: Yaochen Zhu, Rui Xia, Jiajun Zhang
- Abstract summary: We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models.
We show that our method maintains a mere 20% of domain-specific parameters and yet delivers performance comparable to other methodologies.
Our method also displays outstanding post-pruning performance, leading to an improvement of nearly 20% in model merging performance.
- Score: 39.13317231533299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model merging combines fine-tuned models derived from multiple domains,
with the intent of enhancing the model's proficiency across those domains. The
principal concern is the resolution of parameter conflicts. A substantial amount
of existing research remedies this issue during the merging stage, while the
latest work focuses on resolving it during the pruning stage. The DARE approach
has exhibited promising outcomes when applied to simple fine-tuned models.
However, its efficacy tends to wane when it is employed on complex fine-tuned
models that show a significant parameter bias relative to the baseline model. In
this paper, we introduce a dual-stage method termed Dynamic Pruning Partition
Amplification (DPPA), devised to tackle the challenge of merging complex
fine-tuned models. First, we introduce Dynamically Pruning (DP), an improved
approach based on magnitude pruning that aims to enhance performance at higher
pruning rates. Then, we propose Dynamically Partition Amplification (DPA), a
rescaling strategy designed to dynamically amplify parameter partitions in
relation to their significance levels. The experimental results show that our
method retains a mere 20% of domain-specific parameters and yet delivers
performance comparable to methodologies that preserve up to 90% of parameters.
Furthermore, our method displays outstanding performance post-pruning, leading
to an improvement of nearly 20% in model merging performance. We make our code
available on GitHub.
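The abstract does not spell out the exact pruning or rescaling rules, but the overall recipe it describes (aggressively magnitude-prune the domain-specific delta, then amplify what survives) can be sketched as follows. This is a minimal sketch under assumptions of our own -- per-tensor top-k magnitude pruning of the delta and a uniform 1/keep_ratio amplification -- not the paper's actual DP/DPA procedure.

```python
import torch

def prune_and_amplify(base_state, finetuned_state, keep_ratio=0.2, amplify=None):
    """Illustrative sketch (not the paper's exact DP/DPA algorithm): keep only the
    top `keep_ratio` fraction of each delta tensor by magnitude, then rescale the
    survivors to compensate for the pruned mass."""
    pruned_deltas = {}
    for name, base_w in base_state.items():
        if not torch.is_floating_point(base_w):
            continue  # skip integer buffers such as position ids
        delta = finetuned_state[name] - base_w            # domain-specific parameters
        k = max(1, int(keep_ratio * delta.numel()))
        # threshold at the k-th largest absolute value of this tensor
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        mask = delta.abs() >= threshold
        scale = amplify if amplify is not None else 1.0 / keep_ratio  # assumed rescaling rule
        pruned_deltas[name] = delta * mask * scale
    return pruned_deltas
```

The amplified deltas would then be added back onto the base weights, and merging across domains operates on these sparse, rescaled deltas.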
Related papers
- Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging.
PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
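As a rough illustration of what "adjusting the coefficients of each parameter" can look like, the sketch below merges several fine-tuned models by weighting every parameter's delta with its own coefficient; the softmax-of-magnitudes weighting is a placeholder of our own, not PCB-Merging's actual balancing rule.

```python
import torch

def coefficient_weighted_merge(base, finetuned_models):
    """Per-parameter-coefficient merging sketch; the weighting rule is a stand-in."""
    merged = {}
    for name, base_w in base.items():
        if not torch.is_floating_point(base_w):
            merged[name] = base_w
            continue
        deltas = torch.stack([ft[name] - base_w for ft in finetuned_models])
        # placeholder coefficients: parameters that moved more get more weight
        coeffs = torch.softmax(deltas.abs(), dim=0)
        merged[name] = base_w + (coeffs * deltas).sum(dim=0)
    return merged
```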
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
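One way to picture "upscaling source models into an MoE model without extra data or further training" is to compress each fine-tuned model's weight delta into a low-rank factor pair that can serve as an expert. The sketch below shows only that compression step; the rank and the omitted router are assumptions of ours, not SMILE's actual construction.

```python
import torch

def delta_to_low_rank_expert(base_w, finetuned_w, rank=8):
    """Compress one fine-tuned model's weight delta into a low-rank 'expert'
    (illustrative; the router that selects among experts is not shown)."""
    delta = finetuned_w - base_w
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_features, rank)
    B = Vh[:rank, :]             # (rank, in_features)
    # the expert's contribution to a forward pass would be (x @ B.T) @ A.T
    return A, B
```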
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Activated Parameter Locating via Causal Intervention for Model Merging [26.98015572633289]
Model merging combines multiple models into one model, achieving convincing generalization without the necessity of additional training.
Existing methods have demonstrated that dropping a portion of delta parameters can alleviate conflicts while maintaining performance.
We propose an Activated Parameter Locating (APL) method that utilizes causal intervention to estimate parameter importance, enabling more precise parameter drops and better conflict mitigation.
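One hedged reading of "causal intervention to estimate importance" is an ablation test: revert one group of delta parameters at a time and measure how much a small validation loss changes. The sketch below follows that reading; `eval_loss` is a user-supplied callable, and the whole procedure is illustrative rather than the paper's APL algorithm.

```python
import torch

@torch.no_grad()
def ablation_importance(model, base_state, delta, eval_loss, param_names):
    """Score each delta tensor by the loss change caused by reverting it to the
    base value (an intervention), keeping everything else merged."""
    full = {n: base_state[n] + delta.get(n, 0) for n in base_state}
    model.load_state_dict(full)
    reference = float(eval_loss(model))
    scores = {}
    for name in param_names:
        ablated = dict(full)
        ablated[name] = base_state[name]       # intervene: drop this delta
        model.load_state_dict(ablated)
        scores[name] = abs(float(eval_loss(model)) - reference)
    model.load_state_dict(full)                # restore the merged weights
    return scores
```

Parameters with low scores are the natural candidates to drop.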
arXiv Detail & Related papers (2024-08-18T14:00:00Z)
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
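Written out in the usual linear-MDP notation (ours, for illustration; the reward function is shown and the transition kernel is treated analogously), the two settings read:

```latex
% Model I: context-varying features \phi_c, weights \theta shared across contexts
r_c(s, a) = \langle \phi_c(s, a), \theta \rangle
% Model II: shared features \phi, context-varying weights \theta_c
r_c(s, a) = \langle \phi(s, a), \theta_c \rangle
```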
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
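For readers unfamiliar with visual prompt tuning, the generic sketch below prepends a handful of learnable prompt tokens to a frozen backbone's patch embeddings and trains only the prompts and a task head. It is not E^2VPT's specific design, and the `backbone` interface (token embeddings in, token embeddings out) is an assumption.

```python
import torch
import torch.nn as nn

class VisualPromptTuning(nn.Module):
    """Generic prompt tuning sketch: frozen backbone + learnable prompt tokens."""
    def __init__(self, backbone, embed_dim, num_prompts=10, num_classes=100):
        super().__init__()
        self.backbone = backbone                       # assumed: (B, S, D) -> (B, S, D)
        for p in self.backbone.parameters():
            p.requires_grad = False                    # keep the backbone frozen
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)
        self.head = nn.Linear(embed_dim, num_classes)  # trainable task head

    def forward(self, patch_embeddings):               # (batch, seq, embed_dim)
        prompts = self.prompts.expand(patch_embeddings.size(0), -1, -1)
        tokens = torch.cat([prompts, patch_embeddings], dim=1)
        features = self.backbone(tokens)
        return self.head(features[:, 0])               # classify from the first prompt token
```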
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Understanding Parameter Sharing in Transformers [53.75988363281843]
Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model depth.
We show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity.
Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
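The general idea of cross-layer parameter sharing fits in a few lines: one layer's weights are reused at every depth step, so the model gets deeper without adding parameters. The paper's specific sharing scheme may differ; this is only the baseline notion.

```python
import torch.nn as nn

class SharedLayerTransformer(nn.Module):
    """Sketch: a single encoder layer applied `depth` times, sharing all weights."""
    def __init__(self, d_model=512, nhead=8, depth=12):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):                  # x: (batch, seq, d_model)
        for _ in range(self.depth):
            x = self.shared_layer(x)       # same parameters at every depth step
        return x
```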
arXiv Detail & Related papers (2023-06-15T10:48:59Z)
- TIES-Merging: Resolving Interference When Merging Models [95.59265307318752]
Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency.
Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training.
Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models.
We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign.
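The three steps described above map fairly directly onto code. The sketch below follows that description, with a trim fraction and scaling factor that are assumed defaults rather than the paper's settings.

```python
import torch

def ties_merge(base, finetuned_models, keep_ratio=0.2, lam=1.0):
    """Trim small deltas, elect a per-parameter sign, and average only the
    deltas that agree with the elected sign (illustrative defaults)."""
    merged = {}
    for name, base_w in base.items():
        if not torch.is_floating_point(base_w):
            merged[name] = base_w
            continue
        deltas = torch.stack([ft[name] - base_w for ft in finetuned_models])
        # 1) trim: reset all but each model's largest-magnitude changes
        k = max(1, int(keep_ratio * base_w.numel()))
        flat = deltas.abs().reshape(len(finetuned_models), -1)
        thresh = flat.kthvalue(flat.size(1) - k + 1, dim=1).values
        deltas = deltas * (deltas.abs() >= thresh.view(-1, *[1] * base_w.dim()))
        # 2) elect sign: the direction with the larger total mass wins per parameter
        elected = torch.sign(deltas.sum(dim=0))
        # 3) disjoint merge: average only the deltas aligned with the elected sign
        agree = (torch.sign(deltas) == elected) & (deltas != 0)
        count = agree.sum(dim=0).clamp(min=1)
        merged[name] = base_w + lam * (deltas * agree).sum(dim=0) / count
    return merged
```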
arXiv Detail & Related papers (2023-06-02T17:31:32Z)