DPPA: Pruning Method for Large Language Model to Model Merging
- URL: http://arxiv.org/abs/2403.02799v1
- Date: Tue, 5 Mar 2024 09:12:49 GMT
- Title: DPPA: Pruning Method for Large Language Model to Model Merging
- Authors: Yaochen Zhu, Rui Xia, Jiajun Zhang
- Abstract summary: We introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA) to tackle the challenge of merging complex fine-tuned models.
We show that our method maintains a mere 20% of domain-specific parameters and yet delivers performance comparable to other methodologies.
Our method also displays outstanding post-pruning performance, leading to an improvement of nearly 20% in model merging performance.
- Score: 39.13317231533299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model merging combines fine-tuned models derived from multiple domains,
with the intent of enhancing the model's proficiency across those domains. The
principal concern is the resolution of parameter conflicts. A substantial amount
of existing research remedies this issue during the merging stage, while the
latest work focuses on resolving it during the pruning stage. The DARE approach
has exhibited promising outcomes when applied to simple fine-tuned models.
However, its efficacy tends to wane when it is employed on complex fine-tuned
models that show a significant parameter bias relative to the baseline model. In
this paper, we introduce a dual-stage method termed Dynamic Pruning Partition
Amplification (DPPA), devised to tackle the challenge of merging complex
fine-tuned models. First, we introduce Dynamically Pruning (DP), an improved
approach based on magnitude pruning that aims to enhance performance at higher
pruning rates. Then, we propose Dynamically Partition Amplification (DPA), a
rescaling strategy designed to dynamically amplify parameter partitions in
relation to their significance levels. The experimental results show that our
method retains a mere 20% of domain-specific parameters and yet delivers
performance comparable to methodologies that preserve up to 90% of parameters.
Furthermore, our method displays outstanding performance post-pruning, leading
to an improvement of nearly 20% in model merging performance. We make our code
available on GitHub.
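The abstract does not spell out the exact pruning or rescaling rules, but the overall recipe it describes (aggressively magnitude-prune the domain-specific delta, then amplify what survives) can be sketched as follows. This is a minimal sketch under assumptions of our own -- per-tensor top-k magnitude pruning of the delta and a uniform 1/keep_ratio amplification -- not the paper's actual DP/DPA procedure.

```python
import torch

def prune_and_amplify(base_state, finetuned_state, keep_ratio=0.2, amplify=None):
    """Illustrative sketch (not the paper's exact DP/DPA algorithm): keep only the
    top `keep_ratio` fraction of each delta tensor by magnitude, then rescale the
    survivors to compensate for the pruned mass."""
    pruned_deltas = {}
    for name, base_w in base_state.items():
        if not torch.is_floating_point(base_w):
            continue  # skip integer buffers such as position ids
        delta = finetuned_state[name] - base_w            # domain-specific parameters
        k = max(1, int(keep_ratio * delta.numel()))
        # threshold at the k-th largest absolute value of this tensor
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        mask = delta.abs() >= threshold
        scale = amplify if amplify is not None else 1.0 / keep_ratio  # assumed rescaling rule
        pruned_deltas[name] = delta * mask * scale
    return pruned_deltas
```

The amplified deltas would then be added back onto the base weights, and merging across domains operates on these sparse, rescaled deltas.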
Related papers
- Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging.
PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
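As a rough illustration of what "adjusting the coefficients of each parameter" can look like, the sketch below merges several fine-tuned models by weighting every parameter's delta with its own coefficient; the softmax-of-magnitudes weighting is a placeholder of our own, not PCB-Merging's actual balancing rule.

```python
import torch

def coefficient_weighted_merge(base, finetuned_models):
    """Per-parameter-coefficient merging sketch; the weighting rule is a stand-in."""
    merged = {}
    for name, base_w in base.items():
        if not torch.is_floating_point(base_w):
            merged[name] = base_w
            continue
        deltas = torch.stack([ft[name] - base_w for ft in finetuned_models])
        # placeholder coefficients: parameters that moved more get more weight
        coeffs = torch.softmax(deltas.abs(), dim=0)
        merged[name] = base_w + (coeffs * deltas).sum(dim=0)
    return merged
```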
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
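One way to picture "upscaling source models into an MoE model without extra data or further training" is to compress each fine-tuned model's weight delta into a low-rank factor pair that can serve as an expert. The sketch below shows only that compression step; the rank and the omitted router are assumptions of ours, not SMILE's actual construction.

```python
import torch

def delta_to_low_rank_expert(base_w, finetuned_w, rank=8):
    """Compress one fine-tuned model's weight delta into a low-rank 'expert'
    (illustrative; the router that selects among experts is not shown)."""
    delta = finetuned_w - base_w
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_features, rank)
    B = Vh[:rank, :]             # (rank, in_features)
    # the expert's contribution to a forward pass would be (x @ B.T) @ A.T
    return A, B
```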
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Activated Parameter Locating via Causal Intervention for Model Merging [26.98015572633289]
Model merging combines multiple models into one model, achieving convincing generalization without the necessity of additional training.
Existing methods have demonstrated that dropping a portion of delta parameters can alleviate conflicts while maintaining performance.
We propose an Activated Parameter Locating (APL) method that utilizes causal intervention to estimate parameter importance, enabling more precise parameter drops and better conflict mitigation.
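One hedged reading of "causal intervention to estimate importance" is an ablation test: revert one group of delta parameters at a time and measure how much a small validation loss changes. The sketch below follows that reading; `eval_loss` is a user-supplied callable, and the whole procedure is illustrative rather than the paper's APL algorithm.

```python
import torch

@torch.no_grad()
def ablation_importance(model, base_state, delta, eval_loss, param_names):
    """Score each delta tensor by the loss change caused by reverting it to the
    base value (an intervention), keeping everything else merged."""
    full = {n: base_state[n] + delta.get(n, 0) for n in base_state}
    model.load_state_dict(full)
    reference = float(eval_loss(model))
    scores = {}
    for name in param_names:
        ablated = dict(full)
        ablated[name] = base_state[name]       # intervene: drop this delta
        model.load_state_dict(ablated)
        scores[name] = abs(float(eval_loss(model)) - reference)
    model.load_state_dict(full)                # restore the merged weights
    return scores
```

Parameters with low scores are the natural candidates to drop.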
arXiv Detail & Related papers (2024-08-18T14:00:00Z)
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
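Written out in the usual linear-MDP notation (ours, for illustration; the reward function is shown and the transition kernel is treated analogously), the two settings read:

```latex
% Model I: context-varying features \phi_c, weights \theta shared across contexts
r_c(s, a) = \langle \phi_c(s, a), \theta \rangle
% Model II: shared features \phi, context-varying weights \theta_c
r_c(s, a) = \langle \phi(s, a), \theta_c \rangle
```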
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
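For readers unfamiliar with visual prompt tuning, the generic sketch below prepends a handful of learnable prompt tokens to a frozen backbone's patch embeddings and trains only the prompts and a task head. It is not E^2VPT's specific design, and the `backbone` interface (token embeddings in, token embeddings out) is an assumption.

```python
import torch
import torch.nn as nn

class VisualPromptTuning(nn.Module):
    """Generic prompt tuning sketch: frozen backbone + learnable prompt tokens."""
    def __init__(self, backbone, embed_dim, num_prompts=10, num_classes=100):
        super().__init__()
        self.backbone = backbone                       # assumed: (B, S, D) -> (B, S, D)
        for p in self.backbone.parameters():
            p.requires_grad = False                    # keep the backbone frozen
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)
        self.head = nn.Linear(embed_dim, num_classes)  # trainable task head

    def forward(self, patch_embeddings):               # (batch, seq, embed_dim)
        prompts = self.prompts.expand(patch_embeddings.size(0), -1, -1)
        tokens = torch.cat([prompts, patch_embeddings], dim=1)
        features = self.backbone(tokens)
        return self.head(features[:, 0])               # classify from the first prompt token
```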
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Understanding Parameter Sharing in Transformers [53.75988363281843]
Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model depth.
We show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity.
Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
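The general idea of cross-layer parameter sharing fits in a few lines: one layer's weights are reused at every depth step, so the model gets deeper without adding parameters. The paper's specific sharing scheme may differ; this is only the baseline notion.

```python
import torch.nn as nn

class SharedLayerTransformer(nn.Module):
    """Sketch: a single encoder layer applied `depth` times, sharing all weights."""
    def __init__(self, d_model=512, nhead=8, depth=12):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):                  # x: (batch, seq, d_model)
        for _ in range(self.depth):
            x = self.shared_layer(x)       # same parameters at every depth step
        return x
```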
arXiv Detail & Related papers (2023-06-15T10:48:59Z)
- TIES-Merging: Resolving Interference When Merging Models [95.59265307318752]
Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency.
Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training.
Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models.
We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign.
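The three steps described above map fairly directly onto code. The sketch below follows that description, with a trim fraction and scaling factor that are assumed defaults rather than the paper's settings.

```python
import torch

def ties_merge(base, finetuned_models, keep_ratio=0.2, lam=1.0):
    """Trim small deltas, elect a per-parameter sign, and average only the
    deltas that agree with the elected sign (illustrative defaults)."""
    merged = {}
    for name, base_w in base.items():
        if not torch.is_floating_point(base_w):
            merged[name] = base_w
            continue
        deltas = torch.stack([ft[name] - base_w for ft in finetuned_models])
        # 1) trim: reset all but each model's largest-magnitude changes
        k = max(1, int(keep_ratio * base_w.numel()))
        flat = deltas.abs().reshape(len(finetuned_models), -1)
        thresh = flat.kthvalue(flat.size(1) - k + 1, dim=1).values
        deltas = deltas * (deltas.abs() >= thresh.view(-1, *[1] * base_w.dim()))
        # 2) elect sign: the direction with the larger total mass wins per parameter
        elected = torch.sign(deltas.sum(dim=0))
        # 3) disjoint merge: average only the deltas aligned with the elected sign
        agree = (torch.sign(deltas) == elected) & (deltas != 0)
        count = agree.sum(dim=0).clamp(min=1)
        merged[name] = base_w + lam * (deltas * agree).sum(dim=0) / count
    return merged
```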
arXiv Detail & Related papers (2023-06-02T17:31:32Z)