Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
- URL: http://arxiv.org/abs/2502.17159v1
- Date: Mon, 24 Feb 2025 13:52:05 GMT
- Title: Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
- Authors: Fanhu Zeng, Haiyang Guo, Fei Zhu, Li Shen, Hao Tang
- Abstract summary: We propose CoPA-Merging, a training-free parameter-efficient merging method with complementary parameter adaptation. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments that demonstrate the outstanding performance and generalizability of our method.
- Score: 17.39117429338763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning pre-trained models with custom data produces numerous expert models for specific tasks. Merging these models into one universal model with multi-task ability, without exposing the original training data, has gained popularity. As data and model sizes grow, parameter-efficient tuning has become the common practice for obtaining task-specific models efficiently. However, we observe that existing methods designed for merging fully fine-tuned models fail under parameter-efficient tuning. To address this, we analyze the problem from the perspective of low-rank decomposition and reveal that maintaining direction and compensating for the gap between singular values are crucial for efficient model merging. Consequently, we propose CoPA-Merging, a training-free parameter-efficient merging method with complementary parameter adaptation. Specifically, we (1) prune parameters and construct scaling coefficients from inter-parameter relations to compensate for the performance drop caused by task interference, and (2) perform cross-task normalization to enhance generalization to unseen tasks. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to demonstrate the outstanding performance and generalizability of our method. Additional studies and extensive analyses further showcase its effectiveness.
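To make the two numbered steps above concrete, here is a minimal, illustrative sketch of merging LoRA-style task deltas with magnitude pruning, per-task scaling, and cross-task normalization. It is not the authors' implementation: the quantile-based pruning, the norm-ratio scaling, and the simple averaging are assumed stand-ins for the paper's complementary parameter adaptation, and all names (`merge_lora_deltas`, `prune_ratio`) are hypothetical.

```python
import torch

def merge_lora_deltas(task_deltas, prune_ratio=0.8, eps=1e-8):
    """Illustrative sketch of parameter-efficient merging (not CoPA-Merging itself).

    task_deltas: list of dicts, one per task, mapping parameter names to that
                 task's effective weight update (e.g. B @ A for a LoRA pair),
                 all computed on the same base model.
    Returns a single dict of merged deltas to add to the base weights.
    """
    merged = {}
    for name in task_deltas[0].keys():
        deltas = [d[name] for d in task_deltas]

        # (1) Prune: zero out the smallest-magnitude entries of each delta,
        #     since they mostly contribute interference between tasks.
        pruned = []
        for delta in deltas:
            threshold = torch.quantile(delta.abs().flatten(), prune_ratio)
            pruned.append(torch.where(delta.abs() >= threshold, delta,
                                      torch.zeros_like(delta)))

        # (2) Scale: a simple norm-ratio coefficient (assumed here) that
        #     compensates for the magnitude lost to pruning.
        scaled = []
        for delta, kept in zip(deltas, pruned):
            scale = delta.norm() / (kept.norm() + eps)
            scaled.append(scale * kept)

        # (3) Cross-task normalization (simplified): average the scaled deltas
        #     so that adding more tasks does not blow up the merged update.
        merged[name] = torch.stack(scaled).sum(dim=0) / len(scaled)
    return merged
```

The merged deltas would then be added to the frozen base weights; the pruning ratio and scaling rule here are placeholders for the interference compensation the abstract describes.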
Related papers
- Dynamic Fisher-weighted Model Merging via Bayesian Optimization [37.02810891820468]
Existing merging approaches typically either scale parameters model-wise or integrate parameter importance parameter-wise.
We unify these strategies into a more general merging framework, and introduce Dynamic Fisher-weighted Merging (DF-Merge)
We show that DF-Merge outperforms strong baselines across models of different sizes and a variety of tasks.
arXiv Detail & Related papers (2025-04-26T18:31:14Z)
- Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning [6.110846759317336]
Large-scale deep learning models with a pretraining-finetuning paradigm have led to a surge of task-specific models fine-tuned from a common pre-trained model.
Research efforts have been made to merge these large models into a single multi-task model, particularly with simple arithmetic on parameters.
Such merging methodology faces a central challenge: interference between model parameters fine-tuned on different tasks.
We propose to fine-tune pre-trained models via sharpness-aware minimization.
arXiv Detail & Related papers (2025-04-20T15:57:12Z)
- Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models [20.741460682103863]
Sens-Merging is a sensitivity-guided coefficient adjustment method for model merging.
We show that Sens-Merging significantly improves performance across general knowledge, mathematical reasoning, and code generation tasks.
Our findings reveal important trade-offs between task-specific and cross-task scalings, providing insights for future model merging strategies.
arXiv Detail & Related papers (2025-02-18T01:41:13Z)
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approaches.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Optimize Incompatible Parameters through Compatibility-aware Knowledge Integration [104.52015641099828]
Existing research excels in removing incompatible parameters or merging the outputs of multiple different pretrained models.
We propose Compatibility-aware Knowledge Integration (CKI), which consists of Deep Assessment and Deep Splicing.
The integrated model can be used directly for inference or for further fine-tuning.
arXiv Detail & Related papers (2025-01-10T01:42:43Z)
- Parameter-Efficient Interventions for Enhanced Model Merging [0.7373617024876725]
Model merging combines knowledge from task-specific models into a unified multi-task model to avoid joint training on all task data.
We propose IntervMerge, a novel approach to multi-task model merging that effectively mitigates representation bias across the model.
We show that IntervMerge consistently outperforms state-of-the-art approaches while using fewer parameters.
arXiv Detail & Related papers (2024-12-22T13:58:12Z)
- Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging.
PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
- Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques.
Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters (a simplified sketch of the task-arithmetic step follows this entry).
We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
arXiv Detail & Related papers (2023-10-07T08:55:54Z)
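To illustrate the task-arithmetic step referenced in the entry above, here is a minimal sketch of plain task arithmetic applied to adapter weight updates. The partial linearization itself is omitted, and the function name `task_arithmetic_merge` and the coefficient `lam` are illustrative assumptions rather than the paper's API.

```python
import torch

def task_arithmetic_merge(base_state, adapter_deltas, lam=0.3):
    """Plain task arithmetic over adapter updates (linearization omitted).

    base_state: dict of pre-trained parameter tensors.
    adapter_deltas: list of dicts, one per task, mapping the same parameter
                    names to the adapter's effective weight update
                    (e.g. B @ A for a LoRA pair).
    lam: single scaling coefficient applied to the summed task vectors.
    """
    merged = {}
    for name, base in base_state.items():
        # Each task's adapter update acts as a task vector; sum and scale.
        task_sum = sum(d[name] for d in adapter_deltas)
        merged[name] = base + lam * task_sum
    return merged
```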
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- TIES-Merging: Resolving Interference When Merging Models [95.59265307318752]
Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency.
Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training.
Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models.
We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign (a minimal sketch of these steps follows this entry).
arXiv Detail & Related papers (2023-06-02T17:31:32Z)
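For comparison with the interference-aware ideas above, the following is a minimal sketch of the three TIES-Merging steps as summarized in that entry: trim small changes, elect a per-entry sign, and merge only the sign-aligned values. It follows the published description at a high level but simplifies details such as the trim fraction and the final scaling factor, so treat it as an illustration rather than the reference implementation.

```python
import torch

def ties_merge(task_vectors, keep_top=0.2, lam=1.0):
    """Sketch of TIES-style merging for a single parameter tensor.

    task_vectors: list of tensors, each a task vector (fine-tuned minus
                  pre-trained weights) for the same parameter.
    keep_top: fraction of largest-magnitude entries kept per task (assumed).
    lam: scaling applied to the merged task vector (assumed).
    """
    trimmed = []
    for tv in task_vectors:
        # Step 1 (trim): reset entries that changed only a small amount,
        # keeping roughly the top `keep_top` fraction by magnitude.
        threshold = torch.quantile(tv.abs().flatten(), 1.0 - keep_top)
        trimmed.append(torch.where(tv.abs() >= threshold, tv,
                                   torch.zeros_like(tv)))

    stacked = torch.stack(trimmed)

    # Step 2 (elect sign): the agreed-upon sign per entry is the sign of
    # the summed trimmed task vectors.
    elected_sign = torch.sign(stacked.sum(dim=0))

    # Step 3 (disjoint merge): average only the entries whose sign agrees
    # with the elected sign; conflicting entries are dropped.
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    counts = agree.sum(dim=0).clamp(min=1)
    merged = (stacked * agree).sum(dim=0) / counts

    # The merged task vector is then added back to the pre-trained weights.
    return lam * merged
```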