BYOM: Building Your Own Multi-Task Model For Free
- URL: http://arxiv.org/abs/2310.01886v3
- Date: Sat, 3 Feb 2024 15:22:33 GMT
- Title: BYOM: Building Your Own Multi-Task Model For Free
- Authors: Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, and James T. Kwok
- Abstract summary: BYOM-FFT is for merging fully finetuned models, while BYOM-LoRA is for LoRA-finetuned models.
Experiments on computer vision and natural language processing tasks show that the proposed BYOM methods outperform existing merging methods by a large margin.
- Score: 69.63765907216442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, various merging methods have been proposed to build a multi-task
model from task-specific finetuned models without retraining. However, existing
methods suffer from a large performance deterioration compared to using
multiple task-specific models. In this paper, we propose to inject
task-specific knowledge into the merged model and design two
parameter-efficient approaches (BYOM-FFT and BYOM-LoRA) to Build Your Own
Multi-task model. BYOM-FFT is for merging fully finetuned models, while
BYOM-LoRA is for LoRA-finetuned models. Both methods are data-free and
computation-efficient. Extensive experiments on computer vision and natural
language processing tasks show that the proposed BYOM methods outperform
existing merging methods by a large margin. Moreover, BYOM-FFT is general and
can be integrated into existing merging methods to further boost performance.
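As a point of reference for the setting the abstract describes, the sketch below shows task-arithmetic style merging of fully finetuned checkpoints: each task's weight delta over the shared pretrained model is summed with a scaling factor back onto the pretrained weights, with no retraining and no data. This is an illustrative PyTorch baseline, not the BYOM-FFT or BYOM-LoRA algorithm; the checkpoint paths and the scaling factor are assumed placeholders.

```python
# Illustrative task-arithmetic merging of fully finetuned checkpoints.
# NOT the BYOM method; checkpoint paths and the scaling factor are hypothetical.
import torch

def merge_task_vectors(pretrained_sd, finetuned_sds, scaling=0.3):
    """Sum the task vectors (theta_t - theta_0) of all finetuned models
    and add the scaled result back onto the pretrained weights."""
    merged = {}
    for name, base in pretrained_sd.items():
        if not torch.is_floating_point(base):
            merged[name] = base  # keep integer buffers (e.g. step counters) as-is
            continue
        deltas = [sd[name] - base for sd in finetuned_sds]  # one task vector per task
        merged[name] = base + scaling * sum(deltas)
    return merged

# Hypothetical usage: the checkpoint files are placeholders.
pretrained = torch.load("vit_base_pretrained.pt", map_location="cpu")
finetuned = [torch.load(p, map_location="cpu")
             for p in ("vit_task_a.pt", "vit_task_b.pt")]
multi_task_sd = merge_task_vectors(pretrained, finetuned, scaling=0.3)
```

BYOM-style methods then go further than this baseline by compressing and re-injecting task-specific knowledge on top of the merged backbone, which is what the paper's data-free, parameter-efficient contribution addresses.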
Related papers
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging [111.8456671452411]
Multi-task learning (MTL) leverages a shared model to accomplish multiple tasks and facilitate knowledge transfer.
We propose a Weight-Ensembling Mixture of Experts (WEMoE) method for multi-task model merging.
We show that WEMoE and E-WEMoE outperform state-of-the-art (SOTA) model merging methods in terms of MTL performance, generalization, and robustness.
arXiv Detail & Related papers (2024-10-29T07:16:31Z) - MIRA: A Method of Federated MultI-Task Learning for LaRge LAnguage Models [29.655807841018497]
We introduce a method for federated multi-task fine-tuning of Large Language Models (LLMs).
Our approach leverages the structure of each client's model and enables a learning scheme that considers other clients' tasks and data distribution.
Experimental results, with different datasets and models, demonstrate the proposed method's effectiveness.
arXiv Detail & Related papers (2024-10-20T22:24:40Z) - MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts [6.245113492272563]
Mixture of Dyadic Experts (MoDE) is a novel design for efficient multi-task adaptation.
Our design allows for more fine-grained mixing, thereby increasing the model's ability to jointly handle multiple tasks.
arXiv Detail & Related papers (2024-08-02T18:05:10Z) - MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Representation Surgery for Multi-Task Model Merging [57.63643005215592]
Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization.
Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training.
By visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias.
arXiv Detail & Related papers (2024-02-05T03:39:39Z) - Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of discrete (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
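For concreteness, the sketch below illustrates the kind of data-free coefficient learning the AdaMerging entry describes: the per-task merging coefficients are treated as learnable parameters and tuned by minimizing the entropy of the merged model's predictions on unlabeled inputs, so no original training data is needed. The model, data loader, and hyperparameters are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of task-wise coefficient learning in the spirit of AdaMerging:
# merging coefficients are optimized by entropy minimization on unlabeled data.
# The model, loader, and hyperparameters are illustrative assumptions.
import itertools
import torch
import torch.nn.functional as F
from torch.func import functional_call

def learn_merge_coefficients(model, pretrained_sd, task_vectors,
                             unlabeled_loader, steps=100, lr=1e-3):
    """task_vectors: one dict per task mapping param name -> (theta_t - theta_0)."""
    lambdas = torch.nn.Parameter(torch.full((len(task_vectors),), 0.3))
    opt = torch.optim.Adam([lambdas], lr=lr)
    batches = itertools.cycle(unlabeled_loader)
    for _ in range(steps):
        # Rebuild the merged weights from the current coefficients each step.
        merged = {}
        for name, w in pretrained_sd.items():
            if torch.is_floating_point(w):
                merged[name] = w + sum(lambdas[k] * task_vectors[k][name]
                                       for k in range(len(task_vectors)))
            else:
                merged[name] = w  # leave non-float buffers untouched
        x = next(batches)  # a batch of unlabeled inputs
        logits = functional_call(model, merged, (x,))
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return lambdas.detach()
```

A layer-wise variant, as mentioned in the entry above, would simply replace the single scalar per task with one learnable coefficient per (task, layer) pair.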
This list is automatically generated from the titles and abstracts of the papers on this site.