Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
- URL: http://arxiv.org/abs/2312.06795v1
- Date: Mon, 11 Dec 2023 19:10:55 GMT
- Title: Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
- Authors: MohammadReza Davari and Eugene Belilovsky
- Abstract summary: A common approach for targeted problems involves fine-tuning pre-trained foundation models for specific target tasks.
We introduce a new simple method, Model Breadcrumbs, which consists of a sparsely defined set of weights that carve out a trajectory within the weight space of a pre-trained model.
Our experiments demonstrate the effectiveness of Model Breadcrumbs in simultaneously improving performance across multiple tasks.
- Score: 14.349517221831364
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid development of AI systems has been greatly influenced by the
emergence of foundation models. A common approach for targeted problems
involves fine-tuning these pre-trained foundation models for specific target
tasks, resulting in a rapid spread of models fine-tuned across a diverse array
of tasks. This work focuses on the problem of merging multiple fine-tunings of
the same foundation model derived from a spectrum of auxiliary tasks. We
introduce a new simple method, Model Breadcrumbs, which consists of a sparsely
defined set of weights that carve out a trajectory within the weight space of a
pre-trained model, enhancing task performance when traversed. These breadcrumbs
are constructed by taking the difference between the model's weights before and
after fine-tuning, followed by a sparsification process that eliminates weight
outliers and negligible perturbations. Our experiments demonstrate the
effectiveness of Model Breadcrumbs in simultaneously improving performance across
multiple tasks. This contribution aligns with the evolving paradigm of
updatable machine learning, reminiscent of the collaborative principles
underlying open-source software development, fostering a community-driven
effort to reliably update machine learning models. Our method is shown to be
more efficient and, unlike previous proposals, does not require hyperparameter
tuning for each new task added. Through extensive experimentation involving
various models, tasks, and modalities, we establish that integrating Model
Breadcrumbs offers a simple, efficient, and highly effective approach for
constructing multi-task models and facilitating updates to foundation models.
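The following is a minimal sketch of the breadcrumb construction and merging described in the abstract, assuming PyTorch state dicts; the quantile thresholds (low_q, high_q) and the single merging scale alpha are illustrative placeholders, not the paper's reported hyperparameters:
```python
import torch

def build_breadcrumbs(pretrained, finetuned, low_q=0.80, high_q=0.99):
    """Sparse 'breadcrumb' for one fine-tuned model: the weight difference
    (fine-tuned minus pre-trained), masked per tensor so that only entries
    whose magnitude lies between the low_q and high_q quantiles survive,
    dropping both negligible perturbations and large outliers."""
    crumbs = {}
    for name, w0 in pretrained.items():
        delta = finetuned[name] - w0
        mag = delta.abs().flatten().float()
        lo, hi = torch.quantile(mag, low_q), torch.quantile(mag, high_q)
        crumbs[name] = delta * ((delta.abs() >= lo) & (delta.abs() <= hi))
    return crumbs

def merge_breadcrumbs(pretrained, all_crumbs, alpha=0.3):
    """Traverse the breadcrumb trajectory: add the scaled sum of all task
    breadcrumbs back onto the pre-trained weights."""
    return {name: w0 + alpha * sum(c[name] for c in all_crumbs)
            for name, w0 in pretrained.items()}
```
In this form, adding a further fine-tuned model only requires building its breadcrumb and re-summing, consistent with the abstract's claim that no per-task hyperparameter tuning is needed.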
Related papers
- The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse [25.002218722102505]
Model merging aims to efficiently combine the weights of multiple expert models, each trained on a specific task, into a single multi-task model.
This work explores the more challenging scenario of "non-local" merging.
Standard merging techniques often fail to generalize effectively in this non-local setting.
We propose a multi-task technique to re-scale and shift the output activations of the merged model for each task, aligning its output statistics with those of the corresponding task-specific expert models (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-10-16T17:41:59Z)
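A minimal sketch of the per-task output alignment described in the entry above, assuming a small batch of unlabeled inputs per task; matching the per-feature mean and standard deviation of the final activations is an illustrative choice of output statistics, and the function name is hypothetical:
```python
import torch

@torch.no_grad()
def fit_output_alignment(merged_model, expert_model, task_inputs, eps=1e-6):
    """Estimate a per-feature affine correction (scale, shift) so that the
    merged model's output activations on this task match the mean and std
    of the corresponding task-specific expert model."""
    z_merged = merged_model(task_inputs)   # shape: (batch, features)
    z_expert = expert_model(task_inputs)
    scale = z_expert.std(dim=0) / (z_merged.std(dim=0) + eps)
    shift = z_expert.mean(dim=0) - scale * z_merged.mean(dim=0)
    return scale, shift

# At inference time for this task: outputs = merged_model(x) * scale + shift
```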
- HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models [28.993221775758702]
Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability.
This paper marks a significant advance toward more flexible and comprehensive model merging techniques.
We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies.
arXiv Detail & Related papers (2024-09-27T16:31:31Z)
- Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management [35.06717005729781]
Recent foundation models are capable of handling multiple machine learning (ML) tasks and multiple data modalities with a unified base model structure and several specialized model components.
Development of such multi-task (MT) multi-modal (MM) models poses significant model management challenges to existing training systems.
We build a prototype system and evaluate it on various large MT MM models.
Experiments demonstrate the superior performance and efficiency of our system, with a speedup ratio of up to 71% compared to state-of-the-art training systems.
arXiv Detail & Related papers (2024-09-05T09:10:40Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data or additional training, while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Fisher Mask Nodes for Language Model Merging [0.0]
We introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning.
Our method exhibits a consistent and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging at a fraction of the computational cost (a sketch of Fisher-weighted averaging follows this entry).
arXiv Detail & Related papers (2024-03-14T21:52:26Z)
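For context on the entry above, here is a minimal sketch of the Fisher-weighted averaging baseline it builds on, assuming per-parameter (diagonal) Fisher information estimates have already been computed for each model; this is the baseline, not the Fisher Mask Nodes method itself:
```python
import torch

def fisher_weighted_average(state_dicts, fishers, eps=1e-8):
    """Merge models element-wise, weighting each parameter by its diagonal
    Fisher information: theta = sum_i F_i * theta_i / sum_i F_i."""
    merged = {}
    for name in state_dicts[0]:
        num = sum(f[name] * sd[name] for sd, f in zip(state_dicts, fishers))
        den = sum(f[name] for f in fishers) + eps
        merged[name] = num / den
    return merged
```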
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data (a minimal sketch of coefficient-weighted merging follows this entry).
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
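A minimal sketch of the coefficient-weighted, layer-wise merging that AdaMerging learns, with task vectors defined against the shared pre-trained weights; how the coefficients are optimized without the original training data is omitted, and all names below are illustrative:
```python
import torch

def task_vector(pretrained, finetuned):
    """Per-task weight difference relative to the shared pre-trained model."""
    return {name: finetuned[name] - w0 for name, w0 in pretrained.items()}

def adamerge(pretrained, task_vectors, coeffs):
    """Layer-wise weighted task arithmetic:
    theta[l] = theta_pre[l] + sum_k coeffs[k][l] * tau_k[l],
    where coeffs[k][l] is the learned coefficient of task k at layer l."""
    return {
        name: w0 + sum(coeffs[k][name] * tau[name]
                       for k, tau in enumerate(task_vectors))
        for name, w0 in pretrained.items()
    }
```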
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse [59.500060790983994]
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with a PTM, tuning the target model with a PTM, and PTM-based inference.
arXiv Detail & Related papers (2023-08-17T19:12:13Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from their extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This poses a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization [99.6826401545377]
Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions.
We propose model ratatouille, a new strategy to recycle the multiple fine-tunings of the same foundation model on diverse auxiliary tasks.
arXiv Detail & Related papers (2022-12-20T17:21:46Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.