Related papers: Merging by Matching Models in Task Parameter Subspaces

Merging by Matching Models in Task Parameter Subspaces

URL: http://arxiv.org/abs/2312.04339v2
Date: Sat, 13 Apr 2024 16:31:25 GMT
Title: Merging by Matching Models in Task Parameter Subspaces
Authors: Derek Tam, Mohit Bansal, Colin Raffel,
Abstract summary: Model merging aims to cheaply combine individual task-specific models into a single multitask model. We formalize how this approach to model merging can be seen as solving a linear system of equations. We show that using the conjugate gradient method can outperform closed-form solutions.
Score: 87.8712523378141
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Model merging aims to cheaply combine individual task-specific models into a single multitask model. In this work, we view past merging methods as leveraging different notions of a ''task parameter subspace'' in which models are matched before being merged. We connect the task parameter subspace of a given model to its loss landscape and formalize how this approach to model merging can be seen as solving a linear system of equations. While past work has generally been limited to linear systems that have a closed-form solution, we consider using the conjugate gradient method to find a solution. We show that using the conjugate gradient method can outperform closed-form solutions, enables merging via linear systems that are otherwise intractable to solve, and flexibly allows choosing from a wide variety of initializations and estimates for the ''task parameter subspace''. We ultimately demonstrate that our merging framework called ''Matching Models in their Task Parameter Subspace'' (MaTS) achieves state-of-the-art results in multitask and intermediate-task model merging. We release all of the code and checkpoints used in our work at https://github.com/r-three/mats.

Related papers

Model Assembly Learning with Heterogeneous Layer Weight Merging [57.8462476398611]
We introduce Model Assembly Learning (MAL), a novel paradigm for model merging. MAL integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities.
arXiv Detail & Related papers (2025-03-27T16:21:53Z)
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces [17.69597528370121]
Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement.
arXiv Detail & Related papers (2025-02-07T14:22:56Z)
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging. PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
PLeaS -- Merging Models with Permutations and Least Squares [43.17620198572947]
We propose a new two-step algorithm to merge models-termed PLeaS. PLeaS partially matches nodes in each layer by maximizing alignment. It computes the weights of the merged model as a layer-wise Least Squares solution.
arXiv Detail & Related papers (2024-07-02T17:24:04Z)
Training-Free Pretrained Model Merging [38.16269074353077]
We propose an innovative model merging framework, coined as merging under dual-space constraints (MuDSC) In order to enhance usability, we have also incorporated adaptations for group structure, including Multi-Head Attention and Group Normalization.
arXiv Detail & Related papers (2024-03-04T06:19:27Z)
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks into a single unified model can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable. We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from common extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks. We propose the CONtinuous relaxation dis (Concrete) subspace learning method to identify a common lowdimensional subspace and utilize its shared information track interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
Parameter Efficient Multi-task Model Fusion with Partial Linearization [97.23530944186078]
We propose a novel method to improve multi-task fusion for parameter-efficient fine-tuning techniques. Our approach partially linearizes only the adapter modules and applies task arithmetic over the linearized adapters. We demonstrate that our partial linearization technique enables a more effective fusion of multiple tasks into a single model.
arXiv Detail & Related papers (2023-10-07T08:55:54Z)
TIES-Merging: Resolving Interference When Merging Models [95.59265307318752]
Transfer learning can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. Model merging has emerged as a solution to combine multiple task-specific models into a single model without performing additional training. Existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. We propose TIES-Merging, which introduces three novel steps when merging models: resetting parameters that only changed a small amount during fine-tuning, resolving sign conflicts, and merging only the parameters that are in alignment with the final agreed-upon sign.
arXiv Detail & Related papers (2023-06-02T17:31:32Z)
Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC) SFSC generates a series of compatible sub-models with different capacities through one training process. SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.