Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
- URL: http://arxiv.org/abs/2501.01230v2
- Date: Sat, 11 Jan 2025 07:52:40 GMT
- Title: Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
- Authors: Yongxian Wei, Anke Tang, Li Shen, Chun Yuan, Xiaochun Cao,
- Abstract summary: Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
- Score: 74.02034188307857
- License:
- Abstract: Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. Existing methods attempt to alleviate task conflicts by sparsifying task vectors or promoting orthogonality among them. However, they overlook the fundamental requirement of model merging: ensuring the merged model performs comparably to task-specific models on respective tasks. We find these methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Based on our findings, we frame model merging as a constrained optimization problem ($\textit{i.e.}$, minimizing the gap between the merged model and individual models, subject to the constraint of retaining shared knowledge) and solve it via adaptive projective gradient descent. Specifically, we align the merged model with individual models by decomposing and reconstituting the loss function, alleviating conflicts through $\textit{data-free}$ optimization of task vectors. To retain shared knowledge, we optimize this objective by projecting gradients within a $\textit{shared subspace}$ spanning all tasks. Moreover, we view merging coefficients as adaptive learning rates and propose a task-aware, training-free strategy. Experiments show that our plug-and-play approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
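The abstract describes a data-free, projection-based optimization over task vectors (the differences between fine-tuned and pre-trained weights). The sketch below illustrates that idea in PyTorch; the SVD-based shared-subspace construction, the quadratic surrogate for the per-task gap, and the function names are assumptions for illustration, not the authors' exact formulation.

```python
import torch

def shared_subspace_basis(task_vectors, k):
    # Stack flattened task vectors (one column per task) and keep the top-k
    # left singular vectors as a basis for the subspace shared across tasks.
    # Using an SVD here is an assumption; the paper only states that gradients
    # are projected within a shared subspace spanning all tasks.
    A = torch.stack(task_vectors, dim=1)               # shape (d, T)
    U, _, _ = torch.linalg.svd(A, full_matrices=False)
    return U[:, :k]                                    # shape (d, k)

def merge_task_vectors(task_vectors, coeffs, k=8, steps=100):
    """Data-free merging sketch: shrink the gap between the merged task vector
    and each individual task vector, projecting every gradient step onto the
    shared subspace; `coeffs` act as task-aware adaptive learning rates."""
    U = shared_subspace_basis(task_vectors, k)
    merged = torch.stack(task_vectors).mean(dim=0)
    for _ in range(steps):
        grad = torch.zeros_like(merged)
        for tv, c in zip(task_vectors, coeffs):
            grad += c * (merged - tv)                  # surrogate per-task gap
        merged = merged - U @ (U.T @ grad)             # projected gradient step
    return merged
```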
Related papers
- No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces [17.69597528370121]
Model merging integrates the weights of multiple task-specific models into a single multi-task model.
Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains.
We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement.
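As a rough illustration of that observation, one could measure how well the dominant singular directions of a merged weight matrix align with those of a task-specific matrix. The metric below (mean absolute cosine between corresponding top-k left singular vectors) is an illustrative assumption, not the paper's exact definition.

```python
import torch

def singular_component_alignment(W_task, W_merged, k=10):
    # Mean |cosine| between the top-k left singular vectors of a task-specific
    # weight matrix and the merged matrix; higher values suggest the merged
    # model preserves that task's dominant directions. Illustrative metric only.
    U_t, _, _ = torch.linalg.svd(W_task, full_matrices=False)
    U_m, _, _ = torch.linalg.svd(W_merged, full_matrices=False)
    cos = (U_t[:, :k] * U_m[:, :k]).sum(dim=0).abs()   # columns are unit-norm
    return cos.mean().item()
```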
arXiv Detail & Related papers (2025-02-07T14:22:56Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approach.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging [33.23758947497205]
Advanced embedding models are typically developed using large-scale multi-task data and joint training across multiple tasks.
To overcome these challenges, we explore model merging, a technique that combines independently trained models to mitigate gradient conflicts and balance data distribution.
We introduce a novel method, Self Positioning, which efficiently searches for optimal model combinations within the space of task vectors using gradient descent.
arXiv Detail & Related papers (2024-10-19T08:39:21Z) - Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks [3.776249047528669]
We leverage the abundance of freely trained models to introduce a cost-free approach to model merging.
It aims to maintain the distinctiveness of the task-specific final layers while unifying the initial layers.
This approach ensures parameter consistency across all layers, essential for boosting performance.
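A minimal sketch of that layer-wise idea, assuming PyTorch state dicts: the initial layers are unified by averaging across models, while each model keeps its own task-specific final layers. The split criterion `is_initial_layer` is a hypothetical helper, not part of the paper.

```python
import torch

def layerwise_merge(state_dicts, is_initial_layer):
    # Build one model per task: initial layers are unified by averaging across
    # all models; every other layer stays task-specific.
    # `is_initial_layer(name) -> bool` is an assumed split criterion.
    shared = {
        name: torch.stack([sd[name] for sd in state_dicts]).mean(dim=0)
        for name in state_dicts[0]
        if is_initial_layer(name)
    }
    return [
        {name: shared.get(name, param) for name, param in sd.items()}
        for sd in state_dicts
    ]
```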
arXiv Detail & Related papers (2024-09-24T07:19:30Z) - Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging [21.918559935122786]
Model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training.
Traditional model merging methods often show significant performance gaps compared to fine-tuned models.
We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance.
We propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
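A toy sketch of the two-stage idea on per-layer weight deltas: the shared component is taken as the mean across tasks, each exclusive component is the residual compressed to low rank, and at inference the exclusive parts are mixed with input-dependent weights. The splitting, SVD compression, and routing details here are assumptions, not Twin-Merging's exact procedure.

```python
import torch

def modularize(task_matrices, rank=16):
    # Stage 1: split each task's weight delta into a shared component (the
    # element-wise mean across tasks) and a per-task exclusive residual,
    # compressed to low rank to reduce redundancy (assumed SVD compression).
    shared = torch.stack(task_matrices).mean(dim=0)
    exclusives = []
    for W in task_matrices:
        U, S, Vh = torch.linalg.svd(W - shared, full_matrices=False)
        exclusives.append(U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :])
    return shared, exclusives

def dynamic_merge(shared, exclusives, routing_weights):
    # Stage 2: at inference, mix the exclusive components with input-dependent
    # routing weights (e.g., produced by a small router network).
    merged = shared.clone()
    for w, ex in zip(routing_weights, exclusives):
        merged = merged + w * ex
    return merged
```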
arXiv Detail & Related papers (2024-06-17T02:31:55Z) - Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data.
Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
arXiv Detail & Related papers (2024-05-26T13:11:55Z) - Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models that are fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
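AdaMerging learns its merging coefficients without the original training data, typically by minimizing prediction entropy on unlabeled test inputs. The layer-wise sketch below follows that commonly described recipe; the model-construction helper `build_merged_model`, the initialization, and the data-loading details are assumptions.

```python
import torch

def entropy_objective(logits):
    # Shannon entropy of the predictive distribution, averaged over a batch;
    # an entropy-style surrogate stands in for the unavailable labeled loss.
    probs = torch.softmax(logits, dim=-1)
    return -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1).mean()

def learn_layerwise_coeffs(build_merged_model, unlabeled_loader,
                           num_tasks, num_layers, steps=500, lr=1e-3):
    # Learn one merging coefficient per (task, layer) pair without labels.
    # `build_merged_model(coeffs)` is a hypothetical helper that assembles the
    # merged network from the pre-trained weights plus coefficient-weighted
    # task vectors, keeping the computation differentiable w.r.t. `coeffs`.
    coeffs = torch.full((num_tasks, num_layers), 0.3, requires_grad=True)
    opt = torch.optim.Adam([coeffs], lr=lr)
    # `unlabeled_loader` is assumed to yield batches of unlabeled test inputs.
    for _step, x in zip(range(steps), unlabeled_loader):
        loss = entropy_objective(build_merged_model(coeffs)(x))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return coeffs.detach()
```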
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)