Revisiting Weight Averaging for Model Merging
- URL: http://arxiv.org/abs/2412.12153v1
- Date: Wed, 11 Dec 2024 06:29:20 GMT
- Title: Revisiting Weight Averaging for Model Merging
- Authors: Jiho Choi, Donggyun Kim, Chanhyuk Lee, Seunghoon Hong
- Abstract summary: We present results showing that weight averaging implicitly induces task vectors centered around the weight average itself.
Applying a low-rank approximation to these centered task vectors significantly improves merging performance.
We evaluate our method on eight image classification tasks, demonstrating that it outperforms prior methods by a significant margin.
- Score: 16.503826062785773
- Abstract: Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among parameters across tasks. In this paper, we present intriguing results that weight averaging implicitly induces task vectors centered around the weight average itself, and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively separates core task-specific knowledge from nuisance noise within the fine-tuned parameters, concentrating them in the top and lower singular vectors respectively, which allows us to reduce inter-task interference through low-rank approximation. We evaluate our method on eight image classification tasks, demonstrating that it outperforms prior methods by a significant margin and narrows the performance gap with traditional multi-task learning to within 1-3%.
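The procedure described in the abstract can be made concrete for a single weight matrix: average the fine-tuned weights, form task vectors centered at that average, denoise each one with a truncated SVD, and add the retained components back. The snippet below is a minimal sketch of that reading, not the authors' implementation; the function name `merge_layer`, the rank, the coefficient `alpha`, and the simple additive recombination rule are illustrative assumptions.

```python
import numpy as np

def merge_layer(finetuned_weights, rank=4, alpha=1.0):
    """Sketch: merge one weight matrix from several fine-tuned models.

    finetuned_weights: list of (d_out, d_in) arrays, one per task.
    rank and alpha are illustrative hyperparameters, not values from the paper.
    """
    # 1) Plain weight averaging across the fine-tuned models.
    w_avg = np.mean(np.stack(finetuned_weights), axis=0)

    merged = w_avg.copy()
    for w_t in finetuned_weights:
        # 2) Task vector centered at the weight average rather than at the
        #    pre-trained checkpoint.
        tau_t = w_t - w_avg

        # 3) Low-rank approximation via truncated SVD: the top singular
        #    directions are taken to carry the core task-specific signal,
        #    while the discarded tail is treated as nuisance noise.
        u, s, vt = np.linalg.svd(tau_t, full_matrices=False)
        tau_t_low = (u[:, :rank] * s[:rank]) @ vt[:rank, :]

        # 4) Add the denoised task vector back on top of the average
        #    (a simple additive recombination, assumed here for illustration).
        merged += alpha * tau_t_low / len(finetuned_weights)

    return merged

# Toy usage: three "fine-tuned" versions of a 16x8 layer.
rng = np.random.default_rng(0)
base = rng.normal(size=(16, 8))
models = [base + 0.05 * rng.normal(size=base.shape) for _ in range(3)]
merged = merge_layer(models, rank=2, alpha=0.5)
print(merged.shape)  # (16, 8)
```

Note that without truncation the centered task vectors sum to zero, so step 4 would exactly reproduce plain weight averaging; only the retained top singular directions contribute a correction on top of the average.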
Related papers
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approach.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- Tint Your Models Task-wise for Improved Multi-task Model Merging [17.496018757317824]
We propose Model Tinting, a test-time approach that introduces a single task-specific layer for each task as trainable adjustments.
Our method jointly trains merging coefficients and task-specific layers, which effectively reduces task conflicts with minimal additional costs.
Our method achieves state-of-the-art performance across both computer vision and natural language processing tasks.
arXiv Detail & Related papers (2024-12-26T07:42:06Z)
- Parameter-Efficient Interventions for Enhanced Model Merging [0.7373617024876725]
Model merging combines knowledge from task-specific models into a unified multi-task model to avoid joint training on all task data.
We propose IntervMerge, a novel approach to multi-task model merging that effectively mitigates representation bias across the model.
We show that IntervMerge consistently outperforms the state-of-the-art approaches using fewer parameters.
arXiv Detail & Related papers (2024-12-22T13:58:12Z)
- Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging.
We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z)
- ATM: Improving Model Merging by Alternating Tuning and Merging [16.12778778313037]
We motivate the effectiveness of task vectors by linking them to multi-task gradients.
In a single-epoch scenario, task vectors are mathematically equivalent to the gradients obtained via gradient descent in a multi-task setting.
We show that task vectors perform optimally when equality is maintained, and their effectiveness is largely driven by the first epoch's gradient.
arXiv Detail & Related papers (2024-11-05T12:42:42Z)
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel, low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
- Localizing Task Information for Improved Model Merging and Compression [61.16012721460561]
We show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.
arXiv Detail & Related papers (2024-05-13T14:54:37Z)
- Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning [19.850893012601638]
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones.
We propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning.
arXiv Detail & Related papers (2023-11-26T01:44:01Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.