Revisiting Weight Averaging for Model Merging
- URL: http://arxiv.org/abs/2412.12153v2
- Date: Thu, 03 Apr 2025 11:46:20 GMT
- Title: Revisiting Weight Averaging for Model Merging
- Authors: Jiho Choi, Donggyun Kim, Chanhyuk Lee, Seunghoon Hong
- Abstract summary: Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. Weight averaging implicitly induces task vectors centered around the weight averaging itself. Applying a low-rank approximation to these centered task vectors significantly improves merging performance.
- Score: 16.503826062785773
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among parameters across tasks. In this paper, we present intriguing results that weight averaging implicitly induces task vectors centered around the weight averaging itself and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively reduces task interference and that most of the task-specific knowledge is concentrated in the top singular vectors. Our method demonstrates robust and scalable performance on vision benchmarks across varying numbers of tasks and model sizes. Furthermore, we observe that our approach is applicable to natural language processing tasks with competitive performance.
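The following is a minimal sketch of the recipe described in the abstract: average the fine-tuned weights, form task vectors centered at that average, keep only their top singular directions, and add the result back onto the average. The function name, the rank `k`, and the scaling coefficient `alpha` are illustrative assumptions rather than the paper's exact algorithm.

```python
# Sketch of centered-task-vector merging for a single 2-D weight matrix.
# Assumptions (not from the paper): rank k, scaling alpha, per-layer application.
import torch

def merge_centered_low_rank(finetuned_weights, k=16, alpha=1.0):
    """finetuned_weights: list of same-shaped 2-D tensors, one per fine-tuned task model."""
    # 1) Plain weight average across tasks.
    avg = torch.stack(finetuned_weights).mean(dim=0)

    merged = avg.clone()
    for w in finetuned_weights:
        # 2) Centered task vector: deviation from the average (not from the pre-trained model).
        centered = w - avg

        # 3) Low-rank approximation: keep the top-k singular directions,
        #    where most of the task-specific knowledge is said to concentrate.
        U, S, Vh = torch.linalg.svd(centered, full_matrices=False)
        low_rank = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

        # 4) Add the low-rank component back onto the weight average.
        merged = merged + alpha * low_rank
    return merged
```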
Related papers
- Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning [6.110846759317336]
Large-scale deep learning models trained with a pretraining-finetuning paradigm have led to a surge of task-specific models fine-tuned from a common pre-trained model.
Research efforts have focused on merging these large models into a single multi-task model, particularly through simple arithmetic on parameters.
Such merging methodology faces a central challenge: interference between model parameters fine-tuned on different tasks.
We propose to fine-tune pre-trained models via sharpness-aware minimization.
arXiv Detail & Related papers (2025-04-20T15:57:12Z) - Efficient Model Editing with Task-Localized Sparse Fine-tuning [14.792099973449794]
We propose TaLoS, which builds sparse task vectors with minimal interference without requiring explicit linearization.
We find that pre-trained models contain a subset of parameters with consistently low gradient sensitivity across tasks.
Our experiments show that TaLoS improves training and inference efficiency while outperforming current methods in task addition and negation.
arXiv Detail & Related papers (2025-04-03T14:20:06Z) - AdaRank: Adaptive Rank Pruning for Enhanced Model Merging [15.383220675351076]
Model merging has emerged as a promising approach for unifying independently fine-tuned models into an integrated framework.
We propose AdaRank, a novel model merging framework that adaptively selects the most beneficial singular directions of task vectors to merge multiple models.
AdaRank consistently achieves state-of-the-art performance with various backbones and numbers of tasks, narrowing the performance gap with individually fine-tuned models to nearly 1%.
arXiv Detail & Related papers (2025-03-28T06:49:06Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approach.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z) - Parameter-Efficient Interventions for Enhanced Model Merging [0.7373617024876725]
Model merging combines knowledge from task-specific models into a unified multi-task model to avoid joint training on all task data.
We propose IntervMerge, a novel approach to multi-task model merging that effectively mitigates representation bias across the model.
We show that IntervMerge consistently outperforms the state-of-the-art approaches using fewer parameters.
arXiv Detail & Related papers (2024-12-22T13:58:12Z) - Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging.
We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - Hessian Aware Low-Rank Perturbation for Order-Robust Continual Learning [19.850893012601638]
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones.
We propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning.
arXiv Detail & Related papers (2023-11-26T01:44:01Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models [80.23791222509644]
Inconsistent AI models are considered brittle and untrustworthy by human users.
We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks.
We propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets.
arXiv Detail & Related papers (2023-03-28T16:57:12Z) - Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems.
We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task.
We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition.
arXiv Detail & Related papers (2022-12-08T05:50:53Z)
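For context on the task-vector formalism that several of the entries above build on (see "Editing Models with Task Arithmetic"), here is a minimal sketch: a task vector is the element-wise difference between fine-tuned and pre-trained weights, and task vectors can be negated or added before being applied back to the pre-trained model. The scaling coefficient `lam` is an illustrative assumption.

```python
# Sketch of task-vector arithmetic over state dicts keyed by parameter name.
import torch

def task_vector(pretrained, finetuned):
    # Task vector = fine-tuned weights minus pre-trained weights, per parameter.
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def apply_task_vectors(pretrained, task_vectors, lam=0.3):
    # Adding vectors merges abilities; applying a negated vector suppresses a task.
    merged = {name: p.clone() for name, p in pretrained.items()}
    for tv in task_vectors:
        for name, delta in tv.items():
            merged[name] = merged[name] + lam * delta
    return merged
```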
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.