Related papers: Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs

Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs

URL: http://arxiv.org/abs/2504.10902v1
Date: Tue, 15 Apr 2025 06:23:24 GMT
Title: Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
Authors: Rui Dai, Sile Hu, Xu Shen, Yonggang Zhang, Xinmei Tian, Jieping Ye,
Abstract summary: Recent research indicates that models demonstrating linearity enhance the performance of task arithmetic.<n>We argue that this linearity already exists within the model's submodules.<n>We propose an innovative model merging strategy that independently merges these submodules.
Score: 51.09983600916971
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Task arithmetic is a straightforward yet highly effective strategy for model merging, enabling the resultant model to exhibit multi-task capabilities. Recent research indicates that models demonstrating linearity enhance the performance of task arithmetic. In contrast to existing methods that rely on the global linearization of the model, we argue that this linearity already exists within the model's submodules. In particular, we present a statistical analysis and show that submodules (e.g., layers, self-attentions, and MLPs) exhibit significantly higher linearity than the overall model. Based on these findings, we propose an innovative model merging strategy that independently merges these submodules. Especially, we derive a closed-form solution for optimal merging weights grounded in the linear properties of these submodules. Experimental results demonstrate that our method consistently outperforms the standard task arithmetic approach and other established baselines across different model scales and various tasks. This result highlights the benefits of leveraging the linearity of submodules and provides a new perspective for exploring solutions for effective and practical multi-task model merging.

Related papers

The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models [54.51795784459866]
We propose a theoretical framework of performance scaling for multi-model collaboration.<n>We show that multi-model systems follow a power-law scaling with respect to the total parameter count.<n> ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family.
arXiv Detail & Related papers (2025-12-29T09:55:12Z)
Model Merging via Multi-Teacher Knowledge Distillation [11.543771846135021]
We introduce a novel flatness-aware PAC-Bayes generalization bound specifically for the model merging setting.<n>We frame model merging as multi-teacher knowledge distillation on scarce, unlabeled data.<n>We formally demonstrate that minimizing the student-teacher Kullback-Leibler divergence directly tightens the upper bound on the merged model's excess risk.
arXiv Detail & Related papers (2025-12-24T17:10:44Z)
A Systematic Study of Model Merging Techniques in Large Language Models [43.5967188676583]
Model merging combines multiple fine-tuned checkpoints into a single model without additional training.<n>We present a large-scale, systematic evaluation of six state-of-the-art merging methods.<n>Results show that the oldest and simplest method, Task Arithmetic, is the only approach that reliably yields performance gains on LLMs.
arXiv Detail & Related papers (2025-11-26T14:28:11Z)
Revisiting Model Interpolation for Efficient Reasoning [27.32667995137936]
We revisit the simplest merging method that interpolates two weights directly.<n>We observe that model follows a three-stage evolutionary paradigm with distinct behaviors on the reasoning trajectory.
arXiv Detail & Related papers (2025-10-13T03:30:01Z)
Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data [16.462869377794316]
Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features.<n>Recent studies have dedicated efforts to merging multiple independent model parameters into a unified model for MTL.<n>We propose LwPTV (Layer-wise Pruning Task Vector) by building a saliency score, measuring the redundancy of parameters in task vectors.
arXiv Detail & Related papers (2025-06-10T11:34:23Z)
NAN: A Training-Free Solution to Coefficient Estimation in Model Merging [61.36020737229637]
We show that the optimal merging weights should scale with the amount of task-specific information encoded in each model.<n>We propose NAN, a simple yet effective method that estimates model merging coefficients via the inverse of parameter norm.<n>NAN is training-free, plug-and-play, and applicable to a wide range of merging strategies.
arXiv Detail & Related papers (2025-05-22T02:46:08Z)
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach [0.0]
LEWIS (Layer Wise Sparsity) is a guided model-merging framework.<n>It guides existing merging methods by preserving essential layer-wise task-specific knowledge.<n>Experiments demonstrate the effectiveness of LEWIS with performance improvements of code instruction-following and math-solving models.
arXiv Detail & Related papers (2025-03-05T20:09:59Z)
Superpose Singular Features for Model Merging [29.728307343119894]
Superpose Features from Task Matrix (SFTM) is a novel approach that superposes features from individual task models into a merged model.<n>Our method consistently outperforms existing methods, achieving superior performance and enhanced out-of-distribution generalization.
arXiv Detail & Related papers (2025-02-15T07:05:55Z)
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging [111.8456671452411]
Multi-task learning (MTL) leverages a shared model to accomplish multiple tasks and facilitate knowledge transfer. We propose a Weight-Ensembling Mixture of Experts (WEMoE) method for multi-task model merging. We show that WEMoE and E-WEMoE outperform state-of-the-art (SOTA) model merging methods in terms of MTL performance, generalization, and robustness.
arXiv Detail & Related papers (2024-10-29T07:16:31Z)
Closed-form merging of parameter-efficient modules for Federated Continual Learning [9.940242741914748]
We introduce LoRM, an alternating optimization strategy that trains one LoRA matrix at a time.<n>We apply our proposed methodology to Federated Class-Incremental Learning (FCIL)<n>Our method demonstrates state-of-the-art performance across a range of FCIL scenarios.
arXiv Detail & Related papers (2024-10-23T15:30:13Z)
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic Large Language Models scaling guideline.<n>We benchmark existing scaling techniques, especially selective merging, and variants of mixture.<n>We then formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo.<n>Our methodology involves the clustering of mergeable models and optimal merging strategy selection, and the integration of clusters.
arXiv Detail & Related papers (2024-10-07T15:55:55Z)
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models [42.17166746027585]
We introduce a bidirectional weighted graph-based framework to learn factorized attributes and their interrelations within complex data. Specifically, we propose a $beta$-VAE based module to extract factors as the initial nodes of the graph. By integrating these complementary modules, our model successfully achieves fine-grained, practical and unsupervised disentanglement.
arXiv Detail & Related papers (2024-07-26T15:32:21Z)
EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods. EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
Representation Surgery for Multi-Task Model Merging [57.63643005215592]
Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization. Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training. By visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias.
arXiv Detail & Related papers (2024-02-05T03:39:39Z)
AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging) It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
Variational Model-based Policy Optimization [34.80171122943031]
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL. We propose an objective function as a variational lower-bound of a log-likelihood of a log-likelihood to jointly learn and improve model and policy. Our experiments on a number of continuous control tasks show that despite being more complex, our model-based (E-step) algorithm, called emactoral model-based policy optimization (VMBPO), is more sample-efficient and
arXiv Detail & Related papers (2020-06-09T18:30:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.