Related papers: Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic

URL: http://arxiv.org/abs/2408.13656v1
Date: Sat, 24 Aug 2024 19:14:02 GMT
Title: Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
Authors: Yifei He, Yuzheng Hu, Yong Lin, Tong Zhang, Han Zhao
Abstract summary: We introduce Localize-and-Stitch, a novel approach that merges models in a localized way. We demonstrate that our approach effectively locates sparse regions responsible for finetuned performance. Our algorithm also facilitates model compression and preserves pretrained knowledge.
Score: 22.73746175315071
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model merging offers an effective strategy to combine the strengths of multiple finetuned models into a unified model that preserves the specialized capabilities of each. Existing methods merge models in a global manner, performing arithmetic operations across all model parameters. However, such global merging often leads to task interference, degrading the performance of the merged model. In this work, we introduce Localize-and-Stitch, a novel approach that merges models in a localized way. Our algorithm works in two steps: i) Localization: identify tiny ($1\%$ of the total parameters) localized regions in the finetuned models containing essential skills for the downstream tasks, and ii) Stitching: reintegrate only these essential regions back into the pretrained model for task synergy. We demonstrate that our approach effectively locates sparse regions responsible for finetuned performance, and the localized regions could be treated as compact and interpretable representations of the finetuned models (tasks). Empirically, we evaluate our method on various vision and language benchmarks, showing that it outperforms existing model merging methods under different data availability scenarios. Beyond strong empirical performance, our algorithm also facilitates model compression and preserves pretrained knowledge, enabling flexible and continual skill composition from multiple finetuned models with minimal storage and computational overhead. Our code is available at https://github.com/yifei-he/Localize-and-Stitch.
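The two steps in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that): it assumes, purely for illustration, that localization keeps the largest-magnitude entries of each task vector (finetuned minus pretrained weights) and that regions selected by several tasks are averaged during stitching. The function names `task_vector`, `localize`, and `stitch`, the flat-list model representation, and the toy values are all hypothetical.

```python
from typing import List


def task_vector(finetuned: List[float], pretrained: List[float]) -> List[float]:
    """Difference between finetuned and pretrained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def localize(tau: List[float], keep_fraction: float) -> List[int]:
    """Binary mask over the largest-magnitude entries of a task vector.

    Assumption: magnitude-based selection stands in for whatever
    localization procedure the paper actually uses.
    """
    k = max(1, int(len(tau) * keep_fraction))
    ranked = sorted(range(len(tau)), key=lambda i: abs(tau[i]), reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(tau))]


def stitch(
    pretrained: List[float],
    finetuned_models: List[List[float]],
    keep_fraction: float = 0.01,
) -> List[float]:
    """Add only the localized regions of each task vector to the pretrained weights.

    Assumption: where masks from several tasks overlap, the updates are averaged.
    """
    taus = [task_vector(ft, pretrained) for ft in finetuned_models]
    masks = [localize(t, keep_fraction) for t in taus]
    merged = list(pretrained)
    for i in range(len(pretrained)):
        overlap = sum(m[i] for m in masks)
        if overlap:
            update = sum(m[i] * t[i] for m, t in zip(masks, taus))
            merged[i] += update / overlap
    return merged


# Toy example with four parameters and two finetuned models.
pretrained = [1.0, 1.0, 1.0, 1.0]
model_a = [3.0, 1.5, 1.0, 1.0]   # task vector: [2.0, 0.5, 0.0, 0.0]
model_b = [1.5, 1.0, 1.0, -1.0]  # task vector: [0.5, 0.0, 0.0, -2.0]

merged_sparse = stitch(pretrained, [model_a, model_b], keep_fraction=0.25)
merged_overlap = stitch(pretrained, [model_a, model_b], keep_fraction=0.5)
print(merged_sparse)
print(merged_overlap)
```

With `keep_fraction=0.25` each task contributes a single, non-overlapping parameter, so the pretrained values elsewhere are left untouched. With `keep_fraction=0.5` both tasks select the first parameter, and the sketch averages their updates there. The abstract's figure of 1% corresponds to the default `keep_fraction=0.01`, which is only meaningful for models far larger than this toy.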

Related papers

GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching [41.96482857947199]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. LLMs typically come with a substantial model size, which presents significant challenges in deployment and inference. We develop a novel strategy to compress models by strategically combining or merging layers from finetuned model variants.
arXiv Detail & Related papers (2025-06-25T14:24:59Z)
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging [103.98582374569789]
Model merging aims to combine multiple expert models into a single model, thereby reducing storage and serving costs. Previous studies have primarily focused on merging visual classification models or Large Language Models (LLMs) for code and math tasks. We introduce the model merging benchmark for MLLMs, which includes multiple tasks such as VQA, Geometry, Chart, OCR, and Grounding, providing both LoRA and full fine-tuning models.
arXiv Detail & Related papers (2025-05-26T12:23:14Z)
MergeBench: A Benchmark for Merging Domain-Specialized LLMs [19.49737955489798]
MergeBench is an evaluation suite designed to assess model merging at scale. It builds on state-of-the-art open-source language models, including Llama and Gemma families at 2B to 9B scales. We assess eight representative merging methods across multi-task performance, forgetting and runtime efficiency.
arXiv Detail & Related papers (2025-05-16T04:02:55Z)
Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
1bit-Merging: Dynamic Quantized Merging for Large Language Models [20.19975755949984]
1bit-Merging is a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency. We demonstrate that 1bit-Merging achieves comparable or superior performance to existing methods while significantly reducing storage requirements.
arXiv Detail & Related papers (2025-02-15T09:47:50Z)
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse [25.002218722102505]
Model merging aims to efficiently combine the weights of multiple expert models, each trained on a specific task, into a single multi-task model. This work explores the more challenging scenario of "non-local" merging. Standard merging techniques often fail to generalize effectively in this non-local setting. We propose a multi-task technique to re-scale and shift the output activations of the merged model for each task, aligning its output statistics with those of the corresponding task-specific expert models.
arXiv Detail & Related papers (2024-10-16T17:41:59Z)
Exploring Model Kinship for Merging Large Language Models [52.01652098827454]
We introduce model kinship, the degree of similarity or relatedness between Large Language Models. We find that there is a certain relationship between model kinship and the performance gains after model merging. We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
Parameter Competition Balancing for Model Merging [13.66727853299506]
PCB-Merging is a training-free technique that adjusts the coefficients of each parameter for effective model merging. PCB-Merging achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models.
arXiv Detail & Related papers (2024-10-03T11:17:58Z)
PLeaS -- Merging Models with Permutations and Least Squares [43.17620198572947]
We propose a new two-step algorithm to merge models, termed PLeaS. PLeaS partially matches nodes in each layer by maximizing alignment. It computes the weights of the merged model as a layer-wise Least Squares solution.
arXiv Detail & Related papers (2024-07-02T17:24:04Z)
FusionBench: A Comprehensive Benchmark of Deep Model Fusion [78.80920533793595]
Deep model fusion is a technique that unifies the predictions or parameters of several deep neural networks into a single model. FusionBench is the first comprehensive benchmark dedicated to deep model fusion.
arXiv Detail & Related papers (2024-06-05T13:54:28Z)
EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods. EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
Training-Free Pretrained Model Merging [38.16269074353077]
We propose an innovative model merging framework, coined as merging under dual-space constraints (MuDSC). In order to enhance usability, we have also incorporated adaptations for group structure, including Multi-Head Attention and Group Normalization.
arXiv Detail & Related papers (2024-03-04T06:19:27Z)
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks. We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels. In this work, we argue that learning only an objectness function is a weak form of knowledge transfer. Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of a pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.