LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach
- URL: http://arxiv.org/abs/2503.03874v2
- Date: Fri, 07 Mar 2025 22:25:17 GMT
- Title: LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach
- Authors: Hetarth Chopra, Vidhi Rambhia, Vikram Adve
- Abstract summary: LEWIS (Layer Wise Sparsity) is a guided model-merging framework. It guides existing merging methods by preserving essential layer-wise task-specific knowledge. Experiments demonstrate the effectiveness of LEWIS, with performance improvements for code instruction-following and math-solving models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As specialized large language models (LLMs) become increasingly prevalent, model merging methods are being used to combine them into a single multi-task model without requiring any additional data or training. However, these approaches fall short when the objective of merging is to increase the downstream model's performance on a particular task-specific benchmark. In this work, we propose LEWIS (Layer Wise Sparsity), a guided model-merging framework that uses activation-based layer importance to dynamically adjust the layer-wise task-vector sparsity required for the merge process. LEWIS uses a calibration dataset to prioritize critical layers during the task-vector pruning process required for model merging. This approach guides existing merging methods by preserving essential layer-wise task-specific knowledge while ensuring the merged model performs best on benchmarks resembling the calibration dataset. Our experiments demonstrate the effectiveness of LEWIS, with performance improvements of up to 4 percent and 11.3 percent for code instruction-following and math-solving models created through model merging, respectively, outperforming unguided, data-less model merging approaches that use uniform sparsity.
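The recipe described in the abstract (score layers from calibration activations, translate the scores into layer-wise task-vector sparsity, then merge) can be sketched compactly. Below is a minimal, illustrative sketch assuming a mean-activation-magnitude importance score and a linear importance-to-density mapping; the function names and toy data are stand-ins, not the paper's implementation.

```python
# Hedged sketch of activation-guided, layer-wise sparse task-vector merging.
import torch

def task_vector(base: dict, finetuned: dict) -> dict:
    """Task vector = fine-tuned weights minus base weights, per layer."""
    return {k: finetuned[k] - base[k] for k in base}

def layer_importance(activations: dict) -> dict:
    """Score each layer by the mean magnitude of its calibration activations."""
    scores = {k: a.abs().mean().item() for k, a in activations.items()}
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}

def prune_layerwise(tv: dict, importance: dict, min_density=0.1, max_density=0.9) -> dict:
    """Keep a larger fraction of task-vector entries in more important layers."""
    lo, hi = min(importance.values()), max(importance.values())
    pruned = {}
    for k, delta in tv.items():
        # Linearly map importance to a per-layer density (fraction of weights kept);
        # this mapping is an assumption for illustration.
        density = min_density + (max_density - min_density) * (importance[k] - lo) / (hi - lo + 1e-12)
        n_keep = max(1, int(density * delta.numel()))
        thresh = delta.abs().flatten().topk(n_keep).values.min()
        pruned[k] = delta * (delta.abs() >= thresh)
    return pruned

def merge(base: dict, pruned_tv: dict, alpha=1.0) -> dict:
    """Add the sparsified task vector back onto the base model."""
    return {k: base[k] + alpha * pruned_tv[k] for k in base}

# Toy usage with random weights and calibration activations.
layers = [f"layer{i}" for i in range(4)]
base = {k: torch.randn(16, 16) for k in layers}
finetuned = {k: base[k] + 0.1 * torch.randn(16, 16) for k in layers}
calib_acts = {k: torch.randn(32, 16) * (i + 1) for i, k in enumerate(layers)}

merged = merge(base, prune_layerwise(task_vector(base, finetuned),
                                     layer_importance(calib_acts)))
```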
Related papers
- Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs [51.09983600916971]
Recent research indicates that models demonstrating linearity enhance the performance of task arithmetic.
We argue that this linearity already exists within the model's submodules.
We propose an innovative model merging strategy that independently merges these submodules.
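For context, task arithmetic merges models by adding scaled weight differences (task vectors) onto a shared base; submodule-wise merging simply lets the coefficient vary per module. A hedged sketch of that general recipe (not the cited paper's exact algorithm) follows.

```python
# Illustrative task-arithmetic merge with per-submodule coefficients (assumed).
import torch

def task_arithmetic_merge(base, experts, coeffs):
    """merged[k] = base[k] + sum_i coeffs[i][k] * (expert_i[k] - base[k])"""
    merged = {}
    for k in base:
        merged[k] = base[k].clone()
        for expert, c in zip(experts, coeffs):
            merged[k] += c[k] * (expert[k] - base[k])
    return merged

# Toy usage: two experts, a uniform coefficient of 0.5 for every submodule.
keys = ["attn.weight", "mlp.weight"]
base = {k: torch.randn(8, 8) for k in keys}
experts = [{k: base[k] + 0.05 * torch.randn(8, 8) for k in keys} for _ in range(2)]
coeffs = [{k: 0.5 for k in keys} for _ in experts]
merged = task_arithmetic_merge(base, experts, coeffs)
```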
arXiv Detail & Related papers (2025-04-15T06:23:24Z)
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks.
By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
- 1bit-Merging: Dynamic Quantized Merging for Large Language Models [20.19975755949984]
1bit-Merging is a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency.
We demonstrate that 1bit-Merging achieves comparable or superior performance to existing methods while significantly reducing storage requirements.
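A rough idea of what a 1-bit task vector can look like is a sign tensor plus a single scale. The sketch below uses a mean-absolute-value scale as an assumption and is not the paper's exact quantization scheme.

```python
# Hedged sketch of 1-bit (sign + scale) task-vector compression.
import torch

def quantize_1bit(delta: torch.Tensor):
    """Store only the sign of each entry plus one scale per tensor (assumed scheme)."""
    scale = delta.abs().mean()   # single per-tensor scale
    sign = torch.sign(delta)     # {-1, 0, +1}, roughly 1 bit per weight
    return sign, scale

def dequantize(sign: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return sign * scale

# Toy usage: compress a task vector (fine-tuned minus base) and apply it.
base = torch.randn(16, 16)
finetuned = base + 0.1 * torch.randn(16, 16)
sign, scale = quantize_1bit(finetuned - base)
merged = base + dequantize(sign, scale)
```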
arXiv Detail & Related papers (2025-02-15T09:47:50Z)
- No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces [17.69597528370121]
Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement.
arXiv Detail & Related papers (2025-02-07T14:22:56Z)
- Activation-Informed Merging of Large Language Models [10.020512818972357]
This paper introduces Activation-Informed Merging (AIM), a technique that integrates the information from the activation space of large language models into the merging process.
We empirically demonstrate that AIM significantly enhances the performance of merged models across multiple benchmarks.
arXiv Detail & Related papers (2025-02-04T15:42:03Z)
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approach. We propose a training-free, projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data or additional training, while still showing impressive performance.
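The elect / mask / rescale steps named in the title can be illustrated on a stack of task vectors. In the sketch below, the sign-election rule and the magnitude-matching rescale factor are assumptions made for illustration, not the paper's exact definitions.

```python
# Hedged sketch of an elect / mask / rescale pipeline over task vectors.
import torch

def emr_style_merge(task_vectors):
    stacked = torch.stack(task_vectors)                  # [num_tasks, ...]
    # Elect: pick a shared sign per weight (sign of the summed deltas) and keep
    # the largest magnitude that agrees with it (assumed election rule).
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == elected_sign)
    unified = (stacked.abs() * agree).max(dim=0).values * elected_sign
    per_task = []
    for tv, mask in zip(task_vectors, agree):
        # Mask: per-task binary mask of weights whose sign matches the election.
        masked = unified * mask
        # Rescale: match the average magnitude of the original task vector.
        scale = tv.abs().mean() / (masked.abs().mean() + 1e-12)
        per_task.append((mask, scale))
    return unified, per_task

# Toy usage with two task vectors.
tvs = [0.1 * torch.randn(8, 8) for _ in range(2)]
unified, per_task = emr_style_merge(tvs)
```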
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
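A common way to learn merging coefficients without labels, consistent with the summary above, is to treat per-task, per-layer coefficients as parameters and minimize prediction entropy on unlabeled inputs. The tiny linear model and single optimization loop below are illustrative assumptions, not AdaMerging's actual setup.

```python
# Hedged sketch of layer-wise coefficient learning via entropy minimization.
import torch

torch.manual_seed(0)
num_tasks, num_layers, dim, n_cls = 2, 2, 8, 4

base = [torch.randn(dim, dim) for _ in range(num_layers)]
task_vecs = [[0.1 * torch.randn(dim, dim) for _ in range(num_layers)]
             for _ in range(num_tasks)]
head = torch.randn(dim, n_cls)
x = torch.randn(32, dim)                      # unlabeled calibration inputs

# One learnable coefficient per (task, layer).
lam = torch.full((num_tasks, num_layers), 0.3, requires_grad=True)
opt = torch.optim.Adam([lam], lr=1e-2)

for _ in range(100):
    h = x
    for l in range(num_layers):
        # Merged layer weight: base plus coefficient-weighted task vectors.
        w = base[l] + sum(lam[t, l] * task_vecs[t][l] for t in range(num_tasks))
        h = torch.relu(h @ w)
    probs = torch.softmax(h @ head, dim=-1)
    # Minimize average prediction entropy on the unlabeled inputs.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()
```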