Related papers: 1bit-Merging: Dynamic Quantized Merging for Large Language Models

URL: http://arxiv.org/abs/2502.10743v1
Date: Sat, 15 Feb 2025 09:47:50 GMT
Title: 1bit-Merging: Dynamic Quantized Merging for Large Language Models
Authors: Shuqi Liu, Han Wu, Bowei He, Zehua Liu, Xiongwei Han, Mingxuan Yuan, Linqi Song
Abstract summary: 1bit-Merging is a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency. We demonstrate that 1bit-Merging achieves comparable or superior performance to existing methods while significantly reducing storage requirements.
Score: 20.19975755949984
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in large language models have led to specialized models excelling in specific domains, creating a need for efficient model merging techniques. Traditional merging approaches combine parameters into a single static model but often compromise task-specific performance, whereas task-specific routing methods maintain accuracy but introduce substantial storage overhead. We present 1bit-Merging, a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency. Our approach leverages the observation that different task-specific models store knowledge in distinct layers (chat models primarily in attention layers, math/code models in MLP layers), enabling targeted compression strategies. Through extensive experiments with LLaMA2 and Mistral model families across chat, mathematical reasoning, and code generation tasks, we demonstrate that 1bit-Merging achieves comparable or superior performance to existing methods while significantly reducing storage requirements. Our framework offers a practical solution for combining specialized models while maintaining their individual strengths and addressing the storage challenges of current approaches.
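As a rough illustration of what "1-bit quantized task vectors" combined with routing could look like, the sketch below assumes a sign-plus-shared-scale quantizer and a plain lookup by task label standing in for the router. The abstract does not specify either component, so every function name and the toy weights here are hypothetical and should not be read as the authors' implementation.

```python
# Hypothetical sketch: sign-plus-scale 1-bit compression of a task vector.
# The quantizer and the dictionary "router" are assumptions for illustration,
# not details taken from the paper.

def task_vector(finetuned, base):
    """Element-wise difference between fine-tuned and base weights."""
    return [f - b for f, b in zip(finetuned, base)]

def quantize_1bit(tau):
    """Keep one sign per entry plus a single shared scale (mean of |tau|)."""
    scale = sum(abs(t) for t in tau) / len(tau)
    signs = [1 if t >= 0 else -1 for t in tau]
    return signs, scale

def dequantize(signs, scale):
    """Rebuild an approximate task vector from signs and the shared scale."""
    return [s * scale for s in signs]

def route_and_merge(base, compressed, task):
    """Select the compressed task vector for `task` and add it to the base."""
    signs, scale = compressed[task]
    return [b + d for b, d in zip(base, dequantize(signs, scale))]

# Toy weights (made up for the example).
base = [0.0, 1.0, -1.0, 0.5]
math_model = [0.5, 0.5, -1.5, 1.0]

tau = task_vector(math_model, base)
signs, scale = quantize_1bit(tau)
compressed = {"math": (signs, scale)}
merged = route_and_merge(base, compressed, "math")
```

In this toy case every entry of the task vector has the same magnitude, so the 1-bit reconstruction happens to be exact; with real weights the shared scale only approximates each entry, which is the storage-versus-accuracy trade-off the abstract refers to.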

Related papers

Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we address the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach [0.0]
LEWIS (Layer Wise Sparsity) is a guided model-merging framework. It guides existing merging methods by preserving essential layer-wise task-specific knowledge. Experiments demonstrate the effectiveness of LEWIS, with performance improvements for code instruction-following and math-solving models.
arXiv Detail & Related papers (2025-03-05T20:09:59Z)
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces [17.69597528370121]
Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement.
arXiv Detail & Related papers (2025-02-07T14:22:56Z)
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data. We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance. Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
SuperMerge: An Approach For Gradient-Based Model Merging [9.136320029568305]
Large language models, such as ChatGPT, Claude, or LLaMA, are gigantic, monolithic, and possess the superpower to simultaneously support thousands of tasks. One challenge of using task-specific models is the incremental need for solving newer tasks after the model is already deployed for existing tasks. We propose a model merging based approach called SUPERMERGE. We experimentally demonstrate that SUPERMERGE outperforms existing model merging methods on common natural language processing and computer vision tasks.
arXiv Detail & Related papers (2024-12-09T20:03:14Z)
MoD: A Distribution-Based Approach for Merging Large Language Models [0.0]
Large language models (LLMs) have enabled the development of numerous specialized, task-specific variants. We propose the Mixture of Distributions (MoD) framework, a novel approach for merging LLMs. Unlike traditional weight-averaging methods, MoD effectively preserves the specialized capabilities of individual models.
arXiv Detail & Related papers (2024-11-01T07:05:29Z)
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic [22.73746175315071]
We introduce Localize-and-Stitch, a novel approach that merges models in a localized way. We demonstrate that our approach effectively locates sparse regions responsible for finetuned performance. Our algorithm also facilitates model compression and preserves pretrained knowledge.
arXiv Detail & Related papers (2024-08-24T19:14:02Z)
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge. We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task. Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
arXiv Detail & Related papers (2024-06-17T03:07:41Z)
EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods. EMR-Merging is tuning-free, requiring neither data availability nor additional training.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently. Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable. We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks. We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.