SuperMerge: An Approach For Gradient-Based Model Merging
- URL: http://arxiv.org/abs/2412.10416v2
- Date: Fri, 14 Feb 2025 17:40:13 GMT
- Title: SuperMerge: An Approach For Gradient-Based Model Merging
- Authors: Haoyu Yang, Zheng Zhang, Saket Sathe
- Abstract summary: Large language models, such as ChatGPT, Claude, or LLaMA, are gigantic, monolithic, and possess the superpower to simultaneously support thousands of tasks.
One challenge of using task-specific models is the incremental need for solving newer tasks after the model is already deployed for existing tasks.
We propose a model merging based approach called SUPERMERGE.
We experimentally demonstrate that SUPERMERGE outperforms existing model merging methods on common natural language processing and computer vision tasks.
- Score: 9.136320029568305
- License:
- Abstract: Large language models, such as ChatGPT, Claude, or LLaMA, are gigantic, monolithic, and possess the superpower to simultaneously support thousands of tasks. However, high-throughput applications often prefer smaller task-specific models because of their lower latency and cost. One challenge of using task-specific models is the incremental need for solving newer tasks after the model is already deployed for existing tasks. A straightforward solution requires fine-tuning the model again for both existing and new tasks, which is computationally expensive and time-consuming. To address this issue, we propose a model merging based approach called SUPERMERGE. SUPERMERGE is a gradient-based method to systematically merge several fine-tuned models trained on existing and new tasks. SUPERMERGE is designed to be lightweight and fast, and the merged model achieves similar performance to fully fine-tuned models on all tasks. Furthermore, we propose a hierarchical model merging strategy to reduce the peak space requirement without sacrificing the performance of the merged model. We experimentally demonstrate that SUPERMERGE outperforms existing model merging methods on common natural language processing and computer vision tasks.
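The abstract describes the method only at a high level. As a rough illustration of what "gradient-based merging" of fine-tuned checkpoints can look like, the sketch below keeps the fine-tuned weights frozen and trains only per-tensor mixing coefficients on a small validation set. All names (merge_weights, fit_merge_coeffs, val_loader) and the training loop are illustrative assumptions, not the authors' implementation; it assumes PyTorch 2.x for torch.func.functional_call.

```python
import torch
from torch.func import functional_call

def merge_weights(base_sd, task_sds, coeffs):
    # One mixing coefficient per (task, parameter tensor); the weights stay frozen
    # and only the coefficients carry gradients.
    merged = {}
    for p, (name, w0) in enumerate(base_sd.items()):
        delta = sum(coeffs[t][p] * (sd[name] - w0) for t, sd in enumerate(task_sds))
        merged[name] = w0 + delta
    return merged

def fit_merge_coeffs(model, base_sd, task_sds, val_loader, steps=100, lr=1e-2):
    # Learn the mixing coefficients by gradient descent on a small validation set;
    # the underlying checkpoints are never updated.
    coeffs = torch.nn.Parameter(torch.full((len(task_sds), len(base_sd)), 0.3))
    opt = torch.optim.Adam([coeffs], lr=lr)
    for _ in range(steps):
        for inputs, labels in val_loader:
            merged = merge_weights(base_sd, task_sds, coeffs)
            logits = functional_call(model, merged, (inputs,))
            loss = torch.nn.functional.cross_entropy(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return merge_weights(base_sd, task_sds, coeffs.detach())
```

The hierarchical strategy mentioned in the abstract presumably merges checkpoints in smaller groups and then merges the intermediate results, so fewer full state dictionaries need to be held in memory at once; the details are in the paper itself.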
Related papers
- 1bit-Merging: Dynamic Quantized Merging for Large Language Models [20.19975755949984]
1bit-Merging is a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency.
We demonstrate that 1bit-Merging achieves comparable or superior performance to existing methods while significantly reducing storage requirements.
arXiv Detail & Related papers (2025-02-15T09:47:50Z)
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- A Model Is Not Built By A Single Prompt: LLM-Based Domain Modeling With Question Decomposition [4.123601037699469]
In real-world domain modeling, engineers usually decompose complex tasks into easily solvable sub-tasks.
We propose an LLM-based domain modeling approach via question decomposition, similar to developer's modeling process.
Preliminary results show that our approach outperforms the single-prompt-based approach.
arXiv Detail & Related papers (2024-10-13T14:28:04Z)
- What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Representation Surgery for Multi-Task Model Merging [57.63643005215592]
Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization.
Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training.
By visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias.
arXiv Detail & Related papers (2024-02-05T03:39:39Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme (a baseline sketched after this list), AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- BYOM: Building Your Own Multi-Task Model For Free [69.63765907216442]
BYOM-FFT is for merging fully finetuned models, while BYOM-LoRA is for LoRA-finetuned models.
Experiments on computer vision and natural language processing tasks show that the proposed BYOM methods outperform existing merging methods by a large margin.
arXiv Detail & Related papers (2023-10-03T08:39:33Z)
- ZipIt! Merging Models from Different Tasks without Training [20.2479633507354]
"ZipIt!" is a general method for merging two arbitrary models of the same architecture.
We find that these two changes combined account for 20-60% improvement over prior work.
arXiv Detail & Related papers (2023-05-04T17:59:58Z)
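For context, the task-arithmetic baseline referenced by several entries above (AdaMerging in particular) merges checkpoints with a fixed scaling coefficient rather than learned ones. Below is a minimal sketch, assuming plain PyTorch state dictionaries and a hand-chosen coefficient lam; it is an illustration of the general idea, not any paper's exact implementation.

```python
def task_arithmetic_merge(base_sd, task_sds, lam=0.3):
    # Add scaled task vectors (fine-tuned weights minus base weights) onto the base.
    # `lam` is a hand-chosen scalar; methods such as AdaMerging instead learn
    # task-wise or layer-wise coefficients without access to the original training data.
    merged = {}
    for name, w0 in base_sd.items():
        merged[name] = w0 + lam * sum(sd[name] - w0 for sd in task_sds)
    return merged
```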