SuperMerge: An Approach For Gradient-Based Model Merging
- URL: http://arxiv.org/abs/2412.10416v2
- Date: Fri, 14 Feb 2025 17:40:13 GMT
- Title: SuperMerge: An Approach For Gradient-Based Model Merging
- Authors: Haoyu Yang, Zheng Zhang, Saket Sathe
- Abstract summary: Large language models, such as ChatGPT, Claude, or LLaMA, are gigantic, monolithic, and possess the superpower to simultaneously support thousands of tasks.
One challenge of using task-specific models is the incremental need for solving newer tasks after the model is already deployed for existing tasks.
We propose a model merging based approach called SUPERMERGE.
We experimentally demonstrate that SUPERMERGE outperforms existing model merging methods on common natural language processing and computer vision tasks.
- Score: 9.136320029568305
- License:
- Abstract: Large language models, such as ChatGPT, Claude, or LLaMA, are gigantic, monolithic, and possess the superpower to simultaneously support thousands of tasks. However, high-throughput applications often prefer smaller task-specific models because of their lower latency and cost. One challenge of using task-specific models is the incremental need for solving newer tasks after the model is already deployed for existing tasks. A straightforward solution requires fine-tuning the model again for both existing and new tasks, which is computationally expensive and time-consuming. To address this issue, we propose a model merging based approach called SUPERMERGE. SUPERMERGE is a gradient-based method to systematically merge several fine-tuned models trained on existing and new tasks. SUPERMERGE is designed to be lightweight and fast, and the merged model achieves similar performance to fully fine-tuned models on all tasks. Furthermore, we propose a hierarchical model merging strategy to reduce the peak space requirement without sacrificing the performance of the merged model. We experimentally demonstrate that SUPERMERGE outperforms existing model merging methods on common natural language processing and computer vision tasks.
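The abstract describes the method only at a high level. As a rough illustration of what "gradient-based merging" of fine-tuned checkpoints can look like, the sketch below keeps the fine-tuned weights frozen and trains only per-tensor mixing coefficients on a small validation set. All names (merge_weights, fit_merge_coeffs, val_loader) and the training loop are illustrative assumptions, not the authors' implementation; it assumes PyTorch 2.x for torch.func.functional_call.

```python
import torch
from torch.func import functional_call

def merge_weights(base_sd, task_sds, coeffs):
    # One mixing coefficient per (task, parameter tensor); the weights stay frozen
    # and only the coefficients carry gradients.
    merged = {}
    for p, (name, w0) in enumerate(base_sd.items()):
        delta = sum(coeffs[t][p] * (sd[name] - w0) for t, sd in enumerate(task_sds))
        merged[name] = w0 + delta
    return merged

def fit_merge_coeffs(model, base_sd, task_sds, val_loader, steps=100, lr=1e-2):
    # Learn the mixing coefficients by gradient descent on a small validation set;
    # the underlying checkpoints are never updated.
    coeffs = torch.nn.Parameter(torch.full((len(task_sds), len(base_sd)), 0.3))
    opt = torch.optim.Adam([coeffs], lr=lr)
    for _ in range(steps):
        for inputs, labels in val_loader:
            merged = merge_weights(base_sd, task_sds, coeffs)
            logits = functional_call(model, merged, (inputs,))
            loss = torch.nn.functional.cross_entropy(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return merge_weights(base_sd, task_sds, coeffs.detach())
```

The hierarchical strategy mentioned in the abstract presumably merges checkpoints in smaller groups and then merges the intermediate results, so fewer full state dictionaries need to be held in memory at once; the details are in the paper itself.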
Related papers
- 1bit-Merging: Dynamic Quantized Merging for Large Language Models [20.19975755949984]
1bit-Merging is a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency.
We demonstrate that 1bit-Merging achieves comparable or superior performance to existing methods while significantly reducing storage requirements.
arXiv Detail & Related papers (2025-02-15T09:47:50Z)
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- A Model Is Not Built By A Single Prompt: LLM-Based Domain Modeling With Question Decomposition [4.123601037699469]
In real-world domain modeling, engineers usually decompose complex tasks into easily solvable sub-tasks.
We propose an LLM-based domain modeling approach via question decomposition, similar to developer's modeling process.
Preliminary results show that our approach outperforms the single-prompt-based approach.
arXiv Detail & Related papers (2024-10-13T14:28:04Z)
- What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Representation Surgery for Multi-Task Model Merging [57.63643005215592]
Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization.
Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training.
By visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias.
arXiv Detail & Related papers (2024-02-05T03:39:39Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging)
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme (a baseline sketched after this list), AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- BYOM: Building Your Own Multi-Task Model For Free [69.63765907216442]
BYOM-FFT is for merging fully finetuned models, while BYOM-LoRA is for LoRA-finetuned models.
Experiments on computer vision and natural language processing tasks show that the proposed BYOM methods outperform existing merging methods by a large margin.
arXiv Detail & Related papers (2023-10-03T08:39:33Z)
- ZipIt! Merging Models from Different Tasks without Training [20.2479633507354]
"ZipIt!" is a general method for merging two arbitrary models of the same architecture.
We find that these two changes combined account for 20-60% improvement over prior work.
arXiv Detail & Related papers (2023-05-04T17:59:58Z)
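For context, the task-arithmetic baseline referenced by several entries above (AdaMerging in particular) merges checkpoints with a fixed scaling coefficient rather than learned ones. Below is a minimal sketch, assuming plain PyTorch state dictionaries and a hand-chosen coefficient lam; it is an illustration of the general idea, not any paper's exact implementation.

```python
def task_arithmetic_merge(base_sd, task_sds, lam=0.3):
    # Add scaled task vectors (fine-tuned weights minus base weights) onto the base.
    # `lam` is a hand-chosen scalar; methods such as AdaMerging instead learn
    # task-wise or layer-wise coefficients without access to the original training data.
    merged = {}
    for name, w0 in base_sd.items():
        merged[name] = w0 + lam * sum(sd[name] - w0 for sd in task_sds)
    return merged
```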