Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
- URL: http://arxiv.org/abs/2408.07666v4
- Date: Thu, 5 Sep 2024 14:37:59 GMT
- Title: Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
- Authors: Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao
- Abstract summary: Model merging is an efficient empowerment technique in the machine learning community.
There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
- Score: 89.40778301238642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model merging is an efficient empowerment technique in the machine learning community that requires neither the collection of raw training data nor expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Second, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions. A comprehensive list of papers about model merging is available at \url{https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications}.
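To make the core operation concrete: most merging methods covered by the survey act directly in parameter space, e.g., by averaging fine-tuned weights or combining their "task vectors". Below is a minimal, hypothetical sketch of that idea in PyTorch; the checkpoint names and the simple interpolation rule are illustrative assumptions, not the survey's prescribed method.

```python
# Minimal sketch of parameter-space model merging (weight averaging /
# task arithmetic). Assumes two checkpoints fine-tuned from the same base
# model; names and the interpolation rule are illustrative only.
import torch

def merge_state_dicts(base, finetuned_a, finetuned_b, alpha=0.5):
    """Combine two fine-tuned checkpoints via their task vectors.

    With a shared base, alpha=0.5 reduces to plain interpolation between
    the two fine-tuned models' deviations from the base weights.
    """
    merged = {}
    for name, base_w in base.items():
        delta_a = finetuned_a[name] - base_w      # task vector of model A
        delta_b = finetuned_b[name] - base_w      # task vector of model B
        merged[name] = base_w + alpha * delta_a + (1.0 - alpha) * delta_b
    return merged

# Hypothetical usage with state dicts saved from models sharing one architecture:
# base = torch.load("base.pt"); a = torch.load("task_a.pt"); b = torch.load("task_b.pt")
# model.load_state_dict(merge_state_dicts(base, a, b, alpha=0.5))
```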
Related papers
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models [28.993221775758702]
Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability.
This paper marks a significant advance toward more flexible and comprehensive model merging techniques.
We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies.
arXiv Detail & Related papers (2024-09-27T16:31:31Z)
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating across different benchmarks with a proper training strategy, even a small 2.7B model can perform on par with larger models of 7B or 13B parameters; a minimal distillation-loss sketch follows this entry.
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
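For context on the distillation setting referenced in the LLAVADI entry above, here is a minimal sketch of a standard response-based knowledge-distillation loss (temperature-softened KL between teacher and student logits mixed with cross-entropy). It is generic textbook distillation under assumed tensor shapes, not LLAVADI's actual recipe.

```python
# Generic response-based knowledge distillation loss: temperature-softened
# KL between teacher and student logits, mixed with ordinary cross-entropy.
# Illustrative sketch only; not the LLAVADI training recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soften both distributions with temperature T; scale the KL term by T^2
    # so gradient magnitudes stay comparable across temperatures.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean",
                  log_target=True) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Hypothetical usage: a small student mimicking a larger frozen teacher.
# loss = distillation_loss(student(batch), teacher(batch).detach(), batch_labels)
```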
- Deep Model Fusion: A Survey [37.39100741978586]
Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one.
It faces several challenges, including high computational cost, a high-dimensional parameter space, and interference between heterogeneous models, among other issues.
arXiv Detail & Related papers (2023-09-27T14:40:12Z)
- MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs).
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z)
- Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has achieved great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks, outperforming dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling vision-language models; a minimal routing sketch follows this entry.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
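To illustrate the sparse mixture-of-experts idea in the entry above (each token is routed to only a few experts, so model capacity grows while per-token compute stays close to that of a dense layer of the same active size), here is a minimal, hypothetical top-k routed MoE layer; the layer sizes and routing rule are assumptions for illustration, not the cited paper's vision-language architecture.

```python
# Minimal sparse mixture-of-experts layer with top-k routing; a generic
# illustration, not the cited paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)   # per-token routing scores
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x):                            # x: (num_tokens, dim)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)   # keep only top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Hypothetical usage: tokens = torch.randn(16, 512); y = SparseMoE(512)(tokens)
```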
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Oftentimes the fine-tuned models are readily available while their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Deep Model-Based Reinforcement Learning for High-Dimensional Problems, a Survey [1.2031796234206134]
Model-based reinforcement learning creates an explicit model of the environment dynamics to reduce the need for environment samples.
A challenge for deep model-based methods is to achieve high predictive power while maintaining low sample complexity.
We propose a taxonomy based on three approaches: using explicit planning on given transitions, using explicit planning on learned transitions, and end-to-end learning of both planning and transitions.
arXiv Detail & Related papers (2020-08-11T08:49:04Z)