Fugu-MT Paper Translation (Summary): Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions

Paper summary: Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions

arxiv url: http://arxiv.org/abs/2603.09938v1
Date: Tue, 10 Mar 2026 17:31:55 GMT
Status: translation complete
System update date: 2026-03-11 15:25:24.504509
Title: Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions
Authors: Mingyang Song, Mao Zheng
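Abstract summary: This survey aims to equip researchers and practitioners with a structured foundation for advancing model merging. We first establish the theoretical underpinnings of merging, including loss landscape geometry, mode connectivity, and the linear mode connectivity hypothesis. We then systematically review the algorithmic landscape, spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization approaches.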
Reference score (independently computed attention score): 19.95776080082138
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model merging has emerged as a transformative paradigm for combining the capabilities of multiple neural networks into a single unified model without additional training. With the rapid proliferation of fine-tuned large language models (LLMs), merging techniques offer a computationally efficient alternative to ensembles and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey presents a comprehensive and structured examination of model merging in the LLM era through the FUSE taxonomy, a four-dimensional framework organized along Foundations, Unification Strategies, Scenarios, and Ecosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry, mode connectivity, and the linear mode connectivity hypothesis. We then systematically review the algorithmic landscape, spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization approaches. For each method family, we analyze the core formulation, highlight representative works, and discuss practical trade-offs. We further examine downstream applications across multi-task learning, safety alignment, domain specialization, multilingual transfer, and federated learning. Finally, we survey the supporting ecosystem of open-source tools, community platforms, and evaluation benchmarks, and identify key open challenges including theoretical gaps, scalability barriers, and standardization needs. This survey aims to equip researchers and practitioners with a structured foundation for advancing model merging.


Related papers