Fugu-MT Paper Translation (Abstract): Model Merging: Foundations and Algorithms

Paper overview: Model Merging: Foundations and Algorithms

arxiv url: http://arxiv.org/abs/2605.01580v1
Date: Sat, 02 May 2026 19:06:35 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:49.838552
Title: Model Merging: Foundations and Algorithms
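Title (reference translation): Model Merging: Foundations and Algorithms
Authors: Donato Crisostomi
Abstract summary: This thesis studies model merging, in which independently trained neural networks are combined directly in weight space. C$^2$M$^3$ is a cycle-consistent merging algorithm based on Frank-Wolfe optimization. In the multi-task setting, the thesis first develops a theoretical account of task vectors as approximate gradients. It then proposes MASS, an input-adaptive routing method that uses TSV geometry to select task-relevant subspaces at inference time.
Reference score (independently computed attention score): 4.528573838858818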
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern deep learning usually treats models as separate artifacts: trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies model merging as an alternative paradigm: combining independently trained neural networks directly in weight space, with little or no optimization and without requiring access to the original training data. The thesis considers two main regimes. In the single-task setting, where models share an objective but differ in initialization, we introduce C$^2$M$^3$, a cycle-consistent merging algorithm based on Frank-Wolfe optimization. C$^2$M$^3$ aligns multiple networks into a shared, reference-free parameter space, making weight averaging meaningful without privileging any individual model. In the multi-task setting, where models are fine-tuned for different downstream tasks from a common pretrained initialization, we first develop a theoretical account of task vectors as approximate gradients. This explains both the effectiveness and the limitations of task arithmetic. Building on this view, we show that task vectors inherit the low-rank structure of gradients and introduce Task Singular Vectors (TSV), a decomposition that enables compression and interference reduction through TSV-Merge. We then present MASS, an input-adaptive routing method that uses TSV geometry to select task-relevant subspaces at inference time. Finally, we introduce MERGE$^3$, an evolutionary merging framework that uses Item Response Theory to reduce evaluation costs by up to 50$\times$ while preserving solution quality. Together, these contributions provide theoretical and algorithmic foundations for model merging, supporting a paradigm in which learned capabilities can be composed, reused, and extended across models.
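Abstract (reference translation): Modern deep learning usually treats models as independent artifacts: they are trained independently, specialized for particular purposes, and replaced when improved versions appear. This thesis studies model merging as an alternative paradigm, in which independently trained neural networks are combined directly in weight space with little or no optimization and without access to the original training data. The thesis considers two main regimes. In the single-task setting, where models share an objective but differ in initialization, it introduces C$^2$M$^3$, a cycle-consistent merging algorithm based on Frank-Wolfe optimization. C$^2$M$^3$ aligns multiple networks into a shared, reference-free parameter space, which makes weight averaging meaningful without privileging any individual model. In the multi-task setting, where models are fine-tuned for different downstream tasks from a common pretrained initialization, it first develops a theoretical account of task vectors as approximate gradients, which explains both the effectiveness and the limitations of task arithmetic. From this viewpoint, it shows that task vectors inherit the low-rank structure of gradients and introduces Task Singular Vectors (TSV), a decomposition that enables compression and interference reduction through TSV-Merge. It then presents MASS, an input-adaptive routing method that uses TSV geometry to select task-relevant subspaces at inference time. Finally, it introduces MERGE$^3$, an evolutionary merging framework that uses Item Response Theory to reduce evaluation costs by up to 50$\times$ while preserving solution quality. Together, these contributions provide theoretical and algorithmic foundations for model merging and support a paradigm in which learned capabilities can be composed, reused, and extended across models.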
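The abstract's view of task vectors as approximate gradients can be illustrated with a toy sketch. The setup below is an assumption made for illustration and is not taken from the thesis: each task is a quadratic loss, every model takes a single full-batch gradient step from a shared initialization, and all names (`grad`, `sgd_step`, `task_vector`, `task_arithmetic`, the targets, and the learning rate) are hypothetical. Under those assumptions, each task vector equals the negative learning rate times the task gradient, so adding task vectors coincides with one gradient step on the summed loss.

```python
# Toy illustration (assumed setup, not the thesis's code): each task t has the
# quadratic loss L_t(w) = 0.5 * ||w - target_t||^2, whose gradient is w - target_t.

def grad(w, target):
    """Gradient of 0.5 * ||w - target||^2 with respect to w."""
    return [wi - ti for wi, ti in zip(w, target)]

def sgd_step(w, target, lr):
    """One full-batch gradient-descent step on a single task."""
    return [wi - lr * gi for wi, gi in zip(w, grad(w, target))]

def task_vector(finetuned, pretrained):
    """Task vector: fine-tuned weights minus the shared pretrained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]

def task_arithmetic(pretrained, task_vectors, alpha):
    """Merge by adding the scaled sum of task vectors to the pretrained weights."""
    summed = [sum(col) for col in zip(*task_vectors)]
    return [p + alpha * s for p, s in zip(pretrained, summed)]

pretrained = [0.0, 0.0]              # shared initialization (hypothetical values)
targets = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical per-task optima
lr = 0.1

# Fine-tune each task independently for one step from the shared initialization.
finetuned = [sgd_step(pretrained, t, lr) for t in targets]
vectors = [task_vector(f, pretrained) for f in finetuned]

# Merge the two fine-tuned models with task arithmetic.
merged = task_arithmetic(pretrained, vectors, alpha=1.0)

# For comparison: one joint gradient step on the sum of both task losses.
joint_grad = [sum(col) for col in zip(*(grad(pretrained, t) for t in targets))]
joint = [p - lr * g for p, g in zip(pretrained, joint_grad)]
```

In this one-step regime `merged` and `joint` agree. With more fine-tuning steps the per-task gradients are evaluated at diverging weights, so the match would only be approximate, which is one plausible reading of why the abstract speaks of approximate gradients and of limitations of task arithmetic.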


List of related papers