Fugu-MT Paper Translation (Summary): Bayesian Model Merging

Paper Overview: Bayesian Model Merging

arxiv url: http://arxiv.org/abs/2605.12843v1
Date: Wed, 13 May 2026 00:36:47 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:27.738467
Title: Bayesian Model Merging
Authors: Kaiyang Li, Shaobo Han, Qing Su, Shihao Ji
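Abstract summary: Model merging aims to combine multiple task-specific expert models into a single model without joint training. This paper introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework. BMM consistently outperforms all plug-and-play anchor baselines.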
Reference score (attention score computed by the site): 17.887004278413915
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Model merging aims to combine multiple task-specific expert models into a single model without joint retraining, offering a practical alternative to multi-task learning when data access or computational budget is limited. Existing methods, however, face two key limitations: (1) they overlook the valuable inductive bias of strong anchor models and estimate the merged weights from scratch, and (2) they rely on a shared hyperparameter setting across different modules of the network, lacking a global optimization strategy. This paper introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework, where the inner level formulates the model merging as an activation-based Bayesian regression under a strong prior induced by an anchor model, yielding an efficient closed-form solution; and the outer level leverages a Bayesian optimization procedure to search module-specific hyperparameters globally based on a small validation set. Furthermore, we reveal a key alignment between activation statistics and task vectors, enabling us to derive a data-free variant of BMM that estimates the Gram matrix for regression without any auxiliary data. Across extensive benchmarks, including up to 20-task merging in vision and 5-task merging in language, BMM consistently outperforms all plug-and-play anchor baselines (e.g., TA, WUDI-Merging, and TSV). In particular, on the ViT-L/14 benchmark for 8-task merging, a single merged model reaches 95.1, closely matching the average performance of eight task-specific experts (95.8).
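The abstract describes the inner level only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes each module is a linear layer y = x @ W, that the regression target is each expert's own output on its own activations, and that the anchor-induced prior is an isotropic Gaussian centered on the anchor weights, with `lam` playing the role of the prior precision relative to the regression noise. The anchor used here is a task-arithmetic merge with a hypothetical scaling of 0.3. The function names `bayesian_merge_layer` and `select_lambda` are hypothetical. Also, `select_lambda` substitutes a plain grid search over a single module's `lam` for the Bayesian optimization the paper uses, purely for illustration. Because the solution depends on activations only through the Gram matrices G_t = X_t^T X_t, a data-free variant only has to supply estimates of those matrices. The paper's task-vector-based estimator is not reproduced here.

```python
import numpy as np


def bayesian_merge_layer(expert_weights, grams, anchor_weight, lam):
    """Hypothetical MAP merge of one linear module (convention: y = x @ W).

    Minimizes  sum_t ||X_t (W - W_t)||_F^2 + lam * ||W - W_a||_F^2,
    i.e. regression onto each expert's outputs plus an isotropic Gaussian
    prior centered on the anchor weights W_a. Only the Gram matrices
    G_t = X_t^T X_t are needed, never the raw activations X_t.
    """
    d_in = anchor_weight.shape[0]
    # Setting the gradient to zero gives the linear system
    # (sum_t G_t + lam * I) W = sum_t G_t W_t + lam * W_a
    lhs = sum(grams) + lam * np.eye(d_in)
    rhs = sum(g @ w for g, w in zip(grams, expert_weights)) + lam * anchor_weight
    return np.linalg.solve(lhs, rhs)  # solve instead of forming an explicit inverse


def select_lambda(expert_weights, grams, anchor_weight, val_acts, val_targets, grid):
    """Grid-search stand-in for the paper's Bayesian optimization (one module only)."""
    best_lam, best_err = None, float("inf")
    for lam in grid:
        w = bayesian_merge_layer(expert_weights, grams, anchor_weight, lam)
        # Validation error: how well the merged layer reproduces the experts' outputs
        err = float(np.mean((val_acts @ w - val_targets) ** 2))
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err


# Synthetic example: three experts fine-tuned from a shared pre-trained layer w0.
rng = np.random.default_rng(0)
d_in, d_out, n = 8, 4, 32
w0 = rng.normal(size=(d_in, d_out))
experts = [w0 + 0.1 * rng.normal(size=(d_in, d_out)) for _ in range(3)]
acts = [rng.normal(size=(n, d_in)) for _ in range(3)]
grams = [x.T @ x for x in acts]
# Anchor: a task-arithmetic merge with a hypothetical scaling coefficient of 0.3
anchor = w0 + 0.3 * sum(w - w0 for w in experts)
merged = bayesian_merge_layer(experts, grams, anchor, lam=1.0)

# Small held-out set: each expert's inputs paired with that expert's outputs
val = [rng.normal(size=(16, d_in)) for _ in range(3)]
val_acts = np.vstack(val)
val_targets = np.vstack([x @ w for x, w in zip(val, experts)])
grid = [0.1, 1.0, 10.0, 100.0]
best_lam, best_err = select_lambda(experts, grams, anchor, val_acts, val_targets, grid)
print(best_lam, best_err)
```

In this sketch, setting `lam = 0` reduces the update to a RegMean-style regression mean, (sum_t G_t)^{-1} sum_t G_t W_t. A very large `lam` returns the anchor unchanged, and intermediate values interpolate between the two. The abstract states that module-specific hyperparameters are searched globally by Bayesian optimization on a small validation set. The grid search above only illustrates the shape of that outer loop for one module.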

