Fugu-MT Paper Translation (Abstract): SimMerge: Learning to Select Merge Operators from Similarity Signals

Paper overview: SimMerge: Learning to Select Merge Operators from Similarity Signals

arxiv url: http://arxiv.org/abs/2601.09473v1
Date: Wed, 14 Jan 2026 13:30:00 GMT
Status: Translation complete
Last updated in system: 2026-01-15 18:59:20.413811
Title: SimMerge: Learning to Select Merge Operators from Similarity Signals
Title (reference translation): SimMerge: Learning to Select Merge Operators from Similarity Signals
Authors: Oliver Bolton, Aakanksha, Arash Ahmadian, Sara Hooker, Marzieh Fadaee, Beyza Ermis
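Abstract summary: Model merging enables multiple large language models (LLMs) to be combined into a single model while preserving performance. This paper proposes a predictive merge-selection method that selects the best merge using inexpensive, task-agnostic similarity signals between models. The results suggest that learning how to merge is a practical route to scalable model composition when checkpoint catalogs are large and evaluation budgets are tight.
Reference score (independently computed attention score): 32.157558993834414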
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Model merging enables multiple large language models (LLMs) to be combined into a single model while preserving performance. This makes it a valuable tool in LLM development, offering a competitive alternative to multi-task training. However, merging can be difficult at scale, as successful merging requires choosing the right merge operator, selecting the right models, and merging them in the right order. This often leads researchers to run expensive merge-and-evaluate searches to select the best merge. In this work, we provide an alternative by introducing SimMerge, a predictive merge-selection method that selects the best merge using inexpensive, task-agnostic similarity signals between models. From a small set of unlabeled probes, we compute functional and structural features and use them to predict the performance of a given 2-way merge. Using these predictions, SimMerge selects the best merge operator, the subset of models to merge, and the merge order, eliminating the expensive merge-and-evaluate loop. We demonstrate that we surpass standard merge-operator performance on 2-way merges of 7B-parameter LLMs, and that SimMerge generalizes to multi-way merges and 111B-parameter LLM merges without retraining. Additionally, we present a bandit variant that supports adding new tasks, models, and operators on the fly. Our results suggest that learning how to merge is a practical route to scalable model composition when checkpoint catalogs are large and evaluation budgets are tight.
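Abstract (reference translation): Model merging allows multiple large language models (LLMs) to be integrated into a single model while maintaining performance. This makes it a valuable tool in LLM development and a competitive alternative to multi-task training. However, merging at scale is difficult, because a successful merge requires choosing the right merge operator, selecting the right models, and merging them in the right order. Researchers therefore often run expensive merge-and-evaluate searches to pick the best merge. This work offers an alternative by introducing SimMerge, a predictive merge-selection method that selects the best merge using inexpensive, task-agnostic similarity signals between models. Functional and structural features are computed from a small set of unlabeled probes and used to predict the performance of a given 2-way merge. Using these predictions, SimMerge selects the best merge operator, the subset of models to merge, and the merge order, eliminating the expensive merge-and-evaluate loop. The authors demonstrate that the method surpasses standard merge-operator performance on 2-way merges of 7B-parameter LLMs, and that it generalizes to multi-way merges and to merges of 111B-parameter LLMs without retraining. They also present a bandit variant that supports adding new tasks, models, and operators on the fly. The results suggest that learning how to merge is a practical route to scalable model composition when checkpoint catalogs are large and evaluation budgets are tight.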
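
The selection idea in the abstract (similarity features in, predicted merge quality out, best operator chosen) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the features, the predictor, or the operators, so cosine similarity over flattened weights stands in for a "structural" feature, mean cosine similarity of probe outputs stands in for a "functional" feature, and the operator names `op_a` and `op_b` with their hand-written scoring functions are hypothetical placeholders for a learned predictor.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_features(
    weights_a: Vector,
    weights_b: Vector,
    outputs_a: Sequence[Vector],
    outputs_b: Sequence[Vector],
) -> List[float]:
    """Return [structural, functional] similarity for a pair of models.

    Assumption: structural = cosine of flattened weights, functional =
    mean cosine of the two models' outputs on the same unlabeled probes.
    """
    if not outputs_a or len(outputs_a) != len(outputs_b):
        raise ValueError("need the same non-empty set of probes for both models")
    structural = cosine(weights_a, weights_b)
    functional = sum(cosine(oa, ob) for oa, ob in zip(outputs_a, outputs_b)) / len(outputs_a)
    return [structural, functional]


def select_operator(
    features: List[float],
    predictors: Dict[str, Callable[[List[float]], float]],
) -> str:
    """Pick the operator whose predictor gives the highest predicted score."""
    return max(predictors, key=lambda name: predictors[name](features))


# Hypothetical stand-ins for a learned performance predictor, one per operator.
example_predictors: Dict[str, Callable[[List[float]], float]] = {
    "op_a": lambda f: 0.5 * f[0] + 0.5 * f[1],  # favors similar model pairs
    "op_b": lambda f: 1.0 - f[0],               # favors structurally distant pairs
}

if __name__ == "__main__":
    feats = similarity_features([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0]], [[1.0, 0.0]])
    print(feats, select_operator(feats, example_predictors))
```

In this sketch no merged model is ever built or evaluated at selection time; only cheap pairwise features are computed, which is the property the abstract emphasizes. A real predictor would be trained on previously observed merge outcomes rather than written by hand.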

Related papers