Fugu-MT paper translation (abstract): How does the optimizer implicitly bias the model merging loss landscape?

Paper abstract: How does the optimizer implicitly bias the model merging loss landscape?

arxiv url: http://arxiv.org/abs/2510.04686v1
Date: Mon, 06 Oct 2025 10:56:41 GMT
Status: Translation complete
System update date: 2025-10-07 16:52:59.812239
Title: How does the optimizer implicitly bias the model merging loss landscape?
Title (reference translation): How does the optimizer implicitly bias the loss landscape for model merging?
Authors: Chenxiang Zhang, Alexander Theus, Damien Teney, Antonio Orvieto, Jun Pang, Sjouke Mauw
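Abstract summary: We show that a single quantity, the effective noise scale, unifies the impact of optimizer and data choices on model merging. Across architectures and datasets, merging success is a non-monotonic function of effective noise, with a distinct optimum.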
Reference score (independently computed attention score): 66.96572894292895
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Model merging methods combine models with different capabilities into a single one while maintaining the same inference cost. Two popular approaches are linear interpolation, which linearly interpolates between model weights, and task arithmetic, which combines task vectors obtained by the difference between finetuned and base models. While useful in practice, what properties make merging effective are poorly understood. This paper explores how the optimization process affects the loss landscape geometry and its impact on merging success. We show that a single quantity -- the effective noise scale -- unifies the impact of optimizer and data choices on model merging. Across architectures and datasets, the effectiveness of merging success is a non-monotonic function of effective noise, with a distinct optimum. Decomposing this quantity, we find that larger learning rates, stronger weight decay, smaller batch sizes, and data augmentation all independently modulate the effective noise scale, exhibiting the same qualitative trend. Unlike prior work that connects optimizer noise to the flatness or generalization of individual minima, we show that it also affects the global loss landscape, predicting when independently trained solutions can be merged. Our findings broaden the understanding of how optimization shapes the loss landscape geometry and its downstream consequences for model merging, suggesting the possibility of further manipulating the training dynamics to improve merging effectiveness.
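Abstract (reference translation): Model merging methods combine models with different capabilities into a single model while keeping the inference cost unchanged. Two common approaches are linear interpolation, which interpolates linearly between model weights, and task arithmetic, which combines task vectors given by the difference between finetuned and base models. Although useful in practice, the properties that make merging effective are not well understood. This paper examines how the optimization process shapes the geometry of the loss landscape and how that affects merging success. We show that a single quantity, the effective noise scale, unifies the impact of optimizer and data choices on model merging. Across architectures and datasets, merging success is a non-monotonic function of effective noise, with a distinct optimum. Decomposing this quantity, we find that larger learning rates, stronger weight decay, smaller batch sizes, and data augmentation each independently modulate the effective noise scale and show the same qualitative trend. Unlike prior work that links optimizer noise to the flatness or generalization of individual minima, we show that it also affects the global loss landscape and predicts when independently trained solutions can be merged. These findings broaden the understanding of how optimization shapes loss landscape geometry and its downstream consequences for model merging, and suggest that training dynamics could be manipulated further to improve merging effectiveness.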
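
The two merging approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the `Weights` type (a dict from parameter name to a flat list of floats), the function names, and the `alpha` and `scale` parameters are assumptions made for illustration only.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def linear_interpolation(w_a: Weights, w_b: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * w_a + alpha * w_b, parameter by parameter."""
    return {
        name: [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a[name], w_b[name])]
        for name in w_a
    }


def task_arithmetic(base: Weights, finetuned: List[Weights], scale: float = 1.0) -> Weights:
    """Add the scaled sum of task vectors (finetuned - base) to the base weights."""
    merged: Weights = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + scale * sum(ft[name][i] - b for ft in finetuned)
            for i, b in enumerate(base_vals)
        ]
    return merged


# Toy weights, chosen only to make the arithmetic easy to follow.
base = {"w": [0.0, 1.0]}
model_a = {"w": [1.0, 1.0]}
model_b = {"w": [0.0, 3.0]}

midpoint = linear_interpolation(model_a, model_b, alpha=0.5)
merged = task_arithmetic(base, [model_a, model_b], scale=1.0)
```

With real networks the same arithmetic would be applied to each tensor of the models' parameters; whether the merged model performs well is exactly what the paper relates to the effective noise scale.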

Related papers