Fugu-MT paper translation (summary): Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation

Paper summary: Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation

arxiv url: http://arxiv.org/abs/2603.10048v1
Date: Mon, 09 Mar 2026 02:11:56 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.591358
Title: Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation
Authors: Jianlong Chen, Zhiming Zhou
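Abstract summary: Sharpness-Aware Minimization (SAM) improves generalization by minimizing the maximum training loss around the parameters. However, its practical implementation approximates this with gradient ascent step(s) and then applies the gradient at the ascent point to update the current parameters. The authors show that the gradient at the single-step ascent point approximates the direction from the current parameters toward the maximum within the local neighborhood better than the local gradient does.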
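The practical SAM update summarized above (one ascent step, then applying the ascent-point gradient to the current parameters) can be sketched as follows. This is a minimal sketch under stated assumptions: a hypothetical 2-D quadratic loss with an analytic gradient stands in for a network, the ascent step uses the common normalized form w + ρ·g/‖g‖, and the names `loss`, `grad`, `sgd_step`, `sam_step`, `A`, `B`, `lr`, `rho` are illustrative, not from the paper. It shows standard single-step SAM, not XSAM.

```python
import math

# Hypothetical toy objective (an assumption, not from the paper): an anisotropic
# quadratic L(w) = 0.5 * (A * w0**2 + B * w1**2) with an analytic gradient.
A, B = 10.0, 1.0


def loss(w):
    return 0.5 * (A * w[0] ** 2 + B * w[1] ** 2)


def grad(w):
    return [A * w[0], B * w[1]]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def sgd_step(w, lr):
    """Plain gradient descent: update w with the local gradient at w."""
    return [wi - lr * gi for wi, gi in zip(w, grad(w))]


def sam_step(w, lr, rho, eps=1e-12):
    """One SAM update with a single normalized ascent step of radius rho."""
    g = grad(w)
    scale = rho / (norm(g) + eps)
    # Ascent point: w_adv = w + rho * g / ||g||.
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]
    # Gradient evaluated at the ascent point ...
    g_adv = grad(w_adv)
    # ... but applied to the *current* parameters w, not to w_adv.
    # Treating w_adv as a constant is what neglects the (full)
    # derivative of the ascent point with respect to w.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w, lr=0.05, rho=0.1)
print(w, loss(w))
```

Because `g_adv` is applied at `w` rather than at `w_adv`, the iterate never moves to the ascent point; it only borrows that point's gradient. Near the minimum the ascent step keeps length ρ, so on this toy loss the iterates settle into a small region around the origin rather than converging exactly.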
Reference score (site-computed attention score): 2.4147146608927597
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sharpness-Aware Minimization (SAM) enhances generalization by minimizing the maximum training loss within a predefined neighborhood around the parameters. However, its practical implementation approximates this as gradient ascent step(s) followed by applying the gradient at the ascent point to update the current parameters. This practice can be justified as approximately optimizing the objective by neglecting the (full) derivative of the ascent point with respect to the current parameters. Nevertheless, a direct and intuitive understanding of why using the gradient at the ascent point to update the current parameters works better is still lacking. Our work bridges this gap by proposing a novel and intuitive interpretation. We show that the gradient at the single-step ascent point, when applied to the current parameters, provides a better approximation of the direction from the current parameters toward the maximum within the local neighborhood than the local gradient does. This improved approximation thereby enables a more direct escape from the maximum within the local neighborhood. Nevertheless, our analysis further reveals two issues. First, the approximation by the gradient at the single-step ascent point is often inaccurate. Second, the approximation quality may degrade as the number of ascent steps increases. To address these limitations, in this paper we propose eXplicit Sharpness-Aware Minimization (XSAM). It tackles the first issue by explicitly estimating the direction of the maximum during training, and the second by crafting a search space that effectively leverages the gradient information at the multi-step ascent point. XSAM features a unified formulation that applies to both single-step and multi-step settings and incurs only negligible computational overhead. Extensive experiments demonstrate the consistent superiority of XSAM over existing counterparts.
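The abstract's central claim, that the single-step ascent-point gradient points closer to the neighborhood maximum than the local gradient does, can be checked numerically on a toy problem. This sketch assumes a hypothetical convex 2-D quadratic (not from the paper), finds the maximizer on the ρ-circle by brute-force scanning (valid here because a convex function attains its maximum over a ball on the boundary), and reports the cosine similarity of each gradient with that direction. The point `[0.1, 1.0]`, `rho=0.5`, and all helper names are illustrative assumptions. It illustrates the interpretation only and does not implement XSAM, whose explicit direction estimate is not specified in the abstract.

```python
import math

# Hypothetical toy loss (an assumption): L(w) = 0.5 * (A*w0**2 + B*w1**2).
A, B = 10.0, 1.0


def loss(w):
    return 0.5 * (A * w[0] ** 2 + B * w[1] ** 2)


def grad(w):
    return [A * w[0], B * w[1]]


def unit(v):
    n = math.sqrt(v[0] ** 2 + v[1] ** 2)
    return [v[0] / n, v[1] / n]


def cosine(u, v):
    u, v = unit(u), unit(v)
    return u[0] * v[0] + u[1] * v[1]


def max_direction(w, rho, samples=3600):
    """Brute-force the unit direction e/||e|| of the loss maximum on the rho-circle.

    For this convex loss the maximum over the rho-ball lies on its boundary,
    so scanning the circle is sufficient in 2-D.
    """
    best, best_e = -math.inf, None
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        e = [rho * math.cos(t), rho * math.sin(t)]
        val = loss([w[0] + e[0], w[1] + e[1]])
        if val > best:
            best, best_e = val, e
    return unit(best_e)


def compare(w, rho):
    """Cosine of the local and ascent-point gradients with the true max direction."""
    g = grad(w)
    gh = unit(g)
    w_adv = [w[0] + rho * gh[0], w[1] + rho * gh[1]]  # single-step ascent point
    g_adv = grad(w_adv)  # its gradient, to be applied at w
    d_star = max_direction(w, rho)
    return cosine(g, d_star), cosine(g_adv, d_star)


cos_local, cos_adv = compare([0.1, 1.0], rho=0.5)
print(f"local gradient:        cos = {cos_local:.3f}")
print(f"ascent-point gradient: cos = {cos_adv:.3f}")
```

The outcome depends on the chosen point and ρ; the abstract itself notes that the single-step approximation is often inaccurate, so agreement on one toy point should not be read as a general guarantee.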


Related papers