Fugu-MT Paper Translation (Abstract): Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization

Paper abstract: Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization

arxiv url: http://arxiv.org/abs/2603.08290v1
Date: Mon, 09 Mar 2026 12:09:14 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:15.904268
Title: Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
Title (reference translation): Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
Authors: Chaewon Moon, Dongkuk Si, Chulhee Yun
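Abstract summary: We study the implicit bias of Sharpness-Aware Minimization (SAM) when training $L$-layer linear diagonal networks on linearly separable binary classification. For $\ell_\infty$-SAM, the limit direction can converge to $\mathbf{0}$ or to any standard basis vector. Our theoretical analysis attributes the sequential feature amplification seen under $\ell_2$-SAM to the gradient normalization factor applied in its perturbation.
Reference score (attention score computed independently by this site): 24.4931530458436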
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the implicit bias of Sharpness-Aware Minimization (SAM) when training $L$-layer linear diagonal networks on linearly separable binary classification. For linear models ($L=1$), both $\ell_\infty$- and $\ell_2$-SAM recover the $\ell_2$ max-margin classifier, matching gradient descent (GD). However, for depth $L = 2$, the behavior changes drastically -- even on a single-example dataset. For $\ell_\infty$-SAM, the limit direction depends critically on initialization and can converge to $\mathbf{0}$ or to any standard basis vector, in stark contrast to GD, whose limit aligns with the basis vector of the dominant data coordinate. For $\ell_2$-SAM, we show that although its limit direction matches the $\ell_1$ max-margin solution as in the case of GD, its finite-time dynamics exhibit a phenomenon we call "sequential feature amplification", in which the predictor initially relies on minor coordinates and gradually shifts to larger ones as training proceeds or initialization increases. Our theoretical analysis attributes this phenomenon to $\ell_2$-SAM's gradient normalization factor applied in its perturbation, which amplifies minor coordinates early and allows major ones to dominate later, giving a concrete example where infinite-time implicit-bias analyses are insufficient. Synthetic and real-data experiments corroborate our findings.
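For orientation, the standard discrete-time SAM update and one common parameterization of a depth-$L$ diagonal linear network can be written as follows. The abstract does not state the paper's exact formulation, so the step size $\eta$, the perturbation radius $\rho$, the training loss $\mathcal{L}$, and the product parameterization of $\beta$ are all assumptions made here for illustration.

```latex
\begin{align*}
\beta(w) &= w_1 \odot w_2 \odot \cdots \odot w_L, \qquad f(x; w) = \langle \beta(w), x \rangle, \\
w^{(t+1)} &= w^{(t)} - \eta \, \nabla \mathcal{L}\bigl(w^{(t)} + \epsilon^{(t)}\bigr), \\
\epsilon^{(t)}_{\ell_2} &= \rho \, \frac{\nabla \mathcal{L}(w^{(t)})}{\lVert \nabla \mathcal{L}(w^{(t)}) \rVert_2}, \qquad
\epsilon^{(t)}_{\ell_\infty} = \rho \, \operatorname{sign}\bigl(\nabla \mathcal{L}(w^{(t)})\bigr).
\end{align*}
```

The $1/\lVert \nabla \mathcal{L} \rVert_2$ factor in the $\ell_2$ perturbation is the gradient normalization factor to which the abstract attributes sequential feature amplification.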
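Abstract (reference translation): We study the implicit bias of Sharpness-Aware Minimization (SAM) when training $L$-layer linear diagonal networks on linearly separable binary classification. For linear models ($L=1$), both $\ell_\infty$- and $\ell_2$-SAM recover the $\ell_2$ max-margin classifier, matching gradient descent (GD). For depth $L = 2$, however, the behavior changes drastically, even on a single-example dataset. For $\ell_\infty$-SAM, the limit direction depends critically on initialization and can converge to $\mathbf{0}$ or to any standard basis vector, in contrast to GD, whose limit aligns with the basis vector of the dominant data coordinate. For $\ell_2$-SAM, the limit direction matches the $\ell_1$ max-margin solution as with GD, but the finite-time dynamics exhibit a phenomenon called "sequential feature amplification". The theoretical analysis attributes this phenomenon to $\ell_2$-SAM's gradient normalization factor applied in its perturbation, which amplifies minor coordinates early and lets major ones dominate later, giving a concrete example where infinite-time implicit-bias analyses are insufficient. Synthetic and real-data experiments corroborate the findings.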
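The setting in the abstract (depth $L = 2$, a single training example, GD versus $\ell_2$- and $\ell_\infty$-SAM) can be explored numerically with a minimal sketch like the one below. It is not the authors' code. It assumes the product parameterization $\beta = w_1 \odot w_2$, a logistic loss, discrete-time updates, and arbitrary illustrative values for the data `x`, the radius `rho`, and the step size `lr`. The function names `loss_and_grads`, `sam_perturbation`, and `train` are hypothetical.

```python
import numpy as np


def loss_and_grads(w1, w2, x, y):
    """Logistic loss of the assumed depth-2 diagonal predictor beta = w1 * w2
    on a single example (x, y), with gradients w.r.t. w1 and w2."""
    beta = w1 * w2
    margin = y * np.dot(beta, x)
    loss = np.logaddexp(0.0, -margin)  # log(1 + exp(-margin)), computed stably
    # d loss / d <beta, x> = -y * sigmoid(-margin); tanh form avoids overflow
    coeff = -y * 0.5 * (1.0 - np.tanh(0.5 * margin))
    return loss, coeff * x * w2, coeff * x * w1


def sam_perturbation(g1, g2, rho, norm):
    """Ascent perturbation of size rho in the chosen norm over all parameters."""
    if norm == "l2":
        # Normalized gradient: this is the gradient normalization factor
        scale = np.sqrt(np.sum(g1 ** 2) + np.sum(g2 ** 2)) + 1e-12
        return rho * g1 / scale, rho * g2 / scale
    if norm == "linf":
        # Maximizer of <g, eps> over the l_inf ball of radius rho
        return rho * np.sign(g1), rho * np.sign(g2)
    raise ValueError(f"unknown norm: {norm}")


def train(x, y, w_init, steps=2000, lr=0.05, rho=0.0, norm="l2"):
    """Run GD (rho == 0) or SAM (rho > 0); return final weights and the
    trajectory of normalized predictor directions beta / ||beta||_2."""
    w1, w2 = w_init.copy(), w_init.copy()
    directions = []
    for _ in range(steps):
        _, g1, g2 = loss_and_grads(w1, w2, x, y)
        if rho > 0:
            e1, e2 = sam_perturbation(g1, g2, rho, norm)
            # SAM uses the gradient evaluated at the perturbed point
            _, g1, g2 = loss_and_grads(w1 + e1, w2 + e2, x, y)
        w1 = w1 - lr * g1
        w2 = w2 - lr * g2
        beta = w1 * w2
        directions.append(beta / (np.linalg.norm(beta) + 1e-12))
    return w1, w2, np.array(directions)


if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0])  # illustrative data; last coordinate dominant
    y = 1.0
    w0 = np.full(3, 0.1)           # illustrative initialization scale
    for name, rho, norm in [("GD", 0.0, "l2"),
                            ("l2-SAM", 0.05, "l2"),
                            ("linf-SAM", 0.05, "linf")]:
        _, _, dirs = train(x, y, w0, rho=rho, norm=norm)
        # Print the predictor direction early, midway, and at the end
        print(name, np.round(dirs[[0, len(dirs) // 2, -1]], 3))
```

Varying `w0` and `rho` and inspecting the printed directions at different steps is one way to probe the initialization dependence and the finite-time behavior described in the abstract. The sketch makes no claim to reproduce the paper's experiments.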

Related papers

Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
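We consider the problem of contextual online RLHF with general preferences. We adopt a generalized bilinear preference model that captures preferences through low-rank skew-symmetric matrices. We show that the dual gap of the greedy policy is bounded by the square of the estimation error.
Paper reference translation (metadata) (2026-02-26T15:27:53Z)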
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias [0.0]
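We give a unified, high-probability characterization of how a family of parameter norms scales. We then study diagonal linear networks trained by gradient descent.
Paper reference translation (metadata) (2025-09-25T13:59:22Z)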
Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization [3.4540258577108776]
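We study the implicit regularization of diagonal linear neural networks of depth $D \ge 2$ for overparameterized linear regression problems. For $D \ge 3$, the error decreases linearly with $\alpha$, whereas for $D=2$ it decreases at rate $\alpha^{1-\varrho}$. Numerical experiments corroborate our theoretical findings and suggest that deeper networks, i.e., $D \ge 3$, may lead to better generalization.
Paper reference translation (metadata) (2025-06-01T19:55:31Z)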
Emergence and scaling laws in SGD learning of shallow neural networks [64.48316762675141]
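We study the complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with $P$ neurons on isotropic Gaussian data. We give a precise analysis of the SGD dynamics for training a student two-layer network to minimize the mean squared error (MSE).
Paper reference translation (metadata) (2025-04-28T16:58:55Z)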
Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization [27.33243506775655]
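We focus on the fundamental case of convex and Lipschitz loss functions. We present several new theoretical results that shed light on the complexity of this problem and its connection to related learning models. Our results cast the setting of vector-valued linear prediction as a bridge between two different, extensively studied learning models.
Paper reference translation (metadata) (2024-12-05T15:56:54Z)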
Convergence Rate Analysis of LION [54.28350823319057]
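LION converges to a Karush-Kuhn-Tucker (KKT) point at a rate of $\mathcal{O}(\sqrt{d}K^{-\dots})$, measured by a gradient norm. Compared with standard SGD, LION is shown to achieve lower loss and better performance.
Paper reference translation (metadata) (2024-11-12T11:30:53Z)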
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
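Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions with neural networks. This paper studies a function space suited to over-parameterized two-layer neural networks with bounded norms.
Paper reference translation (metadata) (2024-04-29T15:04:07Z)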
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation [89.21686761957383]
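We study a gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer network. Our results show that even a single step can yield a considerable advantage over random features.
Paper reference translation (metadata) (2022-05-03T12:09:59Z)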

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.

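This page shows information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use.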