Fugu-MT Paper Translation (Abstract): Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

Paper overview: Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

arxiv url: http://arxiv.org/abs/2306.17504v1
Date: Fri, 30 Jun 2023 09:33:41 GMT
Status: Translation complete
Last updated in system: 2023-07-03 12:52:51.805635
Title: Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Authors: Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, Dacheng Tao
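Abstract summary: Deep neural networks often suffer from poor generalization due to complex, non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change in training loss when a perturbation is added to the weights. This paper proposes Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask.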
Reference score (independently calculated attention score): 158.2634766682187
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computation, doubling the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask. To obtain the sparse mask, we provide two solutions, based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different masks, including unstructured, structured, and $N$:$M$ structured patterns, as well as explicit and implicit forms of implementing sparse perturbation. We theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$. Sparse SAM has the potential to accelerate training and smooth the loss landscape effectively. Extensive experimental results on CIFAR and ImageNet-1K confirm that our method is superior to SAM in terms of efficiency, and the performance is preserved or even improved with a perturbation of merely 50% sparsity. Code is available at https://github.com/Mi-Peng/Systematic-Investigation-of-Sparse-Perturbed-Sharpness-Aware-Minimization-Optimizer.
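The sparse perturbation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy quadratic loss, the names `CURV`, `topk_mask`, and `ssam_step`, the hyperparameter values, and the use of squared gradients as a stand-in for diagonal Fisher information are all assumptions made for illustration. The sketch shows one update in which only the weights selected by a binary mask are perturbed before the gradient is recomputed.

```python
import math

# Hypothetical per-coordinate curvatures of a toy quadratic loss.
CURV = [10.0, 1.0, 0.1, 5.0]

def loss(w):
    # Toy loss: 0.5 * sum_i c_i * w_i^2 (illustrative only).
    return 0.5 * sum(c * x * x for c, x in zip(CURV, w))

def grad(w):
    # Analytic gradient of the toy loss.
    return [c * x for c, x in zip(CURV, w)]

def topk_mask(scores, sparsity):
    # Binary mask that keeps the (1 - sparsity) fraction of largest scores.
    keep = max(1, round(len(scores) * (1.0 - sparsity)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [1.0 if i in kept else 0.0 for i in range(len(scores))]

def ssam_step(w, lr=0.05, rho=0.1, sparsity=0.5):
    g = grad(w)
    # Assumption: squared gradients approximate the diagonal Fisher information.
    mask = topk_mask([gi * gi for gi in g], sparsity)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # SAM-style perturbation rho * g / ||g||, zeroed where the mask is 0.
    eps = [rho * gi / norm * m for gi, m in zip(g, mask)]
    # Gradient at the sparsely perturbed point drives the actual update.
    g_pert = grad([wi + ei for wi, ei in zip(w, eps)])
    w_new = [wi - lr * gp for wi, gp in zip(w, g_pert)]
    return w_new, mask

w0 = [1.0, 1.0, 1.0, 1.0]
w1, mask = ssam_step(w0)
print(mask, loss(w0), loss(w1))
```

With 50% sparsity, half of the coordinates receive no perturbation. Note that this explicit masking still computes the full gradient twice, so any efficiency gain would have to come from how the sparse perturbation is implemented in practice.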

Related papers