Fugu-MT Paper Translation (Abstract): Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

Paper overview: Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

arxiv url: http://arxiv.org/abs/2210.05177v1
Date: Tue, 11 Oct 2022 06:30:10 GMT
Status: Translation complete
Last updated in system: 2022-10-12 13:43:21.662123
Title: Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Title (reference translation): Making Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Authors: Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao
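Abstract summary: One popular solution is Sharpness-Aware Minimization (SAM), which minimizes the change in training loss when a perturbation is added to the weights. This paper proposes an efficient training scheme called Sparse SAM (SSAM). It also proves theoretically that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$.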
Reference score (independently computed attention measure): 132.37966970098645
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks often suffer from poor generalization caused by complex and non-convex loss landscapes. One of the popular solutions is Sharpness-Aware Minimization (SAM), which smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weights. However, we find that the indiscriminate perturbation of SAM on all parameters is suboptimal and also results in excessive computation, i.e., double the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose an efficient and effective training scheme coined Sparse SAM (SSAM), which achieves sparse perturbation by a binary mask. To obtain the sparse mask, we provide two solutions, based on Fisher information and dynamic sparse training, respectively. In addition, we theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$. Sparse SAM not only has the potential for training acceleration but also smooths the loss landscape effectively. Extensive experimental results on CIFAR10, CIFAR100, and ImageNet-1K confirm the superior efficiency of our method over SAM, and performance is preserved or even improved with a perturbation of merely 50% sparsity. Code is available at \url{https://github.com/Mi-Peng/Sparse-Sharpness-Aware-Minimization}.
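Abstract (reference translation): Deep neural networks often suffer from poor generalization caused by complex, non-convex loss landscapes. One popular solution is Sharpness-Aware Minimization (SAM), which smooths the loss landscape by minimizing the maximized change in training loss when a perturbation is added to the weights. However, SAM's indiscriminate perturbation of all parameters leads to excessive computation, i.e., double the overhead of common optimizers such as Stochastic Gradient Descent (SGD). This paper proposes an efficient and effective training scheme called Sparse SAM (SSAM), which achieves sparse perturbation through a binary mask. To obtain the sparse mask, two solutions are provided, based on Fisher information and dynamic sparse training, respectively. In addition, it is proven theoretically that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$. Sparse SAM not only has the potential for training acceleration but also smooths the loss landscape effectively. Extensive experimental results on CIFAR10, CIFAR100, and ImageNet-1K confirm that the method is more efficient than SAM. Code is available at \url{https://github.com/Mi-Peng/Sparse-Sharpness-Aware-Minimization}.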
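
Illustrative sketch (not from the paper or its repository): the abstract describes SAM as perturbing the weights before taking the gradient, and SSAM as restricting that perturbation with a binary mask. The toy example below shows one way this idea could look on a small quadratic loss. Everything in it is an assumption made for illustration: the names `CURVATURE`, `topk_mask`, and `sparse_sam_step` are hypothetical, the mask is chosen by squared gradient magnitude as a crude stand-in for the Fisher-information criterion, and the perturbation is normalized by the norm of the masked gradient, which may differ from the authors' formulation.

```python
import math

# Toy per-coordinate curvature for a quadratic loss (illustrative values only).
CURVATURE = [10.0, 1.0, 0.1, 0.01]


def loss(w):
    """Quadratic loss: sum_i c_i * w_i^2."""
    return sum(c * x * x for c, x in zip(CURVATURE, w))


def grad(w):
    """Exact gradient of the quadratic loss."""
    return [2.0 * c * x for c, x in zip(CURVATURE, w)]


def topk_mask(scores, sparsity):
    """Binary mask that keeps the (1 - sparsity) fraction with the largest scores."""
    keep = max(1, round(len(scores) * (1.0 - sparsity)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [1.0 if i in kept else 0.0 for i in range(len(scores))]


def sparse_sam_step(w, lr=0.05, rho=0.05, sparsity=0.5):
    """One SAM-style update whose perturbation is restricted by a binary mask."""
    g = grad(w)
    # Assumption: squared gradients serve as a simple proxy score for the mask.
    mask = topk_mask([gi * gi for gi in g], sparsity)
    masked = [gi * mi for gi, mi in zip(g, mask)]
    # Guard against a zero gradient so the division below is always defined.
    norm = math.sqrt(sum(x * x for x in masked)) or 1.0
    # Perturb only the masked coordinates, with total perturbation size rho.
    eps = [rho * x / norm for x in masked]
    # Second gradient evaluation at the perturbed point, as in SAM.
    g_perturbed = grad([wi + ei for wi, ei in zip(w, eps)])
    new_w = [wi - lr * gi for wi, gi in zip(w, g_perturbed)]
    return new_w, mask


if __name__ == "__main__":
    w = [1.0, 1.0, 1.0, 1.0]
    for _ in range(20):
        w, mask = sparse_sam_step(w)
    print("final loss:", loss(w), "last mask:", mask)
```

With `sparsity=0.5`, only half of the coordinates receive a perturbation, which is the setting the abstract reports as preserving or improving performance. The toy loss here says nothing about whether that holds in practice.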

Related papers