SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for
Improving DNN Generalization and Robustness
- URL: http://arxiv.org/abs/2211.11561v2
- Date: Tue, 21 Mar 2023 12:34:27 GMT
- Title: SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for
Improving DNN Generalization and Robustness
- Authors: Gonçalo Mordido, Sébastien Henwood, Sarath Chandar, François
Leduc-Primeau
- Abstract summary: Energy-efficient deep neural network (DNN) accelerators are prone to non-idealities that degrade performance at inference time.
Existing methods typically add perturbations to the DNN weights during training to simulate inference on noisy hardware.
We show that applying sharpness-aware training, by optimizing for both the loss value and loss sharpness, significantly improves robustness to noisy hardware at inference time.
- Score: 11.249410336982258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Energy-efficient deep neural network (DNN) accelerators are prone to
non-idealities that degrade DNN performance at inference time. To mitigate such
degradation, existing methods typically add perturbations to the DNN weights
during training to simulate inference on noisy hardware. However, this often
requires knowledge about the target hardware and leads to a trade-off between
DNN performance and robustness, decreasing the former to increase the latter.
In this work, we show that applying sharpness-aware training, by optimizing for
both the loss value and loss sharpness, significantly improves robustness to
noisy hardware at inference time without relying on any assumptions about the
target hardware. In particular, we propose a new adaptive sharpness-aware
method that conditions the worst-case perturbation of a given weight not only
on its magnitude but also on the range of the weight distribution. This is
achieved by performing sharpness-aware minimization scaled by outlier
normalization (SAMSON). Our approach outperforms existing sharpness-aware
training methods both in terms of model generalization performance in noiseless
regimes and robustness in noisy settings, as measured on several architectures
and datasets.
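The abstract describes the idea behind SAMSON but not an exact update rule. As a rough illustration, the PyTorch-style sketch below implements a generic sharpness-aware step whose ascent perturbation is scaled, per weight, by the weight's magnitude divided by the range of its tensor's weight distribution, in the spirit of the description above; the function name `samson_step`, the radius `rho`, and the precise normalization are illustrative assumptions, not the authors' released code.

```python
# Minimal, hypothetical sketch of a SAMSON-style adaptive sharpness-aware step
# in PyTorch. Scaling each weight's perturbation by |w| divided by the range of
# its tensor's weight distribution follows the abstract's description; the exact
# formula, `samson_step`, and `rho` are assumptions, not the authors' reference
# implementation.
import torch


def samson_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    # 1) Gradients at the current weights w.
    base_optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    with torch.no_grad():
        # 2) Per-weight scale: own magnitude relative to the tensor's value range.
        scales = [p.abs() / (p.max() - p.min()).clamp_min(1e-12) for p in params]
        scaled_grads = [s * p.grad for s, p in zip(scales, params)]
        grad_norm = torch.sqrt(sum((g ** 2).sum() for g in scaled_grads))

        # 3) Ascend to the approximate worst-case weights w + e (a SAM-style
        #    step with the adaptive per-weight scaling applied).
        eps = [rho * s * g / (grad_norm + 1e-12) for s, g in zip(scales, scaled_grads)]
        for p, e in zip(params, eps):
            p.add_(e)

    # 4) Gradients at the perturbed weights define the sharpness-aware update.
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # 5) Restore the original weights and take the base optimizer step.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
    return loss.item()
```

In a training loop, such a step would replace the usual forward/backward/optimizer-step call for each batch and, as with standard sharpness-aware minimization, costs two forward-backward passes per update.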
Related papers
- Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification [53.727688136434345]
Graph Neural Networks (GNNs) have shown superior performance in node classification.
We present Fast Graph Sharpness-Aware Minimization (FGSAM) that integrates the rapid training of Multi-Layer Perceptrons with the superior performance of GNNs.
Our proposed algorithm outperforms the standard SAM with lower computational costs in few-shot node classification (FSNC) tasks.
arXiv Detail & Related papers (2024-10-22T09:33:29Z) - Compute-in-Memory based Neural Network Accelerators for Safety-Critical
Systems: Worst-Case Scenarios and Protections [8.813981342105151]
We study the problem of pinpointing the worst-case performance of CiM accelerators affected by device variations.
We propose a novel worst-case-aware training technique named A-TRICE that efficiently combines adversarial training and noise-injection training.
Our experimental results demonstrate that A-TRICE improves the worst-case accuracy under device variations by up to 33%.
arXiv Detail & Related papers (2023-12-11T05:56:00Z) - Achieving Constraints in Neural Networks: A Stochastic Augmented
Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
arXiv Detail & Related papers (2023-10-25T13:55:35Z) - Examining the Role and Limits of Batchnorm Optimization to Mitigate
Diverse Hardware-noise in In-memory Computing [3.9488615467284225]
In-Memory Computing (IMC) platforms such as analog crossbars are gaining focus as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area- & compute-efficiencies.
The intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs.
This work aims to examine the distortions caused by these non-idealities on the dot-product operations in analog crossbars.
arXiv Detail & Related papers (2023-05-28T19:07:25Z) - Dynamics-Aware Loss for Learning with Label Noise [73.75129479936302]
Label noise poses a serious threat to deep neural networks (DNNs).
We propose a dynamics-aware loss (DAL) to solve this problem.
Both the detailed theoretical analyses and extensive experimental results demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-03-21T03:05:21Z) - Sharpness-Aware Training for Free [163.1248341911413]
Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error.
Sharpness-Aware Training for Free (SAF) mitigates the sharp landscape at almost zero extra computational cost over the base optimizer.
SAF ensures convergence to a flat minimum with improved generalization capabilities.
arXiv Detail & Related papers (2022-05-27T16:32:43Z) - δ-SAM: Sharpness-Aware Minimization with Dynamic Reweighting [17.50856935207308]
Adversarial training has shown effectiveness in improving generalization by regularizing the change of loss on top of adversarially chosen perturbations.
The recently proposed sharpness-aware minimization (SAM) algorithm adopts adversarial weight perturbation, encouraging the model to converge to a flat minimum.
We propose that a dynamically reweighted perturbation within each batch, where unguarded instances are up-weighted, can serve as a better approximation to per-instance perturbation.
arXiv Detail & Related papers (2021-12-16T10:36:35Z) - Efficient Sharpness-aware Minimization for Improved Training of Neural
Networks [146.2011175973769]
This paper proposes the Efficient Sharpness-Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the extra computation required by SAM from 100% to 40% relative to base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z) - Sharpness-Aware Minimization for Efficiently Improving Generalization [36.87818971067698]
We introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness.
Sharpness-Aware Minimization (SAM) seeks parameters that lie in neighborhoods having uniformly low loss (the underlying objective is restated after this list).
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.
arXiv Detail & Related papers (2020-10-03T19:02:10Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
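For reference, the sharpness-aware objective that SAM and the variants listed above build on is the standard min-max formulation from the SAM paper, restated below; the inner maximizer is approximated by a single gradient-ascent step of radius ρ.

```latex
% SAM objective and the standard first-order approximation of its inner maximizer
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} \; L(w + \epsilon),
\qquad
\hat{\epsilon}(w) \approx \rho \, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2}.
```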