mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
- URL: http://arxiv.org/abs/2302.09693v2
- Date: Sun, 1 Oct 2023 02:19:50 GMT
- Title: mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
- Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan
Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul
Mazumder
- Abstract summary: The Sharpness-Aware Minimization (SAM) technique modifies the underlying loss function to steer gradient descent methods toward flatter minima.
We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM.
- Score: 20.560184120992094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep learning models are over-parameterized, where different optima
can result in widely varying generalization performance. The Sharpness-Aware
Minimization (SAM) technique modifies the underlying loss function to steer
gradient descent methods toward flatter minima, which are believed to exhibit
enhanced generalization prowess. Our study delves into a specific variant of
SAM known as micro-batch SAM (mSAM). This variation involves aggregating
updates derived from adversarial perturbations across multiple shards
(micro-batches) of a mini-batch during training. We extend a recently developed
and well-studied general framework for flatness analysis to theoretically show
that SAM achieves flatter minima than SGD, and mSAM achieves even flatter
minima than SAM. We provide a thorough empirical evaluation on various image
classification and natural language processing tasks to substantiate this
theoretical advancement. We also show that, contrary to previous work, mSAM can
be implemented in a flexible and parallelizable manner without significantly
increasing computational costs. Our implementation of mSAM yields superior
generalization performance across a wide range of tasks compared to SAM,
further supporting our theoretical framework.
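The update described above can be sketched compactly. Below is a minimal PyTorch-style sketch, assuming a `model`, a differentiable `loss_fn`, a base `optimizer` (e.g., SGD), a perturbation radius `rho`, and a mini-batch split into `m` micro-batches; it illustrates the rule stated in the abstract and is not the authors' released implementation.
```python
import torch

def msam_step(model, loss_fn, optimizer, inputs, targets, rho=0.05, m=4):
    """One mSAM update: average the SAM gradients computed on m micro-batches.
    Illustrative sketch of the update described in the abstract, not the authors' code."""
    params = [p for p in model.parameters() if p.requires_grad]
    avg_grads = [torch.zeros_like(p) for p in params]

    for x, y in zip(inputs.chunk(m), targets.chunk(m)):
        # gradient of the micro-batch loss at the current weights w
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)

        # ascent step: move to the adversarial point w + rho * g / ||g||
        scale = rho / (torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12)
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g * scale)

        # gradient of the same micro-batch loss at the perturbed weights
        perturbed_grads = torch.autograd.grad(loss_fn(model(x), y), params)

        # undo the perturbation and accumulate the perturbed gradient
        with torch.no_grad():
            for p, g, pg, acc in zip(params, grads, perturbed_grads, avg_grads):
                p.sub_(g * scale)
                acc.add_(pg / m)

    # descent step: apply the micro-batch-averaged gradient with the base optimizer
    optimizer.zero_grad()
    for p, acc in zip(params, avg_grads):
        p.grad = acc
    optimizer.step()
```
Because each micro-batch's perturbation and perturbed gradient are computed independently of the others, the loop body can run in parallel (for example, one micro-batch per device) and the results averaged, which is the flexibility and parallelizability the abstract refers to.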
Related papers
- Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training [47.25594539120258]
We find that Sharpness-Aware Minimization (SAM) efficiently selects flatter minima late in training.
Even a few epochs of SAM applied at the end of training yield nearly the same generalization and solution sharpness as full SAM training.
We conjecture that the optimization method chosen in the late phase is more crucial in shaping the final solution's properties.
arXiv Detail & Related papers (2024-10-14T10:56:42Z)
- Bilateral Sharpness-Aware Minimization for Flatter Minima [61.17349662062522]
Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS).
In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as Min-Sharpness (MinS).
By merging MaxS and MinS, we construct a better flatness indicator (FI) that points toward a flatter direction during optimization (written out below). Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds a flatter minimum than SAM.
arXiv Detail & Related papers (2024-09-20T03:01:13Z)
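Taking MaxS as the usual SAM gap between the maximum loss over the ρ-neighborhood and the training loss (a standard reading, stated here as an assumption), and MinS as defined in the summary above, the merged flatness indicator can be written as:
```latex
\mathrm{FI}(w)
= \underbrace{\max_{\|\epsilon\|\le\rho} L(w+\epsilon) - L(w)}_{\text{MaxS}}
+ \underbrace{L(w) - \min_{\|\epsilon\|\le\rho} L(w+\epsilon)}_{\text{MinS}}
= \max_{\|\epsilon\|\le\rho} L(w+\epsilon) - \min_{\|\epsilon\|\le\rho} L(w+\epsilon).
```
Minimizing this bilateral gap penalizes the loss range over the whole neighborhood rather than only its upward excursion, which is why the combined indicator favors flatter directions than MaxS alone.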
- Friendly Sharpness-Aware Minimization [62.57515991835801]
Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness.
We investigate the key role of batch-specific gradient noise within the adversarial perturbation, i.e., the current minibatch gradient.
By decomposing the perturbation into a full-gradient component and a batch-specific noise component, we discover that relying solely on the full-gradient component degrades generalization, while excluding it leads to improved performance (see the sketch below).
arXiv Detail & Related papers (2024-03-19T01:39:33Z)
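One plausible realization of removing the full-gradient component from the perturbation is sketched below, assuming the full gradient is approximated by an exponential moving average (EMA) of past mini-batch gradients; the EMA estimator and the function name are illustrative assumptions rather than the paper's exact procedure.
```python
import torch

def friendly_perturbation(params, grads, ema_grads, rho=0.05, beta=0.9):
    """Perturb along the batch-specific noise: minibatch gradient minus an EMA full-gradient estimate.
    Illustrative sketch; the EMA estimator is an assumption, not necessarily the paper's exact recipe."""
    with torch.no_grad():
        noise = []
        for g, m in zip(grads, ema_grads):
            m.mul_(beta).add_(g, alpha=1 - beta)   # running estimate of the full gradient
            noise.append(g - m)                    # batch-specific noise component
        scale = rho / (torch.norm(torch.stack([n.norm() for n in noise])) + 1e-12)
        for p, n in zip(params, noise):
            p.add_(n * scale)                      # ascent step along the noise direction only
    # Recompute the loss/gradient at the perturbed weights, undo the perturbation,
    # and take the usual descent step, exactly as in standard SAM.
```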
- Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy [12.050160495730381]
Sharpness-Aware Minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performance.
We propose a simple renormalization strategy, dubbed Stable SAM (SSAM), so that the gradient norm of the descent step remains the same as that of the ascent step.
Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, at almost no computational cost (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-01-14T10:53:36Z)
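The renormalization itself is essentially a one-line rescaling. Below is a minimal sketch, assuming per-parameter gradient lists from the ascent and descent steps (names are illustrative): the descent gradients are rescaled so their global norm matches that of the ascent gradients.
```python
import torch

def renormalize_descent_grads(ascent_grads, descent_grads):
    """Rescale the descent-step gradients so their global norm equals that of the ascent-step gradients."""
    ascent_norm = torch.norm(torch.stack([g.norm() for g in ascent_grads]))
    descent_norm = torch.norm(torch.stack([g.norm() for g in descent_grads])) + 1e-12
    scale = ascent_norm / descent_norm
    return [g * scale for g in descent_grads]
```
In use, the returned list would simply replace the descent gradients before the base optimizer step, leaving the rest of the SAM update unchanged.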
- TinySAM: Pushing the Envelope for Efficient Segment Anything Model [76.21007576954035]
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining strong zero-shot performance.
We first propose a full-stage knowledge distillation method with hard prompt sampling and a hard mask weighting strategy to distill a lightweight student model.
We also adapt the post-training quantization to the promptable segmentation task and further reduce the computational cost.
arXiv Detail & Related papers (2023-12-21T12:26:11Z)
- Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the change in training loss when adding a perturbation to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation through a binary mask (see the sketch below).
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
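A sketch of a masked (sparse) perturbation is shown below. The summary does not say how the binary mask is chosen, so the top-k-by-gradient-magnitude rule used here is purely an assumption for illustration.
```python
import torch

def sparse_perturbation(params, grads, rho=0.05, sparsity=0.5):
    """SAM-style ascent step applied only to a sparse subset of weights via a binary mask.
    The top-k-by-|gradient| mask is an illustrative assumption, not the paper's exact rule."""
    flat = torch.cat([g.abs().flatten() for g in grads])
    k = max(1, int((1.0 - sparsity) * flat.numel()))
    threshold = torch.topk(flat, k).values.min()             # keep the k largest-magnitude entries
    masks = [(g.abs() >= threshold).to(g.dtype) for g in grads]
    scale = rho / (torch.norm(torch.stack([(g * m).norm() for g, m in zip(grads, masks)])) + 1e-12)
    with torch.no_grad():
        for p, g, m in zip(params, grads, masks):
            p.add_(g * m * scale)                            # masked (sparse) ascent step
    return masks                                             # reuse the mask to undo the step later
```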
- Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization [14.40189851070842]
Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima.
Recent work suggests that mSAM can outperform SAM in terms of test accuracy.
This paper presents a comprehensive empirical evaluation of mSAM on various tasks and datasets.
arXiv Detail & Related papers (2022-12-07T00:37:55Z)
- Towards Efficient and Scalable Sharpness-Aware Minimization [81.22779501753695]
We propose a novel algorithm, LookSAM, that only periodically calculates the inner gradient ascent (a simplified sketch follows this entry).
LookSAM achieves similar accuracy gains to SAM while being tremendously faster.
We are the first to successfully scale up the batch size when training Vision Transformers (ViTs).
arXiv Detail & Related papers (2022-03-05T11:53:37Z)
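As a simplified illustration of "periodically calculating the inner gradient ascent", the sketch below refreshes the SAM perturbation only every k steps and reuses the most recent one in between; the function and `state` names are placeholders, and this is a schematic of the cost-saving idea rather than the exact LookSAM update.
```python
import torch

def looksam_like_step(model, loss_fn, optimizer, x, y, state, rho=0.05, k=5):
    """Schematic: recompute the SAM perturbation only every k steps, reuse it otherwise."""
    params = [p for p in model.parameters() if p.requires_grad]
    if state["step"] % k == 0:
        # expensive path: fresh inner ascent direction from the current gradient
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        state["eps"] = [g * (rho / norm) for g in grads]
    with torch.no_grad():
        for p, e in zip(params, state["eps"]):
            p.add_(e)                                # ascend with the (possibly reused) perturbation
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()                  # descent gradient at the perturbed point
    with torch.no_grad():
        for p, e in zip(params, state["eps"]):
            p.sub_(e)                                # undo the perturbation
    optimizer.step()
    state["step"] += 1
```
Here `state` would be initialized as `{"step": 0, "eps": None}` and carried across training steps, so the fresh ascent direction is computed on the very first call.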