S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise
Injection for Reaching Flat Minima
- URL: http://arxiv.org/abs/2009.02479v1
- Date: Sat, 5 Sep 2020 07:02:02 GMT
- Title: S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise
Injection for Reaching Flat Minima
- Authors: Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin
- Abstract summary: The stochastic gradient descent (SGD) method is the most widely used optimizer for deep neural network (DNN) training.
Weight noise injection has been extensively studied for finding flat minima using the SGD method.
We devise a new weight-noise injection-based SGD method that adds symmetrical noises to the weights.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stochastic gradient descent (SGD) method is the most widely
used method for deep neural network (DNN) training. However, it does not always
converge to a flat minimum of the loss surface, the kind of minimum associated
with high generalization capability. Weight noise injection has been extensively studied for finding
flat minima using the SGD method. We devise a new weight-noise injection-based
SGD method that adds symmetrical noises to the DNN weights. The training with
symmetrical noise evaluates the loss surface at two adjacent points, by which
convergence to sharp minima can be avoided. Fixed-magnitude symmetric noises
are added to minimize training instability. The proposed method is compared
with the conventional SGD method and previous weight-noise injection algorithms
using convolutional neural networks for image classification. Particularly,
performance improvements in large batch training are demonstrated. This method
shows superior performance compared with conventional SGD and weight-noise
injection methods regardless of the batch-size and learning rate scheduling
algorithms.
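The core update described in the abstract can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation: the sign-based fixed-magnitude noise shape, step size, and toy quadratic loss are assumptions for demonstration.

```python
import random

def s_sgd_step(w, grad_fn, lr=0.1, noise_mag=0.05, rng=random):
    # Fixed-magnitude symmetric noise: same magnitude for every weight,
    # random sign per weight (one plausible choice of noise shape).
    eps = [noise_mag if rng.random() < 0.5 else -noise_mag for _ in w]
    w_plus = [wi + ei for wi, ei in zip(w, eps)]
    w_minus = [wi - ei for wi, ei in zip(w, eps)]
    # Evaluate the loss surface at two symmetric points around w and average
    # the gradients; a sharp minimum lying between the two points yields a
    # large averaged gradient that pushes w away from it.
    gp, gm = grad_fn(w_plus), grad_fn(w_minus)
    g = [(a + b) / 2 for a, b in zip(gp, gm)]
    return [wi - lr * gi for wi, gi in zip(w, g)]

# Toy quadratic loss L(w) = sum(w_i^2), with gradient 2*w.
grad_fn = lambda w: [2 * wi for wi in w]
w = [1.0, -2.0]
for _ in range(200):
    w = s_sgd_step(w, grad_fn)
```

On this quadratic bowl the averaged gradient equals the plain gradient, so the method converges like SGD; the averaging only changes behavior near sharp, curved minima.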
Related papers
- Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent [21.698853170807684]
Existing adaptive strategies based on gradient noise scale (GNS) offer a principled alternative.
We derive gradient noise metrics for signSGD and specSGD that naturally emerge from the geometry of their respective dual norms.
Our experiments demonstrate that adaptive batch size strategies using non-Euclidean norms enable us to match the validation loss of constant-batch baselines while reducing training steps by up to 66% for Signum and Muon.
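For background, the standard Euclidean "simple" gradient noise scale that such adaptive batch-size strategies build on can be estimated from per-example gradients. Below is a sketch on a toy linear-regression problem; the paper's non-Euclidean variants for signSGD and specSGD are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear-regression problem; the per-example gradient of the
# squared loss 0.5 * (x.w - y)^2 w.r.t. w is (x.w - y) * x.
X = rng.normal(size=(512, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=512)
w = np.zeros(8)

per_example_grads = (X @ w - y)[:, None] * X          # shape (512, 8)
g = per_example_grads.mean(axis=0)                    # full-batch gradient
trace_sigma = per_example_grads.var(axis=0, ddof=1).sum()

# "Simple" Euclidean gradient noise scale: roughly the batch size at which
# gradient noise and gradient signal become comparable.
gns = trace_sigma / float(g @ g)
```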
arXiv Detail & Related papers (2026-02-03T02:16:55Z)
- How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime? [78.0226274470175]
We investigate whether introducing label noise into the gradient updates can enhance the test performance of neural networks (NNs).
We prove that adding label noise during training suppresses noise memorization, preventing it from dominating the learning process.
In contrast, we show that NNs trained with standard GD tend to overfit to noise in the same low-SNR setting.
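A minimal sketch of injecting label noise into gradient updates, on a toy 1-D logistic model: each step, every label is flipped with some probability before the gradient is computed. The flip probability and data are illustrative assumptions, not the paper's low-SNR setting.

```python
import math
import random

def noisy_label_step(w, xs, ys, lr=0.1, flip_p=0.2, rng=random):
    g = 0.0
    for x, y in zip(xs, ys):
        # Randomly flip each +/-1 label before computing this step's gradient.
        y_t = -y if rng.random() < flip_p else y
        # Gradient of the logistic loss log(1 + exp(-y*w*x)) w.r.t. w.
        g += -y_t * x / (1.0 + math.exp(y_t * w * x))
    return w - lr * g / len(xs)

rng = random.Random(0)
xs = [1.0, 2.0, -1.0, -2.0]
ys = [1, 1, -1, -1]          # separable data: a positive slope fits
w = 0.0
for _ in range(500):
    w = noisy_label_step(w, xs, ys, rng=rng)
```

On separable data, plain logistic GD drives the weight toward infinity, while the label noise keeps it at a finite, regularized value.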
arXiv Detail & Related papers (2025-10-20T13:28:13Z)
- Adaptive Heavy-Tailed Stochastic Gradient Descent [0.0]
AHTSGD is the first algorithm to adjust the nature of the injected noise based on the Edge of Stability phenomenon.
AHTSGD consistently outperforms SGD and other noise-based methods on benchmarks like MNIST and CIFAR-10, with marked gains on noisy datasets such as SVHN.
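The summary does not specify AHTSGD's noise schedule; the generic ingredient, injecting heavy-tailed (Student-t) noise into SGD updates, can be sketched as follows on a toy quadratic bowl. The noise scale and degrees of freedom are hypothetical choices, not the paper's.

```python
import random

def heavy_tailed_sgd_step(w, grad_fn, lr=0.05, df=3.0, scale=0.01, rng=random):
    # Student-t noise via t = Z / sqrt(V/df), with V/df ~ Gamma(df/2, 2/df);
    # smaller df means heavier tails (larger, rarer jumps).
    noise = [scale * rng.gauss(0, 1) / (rng.gammavariate(df / 2, 2 / df) ** 0.5)
             for _ in w]
    g = grad_fn(w)
    return [wi - lr * gi + ni for wi, gi, ni in zip(w, g, noise)]

rng = random.Random(1)
grad_fn = lambda w: [2 * wi for wi in w]   # quadratic bowl L(w) = sum(w_i^2)
w = [3.0]
for _ in range(300):
    w = heavy_tailed_sgd_step(w, grad_fn, rng=rng)
```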
arXiv Detail & Related papers (2025-08-29T06:32:26Z)
- Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods [11.895321856533934]
Stochastic gradient descent (SGD) and adaptive gradient methods have been widely used in training deep neural networks.
We empirically show that while the difference between the standard generalization performance of models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations.
arXiv Detail & Related papers (2023-08-13T07:03:22Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have proven effective at solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, an implicit stochastic gradient descent (ISGD) method is proposed to train PINNs and improve the stability of the training process.
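The stability advantage of an implicit (backward-Euler) update can be seen on a scalar quadratic, where the implicit step has a closed form. This illustrates the general idea only, not the paper's ISGD algorithm for PINNs.

```python
def implicit_step_quadratic(w, a, lr):
    # For L(w) = 0.5*a*w^2 the implicit update w' = w - lr*a*w'
    # solves in closed form to w / (1 + lr*a).
    return w / (1.0 + lr * a)

a, lr = 20.0, 0.5
w_exp, w_imp = 1.0, 1.0
for _ in range(10):
    # Explicit GD multiplies w by (1 - lr*a) = -9 each step and diverges,
    # while the implicit step multiplies by 1/(1 + lr*a) = 1/21... no: 1/11,
    # i.e. contracts regardless of how stiff the problem is.
    w_exp = w_exp - lr * a * w_exp
    w_imp = implicit_step_quadratic(w_imp, a, lr)
```

Unconditional stability with respect to the step size is exactly what makes implicit updates attractive for stiff, multi-scale losses.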
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- A Novel Noise Injection-based Training Scheme for Better Model Robustness [9.749718440407811]
Noise injection-based methods have been shown to improve the robustness of artificial neural networks.
In this work, we propose a novel noise injection-based training scheme for better model robustness.
Experimental results show that our proposed method achieves much better adversarial robustness and slightly better clean accuracy.
arXiv Detail & Related papers (2023-02-17T02:50:25Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
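The forward-gradient estimator underlying this line of work replaces backprop with a directional derivative along a random direction v, using (∇L·v)·v as an unbiased gradient estimate. In the sketch below the directional derivative is computed from a known gradient for illustration; a real implementation obtains it with forward-mode AD, and the paper's contribution of perturbing activations rather than weights is not reproduced here.

```python
import random

def forward_gradient(grad_true, rng):
    # Sample a direction v, form the directional derivative <grad, v>
    # (one JVP in a real forward-mode implementation), and scale v by it.
    v = [rng.gauss(0, 1) for _ in grad_true]
    jvp = sum(g * vi for g, vi in zip(grad_true, v))
    return [jvp * vi for vi in v]

rng = random.Random(0)
g = [1.0, -2.0, 0.5]

# Average many estimates: E[(g.v) v] = g, so the estimator is unbiased,
# though each single estimate is noisy (the variance the paper attacks).
n = 20000
avg = [0.0] * 3
for _ in range(n):
    est = forward_gradient(g, rng)
    avg = [a + e / n for a, e in zip(avg, est)]
```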
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- The effective noise of Stochastic Gradient Descent [9.645196221785694]
Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning.
We characterize the effective noise of SGD and a recently introduced variant, persistent SGD, in a neural network model.
We find that noisier algorithms lead to wider decision boundaries of the corresponding constraint satisfaction problem.
arXiv Detail & Related papers (2021-12-20T20:46:19Z)
- Differentially private training of neural networks with Langevin dynamics for calibrated predictive uncertainty [58.730520380312676]
We show that differentially private stochastic gradient descent (DP-SGD) can yield poorly calibrated, overconfident deep learning models.
This represents a serious issue for safety-critical applications, e.g. in medical diagnosis.
arXiv Detail & Related papers (2021-07-09T08:14:45Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
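One plausible form of the per-sample importance weighting, consistent with the summary, is a softmax over per-sample losses so that harder samples receive larger weight; treat the exact rule and the temperature parameter as assumptions, not the paper's definition.

```python
import math

def importance_weights(losses, lam=1.0):
    # Softmax-style weights over per-sample losses: larger loss, larger
    # weight (lam > 0 emphasizes hard or minority-class samples).
    exps = [math.exp(l / lam) for l in losses]
    z = sum(exps)
    # Normalize so the weights average to 1 across the mini-batch.
    return [len(losses) * e / z for e in exps]

weights = importance_weights([0.1, 0.1, 2.0])
```

The weighted gradient is then simply the mean of `weights[i] * grad_i` over the mini-batch, a drop-in change to momentum SGD.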
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we obtain back a single model by taking a spatial average in weight space.
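The late-phase recipe, branching into several weight copies that see different gradient noise and averaging them in weight space at the end, can be sketched on a toy problem. The copy count, noise level, and schedule here are illustrative assumptions.

```python
import random

rng = random.Random(0)
grad_fn = lambda x: 2 * x   # scalar quadratic loss L(x) = x^2

# Early phase: train a single model as usual.
w = 5.0
for _ in range(50):
    w -= 0.1 * grad_fn(w)

# Late phase: branch into K copies, each seeing its own gradient noise.
copies = [w] * 4
for _ in range(100):
    copies = [c - 0.1 * (grad_fn(c) + rng.gauss(0, 0.5)) for c in copies]

# At the end of learning, collapse back to one model by averaging
# the copies in weight space.
final = sum(copies) / len(copies)
```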
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The differences on the neurons are propagated backward, and the parameters are updated with a proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss decreasing and training/validation accuracy.
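A proximal-mapping parameter update of the kind the summary mentions can be illustrated with the classic proximal gradient step, here using the L1 penalty whose proximal mapping is soft-thresholding. This shows what "updated with a proximal mapping" means generically; it is not the paper's exact per-layer scheme.

```python
def soft_threshold(x, t):
    # Proximal mapping of t*|x|: shrink toward zero by t, clipping at zero.
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def prox_grad_step(w, grad_fn, lr=0.1, reg=0.05):
    # Gradient step on the smooth loss, then the proximal mapping of the
    # (possibly nonsmooth) penalty applied to the result.
    return [soft_threshold(wi - lr * gi, lr * reg)
            for wi, gi in zip(w, grad_fn(w))]

# Smooth loss sum_i (w_i - t_i)^2 with targets t = [1.0, 0.0], plus 0.05*|w|_1.
grad_fn = lambda w: [2 * (wi - ti) for wi, ti in zip(w, [1.0, 0.0])]
w = [0.0, 3.0]
for _ in range(200):
    w = prox_grad_step(w, grad_fn)
```

The second coordinate lands exactly at zero, the hallmark of proximal updates with nonsmooth penalties.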
arXiv Detail & Related papers (2020-02-10T03:26:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.