SING: A Plug-and-Play DNN Learning Technique
- URL: http://arxiv.org/abs/2305.15997v1
- Date: Thu, 25 May 2023 12:39:45 GMT
- Title: SING: A Plug-and-Play DNN Learning Technique
- Authors: Adrien Courtois, Damien Scieur, Jean-Michel Morel, Pablo Arias, Thomas Eboli
- Abstract summary: We propose SING (StabIlized and Normalized Gradient), a plug-and-play technique that improves the stability and generalization of Adam(W).
SING is straightforward to implement and has minimal computational overhead, requiring only a layer-wise standardization of the gradients fed to Adam(W).
- Score: 25.563053353709627
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose SING (StabIlized and Normalized Gradient), a plug-and-play
technique that improves the stability and generalization of the Adam(W)
optimizer. SING is straightforward to implement and has minimal computational
overhead, requiring only a layer-wise standardization of the gradients fed to
Adam(W) without introducing additional hyper-parameters. We support the
effectiveness and practicality of the proposed approach by showing improved
results on a wide range of architectures, problems (such as image
classification, depth estimation, and natural language processing), and in
combination with other optimizers. We provide a theoretical analysis of the
convergence of the method, and we show that by virtue of the standardization,
SING can escape local minima narrower than a threshold that is inversely
proportional to the network's depth.
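Concretely, the layer-wise standardization can be sketched as a small step inserted between the backward pass and the Adam(W) update. The PyTorch snippet below is a minimal sketch under assumptions, not the authors' reference implementation: treating each parameter tensor as its own layer, the `eps` constant, and the `standardize_gradients` helper name are choices made for this illustration.

```python
import torch

def standardize_gradients(model, eps=1e-8):
    """Sketch of SING-style layer-wise gradient standardization.

    Each gradient tensor g is replaced by (g - mean(g)) / (std(g) + eps)
    before Adam(W) consumes it. Treating every parameter tensor as one
    "layer" and the eps value are assumptions of this sketch.
    """
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None or p.grad.numel() < 2:
                continue  # skip missing or scalar gradients (std undefined)
            g = p.grad
            p.grad = (g - g.mean()) / (g.std() + eps)

# Typical placement in a training loop, between backward() and step():
#   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
#   loss.backward()
#   standardize_gradients(model)  # plug-and-play: no new hyper-parameters to tune
#   optimizer.step()
```

The extra work is a single pass over the gradients per step, consistent with the abstract's claim of minimal computational overhead and of introducing no additional hyper-parameters.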
Related papers
- CaAdam: Improving Adam optimizer using connection aware methods [0.0]
We introduce a new method inspired by Adam that enhances convergence speed and achieves better loss function minima.
Traditional optimizers, including Adam, apply uniform or globally adjusted learning rates across neural networks without considering their architectural specifics.
Our algorithm, CaAdam, explores this overlooked area by introducing connection-aware optimization through the carefully designed use of architectural information.
arXiv Detail & Related papers (2024-10-31T17:59:46Z) - Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv Detail & Related papers (2024-09-21T15:30:43Z) - Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
arXiv Detail & Related papers (2024-09-17T19:59:57Z) - Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis [0.7428236410246183]
We investigate optimized convolutional neural networks (CNNs) developed for automatic modulation classification (AMC) of wireless signals.
We propose optimized models that combine these techniques to fuse their complementary benefits.
The experimental results show that the proposed individual and combined optimization techniques are highly effective for developing models with significantly less complexity.
arXiv Detail & Related papers (2024-04-11T06:08:23Z) - G-TRACER: Expected Sharpness Optimization [1.2183405753834562]
G-TRACER promotes generalization by seeking flat minima, and has a sound theoretical basis as an approximation to a natural-gradient descent based optimization of a generalized Bayes objective.
We show that the method converges to a neighborhood of a local minimum of the unregularized objective, and demonstrate competitive performance on a number of benchmark computer vision and NLP datasets.
arXiv Detail & Related papers (2023-06-24T09:28:49Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalizing the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm automatically finds a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape [15.362190838843915]
We show that LPF-SGD converges to a better optimal point with smaller generalization error than SGD.
We show that our algorithm achieves superior generalization performance compared to the common DL training strategies.
arXiv Detail & Related papers (2022-01-20T07:13:04Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds than naive parallel training while retaining its theoretical guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)