SING: A Plug-and-Play DNN Learning Technique
- URL: http://arxiv.org/abs/2305.15997v1
- Date: Thu, 25 May 2023 12:39:45 GMT
- Title: SING: A Plug-and-Play DNN Learning Technique
- Authors: Adrien Courtois, Damien Scieur, Jean-Michel Morel, Pablo Arias, Thomas Eboli
- Abstract summary: We propose SING (StabIlized and Normalized Gradient), a plug-and-play technique that improves the stability and generalization of Adam(W).
SING is straightforward to implement and has minimal computational overhead, requiring only a layer-wise standardization of the gradients fed to Adam(W).
- Score: 25.563053353709627
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose SING (StabIlized and Normalized Gradient), a plug-and-play
technique that improves the stability and generalization of the Adam(W)
optimizer. SING is straightforward to implement and has minimal computational
overhead, requiring only a layer-wise standardization of the gradients fed to
Adam(W) without introducing additional hyper-parameters. We support the
effectiveness and practicality of the proposed approach by showing improved
results on a wide range of architectures, problems (such as image
classification, depth estimation, and natural language processing), and in
combination with other optimizers. We provide a theoretical analysis of the
convergence of the method, and we show that by virtue of the standardization,
SING can escape local minima narrower than a threshold that is inversely
proportional to the network's depth.
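Concretely, the layer-wise standardization can be sketched as a small step inserted between the backward pass and the Adam(W) update. The PyTorch snippet below is a minimal sketch under assumptions, not the authors' reference implementation: treating each parameter tensor as its own layer, the `eps` constant, and the `standardize_gradients` helper name are choices made for this illustration.

```python
import torch

def standardize_gradients(model, eps=1e-8):
    """Sketch of SING-style layer-wise gradient standardization.

    Each gradient tensor g is replaced by (g - mean(g)) / (std(g) + eps)
    before Adam(W) consumes it. Treating every parameter tensor as one
    "layer" and the eps value are assumptions of this sketch.
    """
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None or p.grad.numel() < 2:
                continue  # skip missing or scalar gradients (std undefined)
            g = p.grad
            p.grad = (g - g.mean()) / (g.std() + eps)

# Typical placement in a training loop, between backward() and step():
#   optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
#   loss.backward()
#   standardize_gradients(model)  # plug-and-play: no new hyper-parameters to tune
#   optimizer.step()
```

The extra work is a single pass over the gradients per step, consistent with the abstract's claim of minimal computational overhead and of introducing no additional hyper-parameters.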
Related papers
- CaAdam: Improving Adam optimizer using connection aware methods [0.0]
We introduce a new method inspired by Adam that enhances convergence speed and achieves better loss function minima.
Traditional optimizers, including Adam, apply uniform or globally adjusted learning rates across neural networks without considering their architectural specifics.
Our algorithm, CaAdam, explores this overlooked area by introducing connection-aware optimization through the carefully designed use of architectural information.
arXiv Detail & Related papers (2024-10-31T17:59:46Z) - Component-based Sketching for Deep ReLU Nets [55.404661149594375]
We develop a sketching scheme based on deep net components for various tasks.
We transform deep net training into a linear empirical risk minimization problem.
We show that the proposed component-based sketching provides almost optimal rates in approximating saturated functions.
arXiv Detail & Related papers (2024-09-21T15:30:43Z) - Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
arXiv Detail & Related papers (2024-09-17T19:59:57Z) - Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis [0.7428236410246183]
We investigate optimized convolutional neural networks (CNNs) developed for automatic modulation classification (AMC) of wireless signals.
We propose optimized models that combine these techniques to fuse their complementary benefits.
The experimental results show that the proposed individual and combined optimization techniques are highly effective for developing models with significantly less complexity.
arXiv Detail & Related papers (2024-04-11T06:08:23Z) - G-TRACER: Expected Sharpness Optimization [1.2183405753834562]
G-TRACER promotes generalization by seeking flat minima, and has a sound theoretical basis as an approximation to a natural-gradient descent based optimization of a generalized Bayes objective.
We show that the method converges to a neighborhood of a local minimum of the unregularized objective, and demonstrate competitive performance on a number of benchmark computer vision and NLP datasets.
arXiv Detail & Related papers (2023-06-24T09:28:49Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalizing the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm automatically finds a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape [15.362190838843915]
We show that LPF-SGD converges to a better optimal point with smaller generalization error than SGD.
We show that our algorithm achieves superior generalization performance compared to the common DL training strategies.
arXiv Detail & Related papers (2022-01-20T07:13:04Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds than naive parallel training while retaining its theoretical guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)