Multiplicative update rules for accelerating deep learning training and
increasing robustness
- URL: http://arxiv.org/abs/2307.07189v1
- Date: Fri, 14 Jul 2023 06:44:43 GMT
- Title: Multiplicative update rules for accelerating deep learning training and
increasing robustness
- Authors: Manos Kirtas, Nikolaos Passalis, Anastasios Tefas
- Abstract summary: We propose an optimization framework that fits a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule.
- Score: 69.90473612073767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even today, when Deep Learning (DL) has achieved state-of-the-art
performance in a wide range of research domains, accelerating training and
building robust DL models remain challenging tasks. To this end, generations of
researchers have sought to develop robust methods for training DL architectures
that are less sensitive to weight distributions, model architectures, and loss
landscapes. However, such methods are limited to adaptive learning rate
optimizers, initialization schemes, and gradient clipping, without
investigating the fundamental rule of parameter updates. Although
multiplicative updates contributed significantly to the early development of
machine learning and hold strong theoretical claims, to the best of our
knowledge this is the first work that investigates them in the context of DL
training acceleration and robustness. In this work, we propose an optimization
framework that fits a wide range of optimization algorithms and enables one to
apply alternative update rules. To this end, we propose a novel multiplicative
update rule and extend its capabilities by combining it with a traditional
additive update term in a novel hybrid update method. We claim that the
proposed framework accelerates training while leading to more robust models
than the traditionally used additive update rule, and we experimentally
demonstrate its effectiveness on tasks ranging from convex and non-convex
optimization to difficult image classification benchmarks, using a wide range
of traditionally used optimization methods and Deep Neural Network (DNN)
architectures.
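The abstract does not spell out the update rule itself, so the sketch below only illustrates the general idea: a hybrid step that blends a standard additive (gradient-descent) term with an exponentiated-gradient-style multiplicative term through a mixing coefficient. Both the exponential form and the coefficient `alpha` are assumptions made for this example, not the paper's published rule.

```python
import numpy as np

def hybrid_step(w, grad, lr=0.01, alpha=0.5):
    """One hybrid parameter update mixing additive and multiplicative terms.

    Illustrative sketch only: the multiplicative (exponentiated-gradient-style)
    form and the mixing weight `alpha` are assumptions, not the paper's rule.
    """
    additive = w - lr * grad                              # classic gradient-descent step
    multiplicative = w * np.exp(-lr * np.sign(w) * grad)  # rescales |w|, preserves its sign
    return alpha * multiplicative + (1.0 - alpha) * additive

# Example: a few steps on f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0, 0.5])
for _ in range(10):
    w = hybrid_step(w, 2.0 * w)
```

Note the sign-preserving behavior of the multiplicative term: weights never cross zero, which is a characteristic property of multiplicative update rules.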
Related papers
- Narrowing the Focus: Learned Optimizers for Pretrained Models [24.685918556547055]
We propose a novel technique that learns a layer-specific linear combination of update directions provided by a set of base optimizers.
When evaluated on image classification tasks, this specialized optimizer significantly outperforms both traditional off-the-shelf methods such as Adam and existing general-purpose learned optimizers.
arXiv Detail & Related papers (2024-08-17T23:55:19Z) - Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme [0.0]
- Advancing Neural Network Performance through Emergence-Promoting Initialization Scheme [0.0]
We introduce a novel yet straightforward neural network initialization scheme.
Inspired by the concept of emergence and leveraging the emergence measures proposed by Li (2023), our method adjusts layer-wise weight scaling factors to achieve higher emergence values.
We demonstrate substantial improvements in both model accuracy and training speed, with and without batch normalization.
arXiv Detail & Related papers (2024-07-26T18:56:47Z) - Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
- Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
arXiv Detail & Related papers (2024-07-01T17:28:59Z) - A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z) - Top-KAST: Top-K Always Sparse Training [50.05611544535801]
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z) - Neural Network Training Techniques Regularize Optimization Trajectory:
- Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study [17.9739959287894]
Modern deep neural network (DNN) training uses various techniques, e.g., nonlinear activation functions, batch normalization, skip-connections, etc.
We show that successful DNNs consistently obey a certain regularity principle that regularizes the model update direction to be aligned with the trajectory direction.
Empirically, we find that DNN training runs that apply these techniques converge quickly and obey the regularity principle with a large regularization parameter, implying that the model updates are well aligned with the trajectory.
arXiv Detail & Related papers (2020-11-13T00:26:43Z) - A Differential Game Theoretic Neural Optimizer for Training Residual
- A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers.
The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Improved Adversarial Training via Learned Optimizer [101.38877975769198]
- Improved Adversarial Training via Learned Optimizer [101.38877975769198]
We propose a framework to improve the robustness of adversarially trained models.
By co-training the optimizer's parameters with the model's weights, the proposed framework consistently improves robustness and adaptively sets the step sizes of the update directions.
arXiv Detail & Related papers (2020-04-25T20:15:53Z)