Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change
- URL: http://arxiv.org/abs/2103.03466v1
- Date: Fri, 5 Mar 2021 04:19:52 GMT
- Title: Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change
- Authors: Ryuichi Kanoh, Mahito Sugiyama
- Abstract summary: We show that the combination of such a scaling factor and an adaptive learning rate strongly affects the training behavior of the neural network.
Specifically, for some scaling settings, the effect of the adaptive learning rate disappears or is strongly influenced by the scaling factor.
We present a modification of an optimization algorithm and demonstrate remarkable differences between adaptive learning rate optimization and simple gradient descent.
- Score: 8.020742121274417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A multiplicative constant scaling factor is often applied to the model output
to adjust the dynamics of neural network parameters. This has been used as one
of the key interventions in an empirical study of lazy and active behavior.
However, we show that the combination of such scaling and a commonly used
adaptive learning rate optimizer strongly affects the training behavior of the
neural network. This is problematic as it can cause \emph{unintended behavior}
of neural networks, resulting in the misinterpretation of experimental results.
Specifically, for some scaling settings, the effect of the adaptive learning
rate disappears or is strongly influenced by the scaling factor. To avoid the
unintended effect, we present a modification of an optimization algorithm and
demonstrate remarkable differences between adaptive learning rate optimization
and simple gradient descent, especially with a small ($<1.0$) scaling factor.
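The interaction described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm, only a minimal demonstration of the underlying mechanism: an adaptive optimizer normalizes the gradient by its magnitude, so multiplying the model output by a constant $\alpha$ barely changes the update, whereas plain gradient descent's step scales with $\alpha^2$. The linear model, the single-step Adam-like normalization, and all names below are assumptions for illustration.

```python
import numpy as np

def grad_loss(w, x, y, alpha):
    """Gradient w.r.t. w of the squared loss 0.5*(alpha*w*x - y)**2
    for a toy linear model with output scaling factor alpha."""
    pred = alpha * w * x
    return (pred - y) * alpha * x

def sgd_step(w, g, lr=0.1):
    # Plain gradient descent: the step is proportional to the gradient scale.
    return w - lr * g

def adam_like_step(w, g, lr=0.1, eps=1e-8):
    # A single Adam-like step normalizes the gradient by its magnitude,
    # making the update (nearly) invariant to the gradient's scale.
    return w - lr * g / (np.abs(g) + eps)

w0, x, y = 1.0, 2.0, 0.0
g_small = grad_loss(w0, x, y, alpha=0.1)   # 0.04
g_large = grad_loss(w0, x, y, alpha=10.0)  # 400.0

# SGD steps differ by a factor of alpha^2; the Adam-like steps coincide.
print("SGD: ", sgd_step(w0, g_small), sgd_step(w0, g_large))
print("Adam:", adam_like_step(w0, g_small), adam_like_step(w0, g_large))
```

Because the normalized update is insensitive to the gradient's magnitude, changing the output scale silently changes the *relative* behavior of adaptive and non-adaptive optimizers, which is the confound the paper warns about.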
Related papers
- Adaptive multiple optimal learning factors for neural network training [0.0]
The proposed Adaptive Multiple Optimal Learning Factors (AMOLF) algorithm dynamically adjusts the number of learning factors based on the error change per multiplication.
The thesis also introduces techniques for grouping weights based on the curvature of the objective function and for compressing large Hessian matrices.
arXiv Detail & Related papers (2024-06-04T21:18:24Z)
- Task adaption by biologically inspired stochastic comodulation [8.59194778459436]
We show that fine-tuning convolutional networks by gain modulation improves on deterministic gain modulation.
Our results suggest that comodulation representations can enhance learning efficiency and performance in multi-task learning.
arXiv Detail & Related papers (2023-11-25T15:21:03Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- RankNEAT: Outperforming Stochastic Gradient Search in Preference Learning Tasks [2.570570340104555]
Stochastic gradient descent (SGD) is a premier optimization method for training neural networks.
We introduce the RankNEAT algorithm which learns to rank through neuroevolution of augmenting topologies.
Results suggest that RankNEAT is a viable and highly efficient evolutionary alternative to preference learning.
arXiv Detail & Related papers (2022-04-14T12:01:00Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether a parameter's past update direction is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
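The parameter-wise rule summarized above might be sketched as follows. This is a simplified illustration based only on the one-sentence description, not the published AdaRem algorithm; the momentum buffer `m`, the decay `beta`, and the rate-adjustment factor `eta` are assumptions.

```python
import numpy as np

def adarem_like_step(w, g, m, lr=0.01, beta=0.9, eta=0.5):
    """Illustrative parameter-wise update: accelerate a parameter whose
    accumulated past direction m agrees with the current gradient g,
    and decelerate it when they disagree. Not the authors' implementation."""
    agreement = np.sign(m) * np.sign(g)        # +1 aligned, -1 opposed, per parameter
    per_param_lr = lr * (1.0 + eta * agreement)
    w = w - per_param_lr * g
    m = beta * m + (1.0 - beta) * g            # track the running update direction
    return w, m

# Parameter 0's gradient agrees with its past direction; parameter 1's opposes it.
w = np.array([1.0, -1.0])
m = np.array([0.5, 0.5])
g = np.array([0.2, -0.2])
w_new, m_new = adarem_like_step(w, g, m)
```

The aligned parameter receives a larger effective step than the opposed one, which captures the qualitative idea of resilience to oscillating directions.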
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
- Advantages of biologically-inspired adaptive neural activation in RNNs during learning [10.357949759642816]
We introduce a novel parametric family of nonlinear activation functions inspired by input-frequency response curves of biological neurons.
We find that activation adaptation provides distinct task-specific solutions and in some cases, improves both learning speed and performance.
arXiv Detail & Related papers (2020-06-22T13:49:52Z)
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of a "break-even" point on the optimization trajectory of deep neural networks.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.