Gradient-only line searches to automatically determine learning rates
for a variety of stochastic training algorithms
- URL: http://arxiv.org/abs/2007.01054v1
- Date: Mon, 29 Jun 2020 08:59:31 GMT
- Title: Gradient-only line searches to automatically determine learning rates
for a variety of stochastic training algorithms
- Authors: Dominic Kafka and Daniel Nicolas Wilke
- Abstract summary: We study the application of the Gradient-Only Line Search that is Inexact (GOLS-I) to determine the learning rate schedule for a selection of popular neural network training algorithms.
GOLS-I's learning rate schedules are competitive with manually tuned learning rates, over seven optimization algorithms, three types of neural network architecture, 23 datasets and two loss functions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-only and probabilistic line searches have recently reintroduced the
ability to adaptively determine learning rates in dynamic mini-batch
sub-sampled neural network training. However, stochastic line searches are
still in their infancy and thus call for an ongoing investigation. We study the
application of the Gradient-Only Line Search that is Inexact (GOLS-I) to
automatically determine the learning rate schedule for a selection of popular
neural network training algorithms, including NAG, Adagrad, Adadelta, Adam and
LBFGS, with numerous shallow, deep and convolutional neural network
architectures trained on different datasets with various loss functions. We
find that GOLS-I's learning rate schedules are competitive with manually tuned
learning rates, over seven optimization algorithms, three types of neural
network architecture, 23 datasets and two loss functions. We demonstrate that
algorithms with dominant momentum characteristics are not well
suited to be used with GOLS-I. However, we find GOLS-I to be effective in
automatically determining learning rate schedules over 15 orders of magnitude,
for most popular neural network training algorithms, effectively removing the
need to tune the sensitive hyperparameters of learning rate schedules in neural
network training.
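The central mechanism described above, choosing a step size by locating a sign change in the directional derivative along the search direction rather than by comparing function values, can be sketched compactly. The following NumPy sketch is illustrative only: the toy least-squares problem, the function names (minibatch_grad, gols_i_step), the growth factor and the stopping rule are assumptions made for exposition, not the published GOLS-I procedure.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem with mini-batch sub-sampling (illustrative only).
A = rng.standard_normal((1000, 20))
b = A @ rng.standard_normal(20) + 0.1 * rng.standard_normal(1000)

def minibatch_grad(x, batch=64):
    """Stochastic gradient of 0.5*||A x - b||^2 on a random mini-batch."""
    idx = rng.choice(len(b), size=batch, replace=False)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ x - bb) / batch

def gols_i_step(x, d, alpha=1e-3, grow=2.0, max_iter=30):
    """Illustrative inexact, gradient-only line search along direction d.

    Grows the step size while the directional derivative g(x + alpha*d) . d
    stays negative (still descending) and accepts the step once it turns
    non-negative, i.e. once a sign change has been bracketed. The constants
    and acceptance rule are assumptions, not the authors' exact GOLS-I rules.
    """
    for _ in range(max_iter):
        dd = minibatch_grad(x + alpha * d) @ d  # directional derivative at alpha
        if dd >= 0:       # sign change found: accept this step size
            return alpha
        alpha *= grow     # still descending: grow the step
    return alpha

# SGD-style iterations where the learning rate comes from the line search.
x = np.zeros(20)
for _ in range(50):
    g = minibatch_grad(x)
    d = -g                   # steepest-descent direction
    lr = gols_i_step(x, d)   # learning rate chosen by the line search
    x = x + lr * d
```
In the study itself, the line search resolves the learning rate along each training algorithm's own search direction (NAG, Adagrad, Adadelta, Adam, LBFGS), not only the steepest-descent direction used in this sketch.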
Related papers
- Learning Rate Optimization for Deep Neural Networks Using Lipschitz Bandits [9.361762652324968]
A properly tuned learning rate leads to faster training and higher test accuracy.
We propose a Lipschitz bandit-driven approach for tuning the learning rate of neural networks.
arXiv Detail & Related papers (2024-09-15T16:21:55Z) - Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that feature selection with SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than 50% memory and 55% FLOPs reduction.
arXiv Detail & Related papers (2024-08-08T16:48:33Z) - Why Line Search when you can Plane Search? SO-Friendly Neural Networks allow Per-Iteration Optimization of Learning and Momentum Rates for Every Layer [9.849498498869258]
We introduce the class of SO-friendly neural networks, which include several models used in practice.
Performing a precise line search to set the step size has the same cost during full-batch training as using a fixed learning rate.
For the same cost, a plane search can be used to set both the learning and momentum rates on each step.
arXiv Detail & Related papers (2024-06-25T22:06:40Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - SA-CNN: Application to text categorization issues using simulated
annealing-based convolutional neural network optimization [0.0]
Convolutional neural networks (CNNs) are a representative class of deep learning algorithms.
We introduce SA-CNN neural networks for text classification tasks based on Text-CNN neural networks.
arXiv Detail & Related papers (2023-03-13T14:27:34Z) - HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler
for Neural Networks [51.71682428015139]
We propose HARL, a reinforcement learning-based auto-scheduler for efficient tensor program exploration.
HARL improves the tensor operator performance by 22% and the search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on end-to-end neural networks.
arXiv Detail & Related papers (2022-11-21T04:15:27Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Dynamic Analysis of Nonlinear Civil Engineering Structures using
Artificial Neural Network with Adaptive Training [2.1202971527014287]
In this study, artificial neural networks are developed with adaptive training algorithms.
The networks can successfully predict the time-history response of the shear frame and the rock structure to real ground motion records.
arXiv Detail & Related papers (2021-11-21T21:14:48Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks [77.34726150561087]
We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized neural networks.
We provide theoretical guarantees of algorithm performance as well as encouraging numerical results.
arXiv Detail & Related papers (2020-08-25T15:48:15Z) - VINNAS: Variational Inference-based Neural Network Architecture Search [2.685668802278155]
We present a differentiable variational inference-based NAS method for searching sparse convolutional neural networks.
Our method finds diverse network cells, while showing state-of-the-art accuracy with up to almost 2 times fewer non-zero parameters.
arXiv Detail & Related papers (2020-07-12T21:47:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.