Tune smarter not harder: A principled approach to tuning learning rates for shallow nets
- URL: http://arxiv.org/abs/2003.09844v3
- Date: Wed, 30 Sep 2020 02:18:30 GMT
- Title: Tune smarter not harder: A principled approach to tuning learning rates for shallow nets
- Authors: Thulasi Tholeti, Sheetal Kalyani
- Abstract summary: A principled approach to choosing the learning rate is proposed for shallow feedforward neural networks.
It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
- Score: 13.203765985718201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective hyper-parameter tuning is essential to guarantee the performance
that neural networks have come to be known for. In this work, a principled
approach to choosing the learning rate is proposed for shallow feedforward
neural networks. We associate the learning rate with the gradient Lipschitz
constant of the objective minimized during training. An upper bound on this
constant is derived, and a search algorithm that always results in
non-divergent traces is proposed to exploit the derived bound. It is shown
through simulations that the proposed search method significantly outperforms
the existing tuning methods such as Tree Parzen Estimators (TPE). The proposed
method is applied to three existing applications: a) channel estimation in
OFDM systems, b) prediction of currency exchange rates, and c) offset
estimation in OFDM receivers, and it is shown to pick better learning rates
than the existing methods while using the same or less compute power.
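To make the idea concrete, here is a minimal sketch, assuming the paper's derived bound is available as a number `L_hat`: for a gradient-Lipschitz objective, any rate below 2/L_hat yields a non-divergent gradient-descent trace, so the search can be confined to that interval. A toy quadratic, whose Lipschitz constant is exactly computable, stands in for the shallow-network objective, and the log-spaced grid search is an illustrative choice, not the paper's exact algorithm.

```python
# Minimal sketch, not the paper's exact algorithm: given an upper bound L_hat
# on the gradient Lipschitz constant, every learning rate in (0, 2/L_hat)
# gives a non-divergent gradient-descent trace, so the search stays there.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

def loss(w):
    return 0.5 * np.sum((A @ w - b) ** 2)

def grad(w):
    return A.T @ (A @ w - b)

# For this quadratic the gradient Lipschitz constant is known exactly:
# the largest eigenvalue of A^T A. The paper instead derives an upper
# bound for shallow feedforward networks.
L_hat = np.linalg.eigvalsh(A.T @ A).max()

def final_loss(lr, steps=200):
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        w -= lr * grad(w)
    return loss(w)

# Log-spaced candidates strictly inside the non-divergent interval.
candidates = (2.0 / L_hat) * np.logspace(-3, 0, 8, endpoint=False)
best_lr = min(candidates, key=final_loss)
print(f"ceiling 2/L_hat = {2.0 / L_hat:.4f}, best lr = {best_lr:.4f}")
```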
Related papers
- Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computation due to their Turing completeness and sequential processing capabilities.
Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time.
BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information.
We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
arXiv Detail & Related papers (2024-05-14T21:15:29Z)
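A minimal sketch of the perturbation-based idea above, under our own assumptions (the summary does not give the exact update rule): a random weight perturbation is scored by the resulting change in loss, giving an approximate gradient estimate with no backward pass. The tiny RNN, regression task, and hyper-parameters are illustrative.

```python
# Illustrative weight-perturbation training of a tiny RNN, with no backward
# pass: a random perturbation of the weights is scored by the change in loss,
# which serves as an approximate gradient estimate.
import numpy as np

rng = np.random.default_rng(0)
H, T = 8, 10                                  # hidden size, sequence length
params = [rng.standard_normal(H) * 0.1,       # input weights
          rng.standard_normal((H, H)) * 0.1,  # recurrent weights
          rng.standard_normal(H) * 0.1]       # readout weights
x = rng.standard_normal(T)
y_target = x.sum()                            # toy task: regress the sum

def loss(ps):
    wx, wh, wo = ps
    h = np.zeros(H)
    for t in range(T):
        h = np.tanh(wx * x[t] + wh @ h)       # vanilla RNN recurrence
    return (wo @ h - y_target) ** 2

sigma, lr = 1e-2, 0.005
print("initial loss:", loss(params))
for _ in range(5000):
    base = loss(params)
    noise = [rng.standard_normal(p.shape) for p in params]
    delta = loss([p + sigma * n for p, n in zip(params, noise)]) - base
    for p, n in zip(params, noise):           # move against harmful perturbations
        p -= lr * (delta / sigma) * n
print("final loss:", loss(params))
```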
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on BP optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
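A hedged sketch of the cascaded setup described above: blocks are stacked, each block's output feeds a local head that emits a label distribution, and each head is trained independently with no end-to-end backpropagation and no negative samples. The random ReLU blocks and softmax heads are our stand-ins for the paper's actual blocks.

```python
# Hedged sketch of a cascaded-forward setup: fixed random ReLU blocks stand in
# for the paper's blocks; each block gets its own softmax head that outputs a
# label distribution and is trained locally. Heads are independent, so
# training parallelizes naturally.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy binary labels
Y = np.eye(2)[y]

def make_block(d_in, d_out):
    W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    return lambda Z: np.maximum(Z @ W, 0.0)   # fixed random ReLU block

def train_head(F, Y, lr=0.1, steps=300):
    W = np.zeros((F.shape[1], Y.shape[1]))
    for _ in range(steps):                    # local softmax cross-entropy
        P = np.exp(F @ W)
        P /= P.sum(1, keepdims=True)
        W -= lr * F.T @ (P - Y) / len(F)
    return W

blocks = [make_block(16, 32), make_block(32, 32)]
heads, Z = [], X
for block in blocks:
    Z = block(Z)
    heads.append(train_head(Z, Y))            # each head trained independently

Z, probs = X, []
for block, W in zip(blocks, heads):           # average the blocks' distributions
    Z = block(Z)
    P = np.exp(Z @ W)
    probs.append(P / P.sum(1, keepdims=True))
acc = (np.mean(probs, axis=0).argmax(1) == y).mean()
print("train accuracy:", acc)
```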
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Learning to Perform Downlink Channel Estimation in Massive MIMO Systems [72.76968022465469]
We study downlink (DL) channel estimation in a Massive multiple-input multiple-output (MIMO) system.
A common approach is to use the mean value as the estimate, motivated by channel hardening.
We propose two novel estimation methods.
arXiv Detail & Related papers (2021-09-06T13:42:32Z)
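The mean-value baseline mentioned above is easy to illustrate numerically: under channel hardening, the effective downlink gain concentrates around its mean, so the mean itself is a serviceable estimate. The Rayleigh-fading model, maximum-ratio precoding, and dimensions below are illustrative assumptions.

```python
# Numerical illustration of the mean-value baseline: with many antennas the
# effective downlink gain hardens around its mean.
import numpy as np

rng = np.random.default_rng(0)
N, M = 10000, 128                         # channel realizations, BS antennas
h = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
precoder = h / np.linalg.norm(h, axis=1, keepdims=True)      # MRT direction
g = np.einsum('nm,nm->n', h.conj(), precoder).real           # effective DL gain

mean_estimate = g.mean()                  # the "mean value" channel estimate
print(f"mean gain {mean_estimate:.2f}, relative std {g.std() / mean_estimate:.4f}")
```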
- Inertial Proximal Deep Learning Alternating Minimization for Efficient Neural Network Training [16.165369437324266]
This work develops an improved DLAM using the well-known inertial technique, namely iPDLAM, which predicts a point by linear extrapolation of the current and last iterates.
Numerical results on real-world datasets are reported to demonstrate the efficiency of our proposed algorithm.
arXiv Detail & Related papers (2021-01-30T16:40:08Z)
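The inertial prediction step is simple to sketch: the next point is extrapolated from the current and last iterates before the usual minimization step. The quadratic objective and plain gradient step below stand in for iPDLAM's alternating-minimization substeps; beta and the step size are illustrative.

```python
# Sketch of the inertial prediction on a quadratic: extrapolate from the
# current and last iterates, then take the usual step at the predicted point.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
grad = lambda w: A.T @ (A @ w - b)
lr = 1.0 / np.linalg.eigvalsh(A.T @ A).max()

beta = 0.9                                # inertial (extrapolation) weight
x_prev = x = np.zeros(10)
for _ in range(300):
    z = x + beta * (x - x_prev)           # predict from the last two iterates
    x_prev, x = x, z - lr * grad(z)       # gradient step at the predicted point
print("final loss:", 0.5 * np.sum((A @ x - b) ** 2))
```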
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts each parameter-wise learning rate according to whether the direction of that parameter's past changes is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
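A hedged reading of the summary above, implemented as a classic delta-bar-delta-style rule rather than the exact published update: track a running trend of each parameter's recent gradients and raise that parameter's rate when the current gradient agrees with the trend, lower it when it flips.

```python
# Illustrative per-parameter rule in the spirit of the AdaRem summary: keep
# an EMA of recent gradients per parameter; scale the rate up on consistent
# direction, down on a sign flip. Not the exact published update.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
grad = lambda w: A.T @ (A @ w - b)

w, trend = np.zeros(10), np.zeros(10)     # trend: EMA of recent gradients
base_lr, mu = 0.005, 0.9
for _ in range(500):
    g = grad(w)
    align = np.sign(trend) * np.sign(g)   # +1 consistent, -1 flipped, 0 neutral
    w -= base_lr * (1.0 + 0.5 * align) * g    # parameter-wise learning rate
    trend = mu * trend + (1 - mu) * g
print("final loss:", 0.5 * np.sum((A @ w - b) ** 2))
```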
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
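The stochastic recursive gradient (SARAH-style) estimator underlying SRG-DQN can be sketched on a toy finite-sum problem; the deep Q-learning integration itself is omitted. The estimate is updated as v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}, with periodic full-gradient anchors.

```python
# Sketch of the stochastic recursive gradient estimator behind SRG-DQN,
# shown on a toy finite-sum least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)
grad_i = lambda w, i: A[i] * (A[i] @ w - b[i])   # per-sample gradient
full_grad = lambda w: A.T @ (A @ w - b) / len(A)

w, lr = np.zeros(10), 0.05
for _ in range(30):                     # outer loops with full-gradient anchors
    v = full_grad(w)
    w_prev, w = w, w - lr * v
    for _ in range(50):
        i = rng.integers(len(A))
        v = grad_i(w, i) - grad_i(w_prev, i) + v   # recursive estimate
        w_prev, w = w, w - lr * v
print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```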
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Stochastic gradient descent with random learning rate [0.0]
We propose to optimize neural networks with a uniformly-distributed random learning rate.
By comparing the random learning rate protocol with cyclic and constant protocols, we suggest that the random choice is generically the best strategy in the small learning rate regime.
We provide supporting evidence through experiments on both shallow, fully-connected and deep, convolutional neural networks for image classification on the MNIST and CIFAR10 datasets.
arXiv Detail & Related papers (2020-03-15T21:36:46Z)
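The random-rate protocol above is a one-line change to SGD: draw each step's learning rate from a uniform distribution instead of using a constant or cyclic schedule. The sampling range below, capped at 2/L for a toy quadratic so that no draw can diverge, is our illustrative choice.

```python
# One-line change to SGD per the summary above: draw each step's learning
# rate uniformly at random; capping at 2/L keeps every draw non-divergent
# on this quadratic.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
grad = lambda w: A.T @ (A @ w - b)
lr_max = 2.0 / np.linalg.eigvalsh(A.T @ A).max()

w = np.zeros(10)
for _ in range(500):
    lr = rng.uniform(0.0, lr_max)       # uniformly-distributed random rate
    w -= lr * grad(w)
print("final loss:", 0.5 * np.sum((A @ w - b) ** 2))
```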
- An improved online learning algorithm for general fuzzy min-max neural network [11.631815277762257]
This paper proposes an improved version of the current online learning algorithm for a general fuzzy min-max neural network (GFMM).
The proposed approach does not use the contraction process for overlapping hyperboxes, as that process is likely to increase the error rate.
To reduce the sensitivity of this new online learning algorithm to the presentation order of training samples, a simple ensemble method is also proposed.
arXiv Detail & Related papers (2020-01-08T06:24:40Z)
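A hedged sketch of GFMM-style online learning without the contraction step, per the summary above: each class is covered by hyperboxes [V, W], and a sample either expands the best sufficiently small box of its class (bounded by theta) or seeds a new point-sized box. The membership function and parameters are textbook-style choices, not necessarily the paper's exact formulation.

```python
# GFMM-style online learning without the contraction step: boxes only expand
# (never shrink), and a new point-sized box is created when no same-class box
# can absorb the sample within the size bound theta.
import numpy as np

theta, gamma = 0.3, 4.0
boxes = []                                   # list of (V, W, label)

def membership(x, V, W):
    d = np.maximum(0.0, np.maximum(V - x, x - W))   # distance outside the box
    return np.min(np.maximum(0.0, 1.0 - gamma * d))

def learn(x, label):
    best, best_m = None, -1.0
    for i, (V, W, lab) in enumerate(boxes):
        fits = np.all(np.maximum(W, x) - np.minimum(V, x) <= theta)
        if lab == label and fits and membership(x, V, W) > best_m:
            best, best_m = i, membership(x, V, W)
    if best is None:
        boxes.append((x.copy(), x.copy(), label))   # new point-sized box
    else:
        V, W, lab = boxes[best]
        boxes[best] = (np.minimum(V, x), np.maximum(W, x), lab)  # expand only

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = (X[:, 0] > 0.5).astype(int)
for xi, yi in zip(X, y):
    learn(xi, yi)
pred = [max(boxes, key=lambda bx: membership(xi, bx[0], bx[1]))[2] for xi in X]
print("boxes:", len(boxes), "train acc:", float(np.mean(np.array(pred) == y)))
```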
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.