Tune smarter not harder: A principled approach to tuning learning rates
for shallow nets
- URL: http://arxiv.org/abs/2003.09844v3
- Date: Wed, 30 Sep 2020 02:18:30 GMT
- Title: Tune smarter not harder: A principled approach to tuning learning rates
for shallow nets
- Authors: Thulasi Tholeti, Sheetal Kalyani
- Abstract summary: A principled approach to choosing the learning rate is proposed for shallow feedforward neural networks.
It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
- Score: 13.203765985718201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective hyper-parameter tuning is essential to guarantee the performance
that neural networks have come to be known for. In this work, a principled
approach to choosing the learning rate is proposed for shallow feedforward
neural networks. We associate the learning rate with the gradient Lipschitz
constant of the objective to be minimized while training. An upper bound on
this constant is derived, and a search algorithm that always results in
non-divergent traces is proposed to exploit the derived bound. It is shown
through simulations that the proposed search method significantly outperforms
the existing tuning methods such as Tree Parzen Estimators (TPE). The proposed
method is applied to three different existing applications: a) channel
estimation in OFDM systems, b) prediction of currency exchange rates and c)
offset estimation in OFDM receivers, and it is shown to pick better learning
rates than the existing methods using the same or less compute power.
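The core principle in the abstract can be illustrated with a short sketch: if the training objective has a gradient Lipschitz constant L, gradient descent with a step size no larger than 1/L cannot diverge, so a learning-rate search can be confined to (0, 1/L_hat] for any upper bound L_hat >= L. The sketch below is not the paper's algorithm; it uses a toy quadratic objective whose Lipschitz constant can be computed exactly, as a stand-in for the analytic bound the paper derives for shallow feedforward nets, and the names lipschitz_upper_bound and search_learning_rate are purely illustrative.

```python
# Minimal sketch (not the paper's algorithm): confine the learning-rate
# search to (0, 1/L_hat], where L_hat upper-bounds the gradient Lipschitz
# constant, so every candidate produces a non-divergent trace.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))                      # toy regression data
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(200)

def loss_and_grad(w):
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def lipschitz_upper_bound():
    # For this quadratic loss, the gradient Lipschitz constant is the largest
    # eigenvalue of X^T X / n; the paper instead derives an analytic bound
    # for shallow feedforward nets.
    return np.linalg.eigvalsh(X.T @ X / len(y)).max()

def search_learning_rate(num_candidates=10, steps=100):
    L_hat = lipschitz_upper_bound()
    # All candidates lie in (0, 1/L_hat], so none of the runs can diverge.
    candidates = np.geomspace(1e-3 / L_hat, 1.0 / L_hat, num_candidates)
    best_eta, best_loss = None, np.inf
    for eta in candidates:
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            _, g = loss_and_grad(w)
            w -= eta * g
        final_loss, _ = loss_and_grad(w)
        if final_loss < best_loss:
            best_eta, best_loss = eta, final_loss
    return best_eta, best_loss

print(search_learning_rate())
```

Restricting the candidates to (0, 1/L_hat] is what guarantees that every trace in the search is non-divergent, which is the property the paper's search algorithm is built around.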
Related papers
- Advancing Training Efficiency of Deep Spiking Neural Networks through Rate-based Backpropagation [8.683798989767771]
Recent insights have revealed that rate-coding is a primary form of information representation captured by surrogate-gradient-based Backpropagation Through Time (BPTT) in training deep Spiking Neural Networks (SNNs).
We propose rate-based backpropagation, a training strategy specifically designed to exploit rate-based representations to reduce the complexity of BPTT.
Our method minimizes reliance on detailed temporal derivatives by focusing on averaged dynamics, streamlining the computational graph to reduce memory and computational demands of SNNs training.
arXiv Detail & Related papers (2024-10-15T10:46:03Z) - Learning Rate Optimization for Deep Neural Networks Using Lipschitz Bandits [9.361762652324968]
A properly tuned learning rate leads to faster training and higher test accuracy.
We propose a Lipschitz bandit-driven approach for tuning the learning rate of neural networks.
arXiv Detail & Related papers (2024-09-15T16:21:55Z) - Optimization of Iterative Blind Detection based on Expectation Maximization and Belief Propagation [29.114100423416204]
We propose a blind symbol detection scheme for block-fading linear inter-symbol interference channels.
We design a joint channel estimation and detection scheme that combines the expectation maximization algorithm and the ubiquitous belief propagation algorithm.
We show that the proposed method can learn efficient schedules that generalize well and even outperform coherent BP detection in high signal-to-noise scenarios.
arXiv Detail & Related papers (2024-08-05T08:45:50Z) - Gradient-Free Training of Recurrent Neural Networks using Random Perturbations [1.1742364055094265]
Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities.
Backpropagation through time (BPTT), the prevailing method, extends the backpropagation algorithm by unrolling the RNN over time.
BPTT suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information.
We present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT.
arXiv Detail & Related papers (2024-05-14T21:15:29Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on BP optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
By minimizing the population loss, we introduce two regret metrics that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Learning to Perform Downlink Channel Estimation in Massive MIMO Systems [72.76968022465469]
We study downlink (DL) channel estimation in a Massive multiple-input multiple-output (MIMO) system.
A common approach is to use the mean value as the estimate, motivated by channel hardening.
We propose two novel estimation methods.
arXiv Detail & Related papers (2021-09-06T13:42:32Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter has changed in the past is aligned with the direction of the current gradient (see the sketch after this list).
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Variance Reduction for Deep Q-Learning using Stochastic Recursive
Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning using stochastic recursive gradients, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z) - An improved online learning algorithm for general fuzzy min-max neural
network [11.631815277762257]
This paper proposes an improved version of the current online learning algorithm for a general fuzzy min-max neural network (GFMM).
The proposed approach does not use the contraction process for overlapping hyperboxes, since that process is more likely to increase the error rate.
In order to reduce the sensitivity to the training samples presentation order of this new on-line learning algorithm, a simple ensemble method is also proposed.
arXiv Detail & Related papers (2020-01-08T06:24:40Z)
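As referenced in the AdaRem entry above, the alignment rule it describes lends itself to a brief illustration. The following is a hypothetical reconstruction of that general idea, not the published AdaRem update rule; the function name adarem_like_step and all hyper-parameter values are made up for the sketch.

```python
# Hypothetical sketch of the alignment idea described for AdaRem (not the
# published algorithm): each parameter keeps an exponential average of its
# past update direction, and the per-parameter step size is increased when
# the current descent direction agrees with that history, decreased otherwise.
import numpy as np

def adarem_like_step(w, grad, momentum_dir, base_lr=0.01, beta=0.9, gain=0.5):
    # Element-wise alignment in {-1, 0, +1}: +1 if the descent direction
    # (-grad) points the same way the parameter has been moving.
    alignment = np.sign(-grad) * np.sign(momentum_dir)
    # Scale the learning rate per parameter; it stays positive for gain < 1.
    per_param_lr = base_lr * (1.0 + gain * alignment)
    w_new = w - per_param_lr * grad
    # Track the running direction of parameter change for the next step.
    momentum_dir = beta * momentum_dir + (1.0 - beta) * (w_new - w)
    return w_new, momentum_dir

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
for _ in range(50):
    w, m = adarem_like_step(w, grad=w.copy(), momentum_dir=m)
```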