Related papers: Learning Rate Optimization for Deep Neural Networks Using Lipschitz Bandits

Learning Rate Optimization for Deep Neural Networks Using Lipschitz Bandits

URL: http://arxiv.org/abs/2409.09783v1
Date: Sun, 15 Sep 2024 16:21:55 GMT
Title: Learning Rate Optimization for Deep Neural Networks Using Lipschitz Bandits
Authors: Padma Priyanka, Sheetal Kalyani, Avhishek Chatterjee,
Abstract summary: A properly tuned learning rate leads to faster training and higher test accuracy. We propose a Lipschitz bandit-driven approach for tuning the learning rate of neural networks.
Score: 9.361762652324968
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning rate is a crucial parameter in training of neural networks. A properly tuned learning rate leads to faster training and higher test accuracy. In this paper, we propose a Lipschitz bandit-driven approach for tuning the learning rate of neural networks. The proposed approach is compared with the popular HyperOpt technique used extensively for hyperparameter optimization and the recently developed bandit-based algorithm BLiE. The results for multiple neural network architectures indicate that our method finds a better learning rate using a) fewer evaluations and b) lesser number of epochs per evaluation, when compared to both HyperOpt and BLiE. Thus, the proposed approach enables more efficient training of neural networks, leading to lower training time and lesser computational cost.

Related papers

Enhanced quantum state preparation via stochastic prediction of neural network [0.8287206589886881]
In this paper, we explore an intriguing avenue for enhancing algorithm effectiveness through exploiting the knowledge blindness of neural network. Our approach centers around a machine learning algorithm utilized for preparing arbitrary quantum states in a semiconductor double quantum dot system. By leveraging prediction generated by the neural network, we are able to guide the optimization process to escape local optima.
arXiv Detail & Related papers (2023-07-27T09:11:53Z)
The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF. Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples. In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
Analytically Tractable Inference in Deep Neural Networks [0.0]
Tractable Approximate Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks. We are demonstrating how TAGI matches or exceeds the performance of backpropagation, for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z)
Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training. We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
Stochastic Markov Gradient Descent and Training Low-Bit Neural Networks [77.34726150561087]
We introduce Gradient Markov Descent (SMGD), a discrete optimization method applicable to training quantized neural networks. We provide theoretical guarantees of algorithm performance as well as encouraging numerical results.
arXiv Detail & Related papers (2020-08-25T15:48:15Z)
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network. Our model requires a much less number of communication rounds and still a number of communication rounds in theory. Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
Accelerated learning algorithms of general fuzzy min-max neural network using a novel hyperbox selection rule [9.061408029414455]
The paper proposes a method to accelerate the training process of a general fuzzy min-max neural network. The proposed approach is based on the mathematical formulas to form a branch-and-bound solution. The experimental results indicated the significant decrease in training time of the proposed approach for both online and agglomerative learning algorithms.
arXiv Detail & Related papers (2020-03-25T11:26:18Z)
Tune smarter not harder: A principled approach to tuning learning rates for shallow nets [13.203765985718201]
principled approach to choosing the learning rate is proposed for shallow feedforward neural networks. It is shown through simulations that the proposed search method significantly outperforms the existing tuning methods.
arXiv Detail & Related papers (2020-03-22T09:38:35Z)
Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data. We propose to speed up this process by exploiting subsets of training data at each incremental training step. Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.