ExpTest: Automating Learning Rate Searching and Tuning with Insights from Linearized Neural Networks
- URL: http://arxiv.org/abs/2411.16975v1
- Date: Mon, 25 Nov 2024 22:58:22 GMT
- Title: ExpTest: Automating Learning Rate Searching and Tuning with Insights from Linearized Neural Networks
- Authors: Zan Chaudhry, Naoko Mizuno
- Abstract summary: We present ExpTest, a sophisticated method for initial learning rate searching and subsequent learning rate tuning.
We mathematically justify ExpTest and provide empirical support.
- Score: 0.0
- Abstract: Hyperparameter tuning remains a significant challenge for the training of deep neural networks (DNNs), requiring manual and/or time-intensive grid searches, increasing resource costs and presenting a barrier to the democratization of machine learning. The global initial learning rate for DNN training is particularly important. Several techniques have been proposed for automated learning rate tuning during training; however, they still require manual searching for the global initial learning rate. Though methods exist that do not require this initial selection, they suffer from poor performance. Here, we present ExpTest, a sophisticated method for initial learning rate searching and subsequent learning rate tuning for the training of DNNs. ExpTest draws on insights from linearized neural networks and the form of the loss curve, which we treat as a real-time signal upon which we perform hypothesis testing. We mathematically justify ExpTest and provide empirical support. ExpTest requires minimal overhead, is robust to hyperparameter choice, and achieves state-of-the-art performance on a variety of tasks and architectures, without initial learning rate selection or learning rate scheduling.
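ExpTest's exact procedure is not reproduced here, but the idea of treating the early loss curve as a real-time signal and hypothesis-testing whether it is decreasing can be sketched as follows. This is a minimal sketch under assumptions: `train_steps` is a hypothetical callable that runs a few optimizer steps from a fresh model copy and returns the recorded losses, the candidate grid and window length are arbitrary, and the simple linear-slope test stands in for the exponential-form test the paper derives from linearized networks.

```python
import numpy as np
from scipy import stats

def loss_is_decreasing(losses, alpha=0.05):
    """Hypothesis test on the loss signal: regress loss on step index and
    require a significantly negative slope (one-sided test at level alpha)."""
    steps = np.arange(len(losses))
    fit = stats.linregress(steps, losses)
    one_sided_p = fit.pvalue / 2 if fit.slope < 0 else 1.0
    return one_sided_p < alpha

def search_initial_lr(train_steps, candidates=(1e-4, 1e-3, 1e-2, 1e-1), window=50):
    """Return the largest candidate learning rate whose short loss curve still
    passes the decreasing-loss test. `train_steps(lr, window)` is a hypothetical
    callable that runs `window` optimizer steps from a fresh model copy and
    returns the recorded losses."""
    best = None
    for lr in sorted(candidates):
        losses = np.asarray(train_steps(lr, window))
        if loss_is_decreasing(losses):
            best = lr
    return best
```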
Related papers
- Are Sparse Neural Networks Better Hard Sample Learners? [24.2141078613549]
Hard samples play a crucial role in the optimal performance of deep neural networks.
Most SNNs trained on challenging samples can often match or surpass dense models in accuracy at certain sparsity levels.
arXiv Detail & Related papers (2024-09-13T21:12:18Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical EDNN has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
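As a rough illustration of the additive, gradient-boosting-style formulation of an early-exit network described above (a sketch under assumptions, not the paper's training recipe), each head below adds a correction to the detached ensemble prediction accumulated by earlier exits; the layer sizes and toy batch are arbitrary.

```python
import torch
import torch.nn as nn

class AdditiveEarlyExitNet(nn.Module):
    """Toy early-exit network: each block's head adds a correction to the
    ensemble prediction accumulated so far (additive, boosting-style)."""
    def __init__(self, in_dim=32, hidden=64, num_classes=10, num_blocks=3):
        super().__init__()
        dims = [in_dim] + [hidden] * num_blocks
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
            for i in range(num_blocks)
        )
        self.heads = nn.ModuleList(nn.Linear(hidden, num_classes) for _ in range(num_blocks))

    def forward(self, x):
        ensemble = x.new_zeros(x.shape[0], self.heads[0].out_features)
        exits = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            # boosting-style step: the head fits a correction to the detached
            # ensemble, so earlier exits act as a fixed base prediction
            ensemble = ensemble.detach() + head(x)
            exits.append(ensemble)
        return exits  # one accumulated prediction per exit

# training sketch: sum the per-exit losses on a random toy batch
model = AdditiveEarlyExitNet()
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = sum(criterion(logits, y) for logits in model(x))
loss.backward()
```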
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Uncertainty Quantification and Resource-Demanding Computer Vision Applications of Deep Learning [5.130440339897478]
Bringing deep neural networks (DNNs) into safety critical applications requires a thorough treatment of the model's uncertainties.
In this article, we survey methods that we developed to teach DNNs to be uncertain when they encounter new object classes.
We also present training methods to learn from only a few labels with help of uncertainty quantification.
arXiv Detail & Related papers (2022-05-30T08:31:03Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
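The iterative self-training paradigm analyzed above can be sketched minimally as follows: fit a model on labeled data, pseudo-label confident unlabeled points, and refit on the union. A linear classifier stands in for the one-hidden-layer network studied in the paper, and the round count and confidence threshold are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(x_labeled, y_labeled, x_unlabeled, rounds=5, threshold=0.9):
    """Iterative self-training: fit on labeled data, pseudo-label confident
    unlabeled points, and refit on the union of real and pseudo labels."""
    model = LogisticRegression(max_iter=1000).fit(x_labeled, y_labeled)
    for _ in range(rounds):
        proba = model.predict_proba(x_unlabeled)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        x_aug = np.vstack([x_labeled, x_unlabeled[confident]])
        y_aug = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
        model = LogisticRegression(max_iter=1000).fit(x_aug, y_aug)
    return model
```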
- Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distances between the test image and its top-k neighbors in a training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
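A minimal sketch of the second step of the two-step procedure above, assuming integer class labels and features already extracted by a frozen pre-trained encoder (supervised or self-supervised): classify each test feature by majority vote over its k nearest training features. Any distance weighting or other refinements from the paper are omitted.

```python
import numpy as np

def knn_classify(test_feats, train_feats, train_labels, k=5):
    """Classify each test feature by majority vote over its k nearest
    training features (Euclidean distance)."""
    # pairwise squared Euclidean distances: |a|^2 - 2ab + |b|^2
    d2 = (
        (test_feats ** 2).sum(1, keepdims=True)
        - 2 * test_feats @ train_feats.T
        + (train_feats ** 2).sum(1)
    )
    topk = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbors
    votes = train_labels[topk]             # their (integer) labels
    return np.array([np.bincount(v).argmax() for v in votes])
```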
- Superiorities of Deep Extreme Learning Machines against Convolutional Neural Networks [3.04585143845864]
Deep Learning (DL) is a machine learning approach to artificial intelligence that analyzes input data in detail.
DL has gained popularity with improvements in graphics processing unit capabilities.
Deep Extreme Learning Machines (Deep ELM) are among the fastest and most effective approaches to classification problems.
arXiv Detail & Related papers (2021-01-21T08:22:18Z)
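For context on why ELM-style models train quickly, here is a minimal single-hidden-layer ELM sketch: the hidden weights are random and fixed, and the output weights are obtained in closed form by least squares, so no backpropagation is needed. Deep ELM variants stack such layers; the hidden width, activation, and seed below are arbitrary choices.

```python
import numpy as np

class ELM:
    """Single-hidden-layer Extreme Learning Machine: random, fixed hidden
    weights and a closed-form least-squares output layer."""
    def __init__(self, n_hidden=256, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, x, y_onehot):
        self.w = self.rng.normal(size=(x.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        h = np.tanh(x @ self.w + self.b)          # random hidden features
        self.beta = np.linalg.pinv(h) @ y_onehot  # closed-form output weights
        return self

    def predict(self, x):
        h = np.tanh(x @ self.w + self.b)
        return (h @ self.beta).argmax(axis=1)
```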
- Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms [0.0]
We study the application of the Gradient-Only Line Search that is Inexact (GOLS-I) to determine the learning rate schedule for a selection of popular neural network training algorithms.
GOLS-I's learning rate schedules are competitive with manually tuned learning rates, over seven optimization algorithms, three types of neural network architecture, 23 datasets and two loss functions.
arXiv Detail & Related papers (2020-06-29T08:59:31Z)
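A simplified sketch of the gradient-only line-search idea behind GOLS-I (not the exact bracketing procedure): grow the step along a descent direction while the directional derivative remains negative and stop at the first sign change, using only gradient information. The initial step, growth factor, and toy quadratic are illustrative.

```python
import numpy as np

def gradient_only_line_search(grad_fn, x, direction, step0=1e-3, grow=2.0, max_iters=30):
    """Pick a step size along `direction` from the sign of the directional
    derivative alone: keep growing the step while it stays negative (descent),
    and stop once it turns non-negative."""
    step = step0
    for _ in range(max_iters):
        dd = grad_fn(x + step * direction) @ direction  # directional derivative
        if dd >= 0:                                     # sign change: stop growing
            return step
        step *= grow
    return step

# usage on a toy quadratic f(x) = 0.5 * x^T A x
A = np.diag([1.0, 10.0])
grad_fn = lambda x: A @ x
x = np.array([1.0, 1.0])
d = -grad_fn(x)                                         # steepest-descent direction
alpha = gradient_only_line_search(grad_fn, x, d)
```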
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Improving Learning Efficiency for Wireless Resource Allocation with Symmetric Prior [28.275250620630466]
In this article, we first briefly summarize two classes of approaches to using domain knowledge: introducing mathematical models or prior knowledge to deep learning.
To explain how such a generic prior is harnessed to improve learning efficiency, we resort to ranking.
We find that the number of training samples required to achieve a given system performance decreases with the number of subcarriers or contents.
arXiv Detail & Related papers (2020-05-18T07:57:34Z)