Statistically Significant Stopping of Neural Network Training
- URL: http://arxiv.org/abs/2103.01205v1
- Date: Mon, 1 Mar 2021 18:51:16 GMT
- Title: Statistically Significant Stopping of Neural Network Training
- Authors: Justin K. Terry, Mario Jayakumar, Kusal De Alwis
- Abstract summary: We introduce a statistical significance test to determine if a neural network has stopped learning.
We use this as the basis of a new learning rate scheduler.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The general approach taken when training deep learning classifiers is to save
the parameters after every few iterations, train until either a human observer
or a simple metric-based heuristic decides the network isn't learning anymore,
and then backtrack and pick the saved parameters with the best validation
accuracy. Simple methods are used to determine if a neural network isn't
learning anymore because, as long as stopping occurs well after the optimal
values are found, the stopping condition doesn't impact the final accuracy of
the model. From a runtime perspective, however, it matters greatly in the many
cases where numerous neural networks are trained simultaneously (e.g.
hyper-parameter tuning). Motivated by this, we introduce a statistical
significance test to determine if a neural network has stopped learning. This
stopping criterion appears to represent a happy medium compared to other
popular stopping criteria: it achieves accuracy comparable to the criteria that
reach the highest final accuracies while using 77% or fewer of the epochs,
whereas the criteria that stop sooner do so with an appreciable loss in final
accuracy. Additionally, we
use this as the basis of a new learning rate scheduler, removing the need to
manually choose learning rate schedules and acting as a quasi-line search,
achieving superior or comparable empirical performance to existing methods.
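The abstract does not spell out the exact test, so the following is only a minimal sketch of how a significance-based stopping criterion of this kind could look: fit a least-squares line to a recent window of validation accuracies and stop when the slope is no longer significantly positive. The window size, significance level, and the use of scipy.stats.linregress are illustrative assumptions, not the authors' procedure.

```python
# Illustrative sketch only: the paper's exact test is not specified in the
# abstract. This version fits a least-squares line to the most recent
# validation accuracies and declares "stopped learning" when the slope is
# no longer significantly positive. Window size and alpha are assumptions.
from scipy import stats

def has_stopped_learning(val_accuracies, window=20, alpha=0.05):
    """Return True if the recent trend in validation accuracy is no longer
    significantly positive, i.e. the network appears to have stopped learning."""
    if len(val_accuracies) < window:
        return False  # not enough evidence yet
    recent = val_accuracies[-window:]
    result = stats.linregress(range(window), recent)
    # One-sided test: is the slope significantly greater than zero?
    one_sided_p = result.pvalue / 2 if result.slope > 0 else 1.0 - result.pvalue / 2
    return one_sided_p > alpha  # cannot reject "no improvement" -> stop
```

Under the same assumptions, the scheduler variant would drop the learning rate each time the test fires instead of terminating training, which is what would give the quasi-line-search behaviour described in the abstract.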
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including its gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- The Unreasonable Effectiveness Of Early Discarding After One Epoch In Neural Network Hyperparameter Optimization [10.93405937763835]
We study the trade-off between the aggressiveness of discarding and the loss of predictive performance.
We call this approach i-Epoch (i being the constant number of epochs for which neural networks are trained) and suggest using it to assess the quality of early discarding techniques.
arXiv Detail & Related papers (2024-04-05T14:08:57Z)
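A minimal sketch of the fixed-budget discarding idea summarized above, assuming hypothetical build_model, train_for_epochs, and validate helpers; the selection rule and budget are illustrative, not the paper's exact protocol.

```python
# Illustrative sketch of i-Epoch-style early discarding: every candidate
# hyperparameter configuration is trained for the same constant number of
# epochs i, and all but the best-scoring candidates are discarded.
# build_model, train_for_epochs, and validate are assumed helper functions.
def i_epoch_discard(configs, i=1, keep=1):
    scored = []
    for cfg in configs:
        model = build_model(cfg)
        train_for_epochs(model, epochs=i)
        scored.append((validate(model), cfg))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cfg for _, cfg in scored[:keep]]  # survivors get a full training run
```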
- Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs [9.010643838773477]
We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
arXiv Detail & Related papers (2023-10-17T09:22:22Z)
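The summary above only names a forgetting-rate-based overfit score; the sketch below shows one plausible reading of such a score (an example counts as forgotten if it was classified correctly at an earlier checkpoint but is wrong now), which may differ from the paper's exact definition.

```python
# Illustrative only: one plausible way to monitor a "forgetting rate" on
# validation data across checkpoints. An example counts as forgotten if it
# was classified correctly at some earlier checkpoint but is wrong now.
# The paper's exact score may be defined differently.
import numpy as np

def forgetting_rate(correct_history):
    """correct_history: bool array of shape (num_checkpoints, num_val_examples),
    where correct_history[t, j] is True if example j was correct at checkpoint t."""
    correct_history = np.asarray(correct_history, dtype=bool)
    ever_correct_before = np.maximum.accumulate(correct_history, axis=0)[:-1]
    wrong_now = ~correct_history[1:]
    forgotten = ever_correct_before & wrong_now   # correct earlier, wrong now
    return forgotten.mean(axis=1)                 # per-checkpoint forgetting rate
```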
- Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data [24.86314525762012]
We show that a ReLU CNN trained by gradient descent can achieve near Bayes-optimal accuracy.
Our result demonstrates that CNNs have a remarkable capacity to efficiently learn XOR problems, even in the presence of highly correlated features.
arXiv Detail & Related papers (2023-10-03T11:31:37Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments on both shallow and deep neural networks are also provided to support the correctness of our theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
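For context, a generic iterative self-training (pseudo-labeling) loop of the kind analyzed above might look as follows; train and predict_proba are hypothetical helpers, and the confidence threshold and number of rounds are arbitrary illustrative choices.

```python
# Generic iterative self-training loop (pseudo-labeling), illustrative only.
# train and predict_proba are assumed helpers; labeled is a list of (x, y)
# pairs. The confidence threshold and round count are arbitrary choices.
def self_train(labeled, unlabeled, rounds=5, threshold=0.9):
    model = train(labeled)
    for _ in range(rounds):
        probs = predict_proba(model, unlabeled)            # soft predictions
        confident = [(x, p.argmax()) for x, p in zip(unlabeled, probs)
                     if p.max() >= threshold]              # confident pseudo-labels
        model = train(labeled + confident)                 # retrain on the union
    return model
```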
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable, resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Incremental Deep Neural Network Learning using Classification Confidence Thresholding [4.061135251278187]
Most modern neural networks for classification fail to take into account the concept of the unknown.
This paper proposes the Classification Confidence Threshold approach to prime neural networks for incremental learning.
arXiv Detail & Related papers (2021-06-21T22:46:28Z)
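A minimal sketch of one plausible form of the confidence thresholding described above: predictions whose top softmax probability falls below a threshold are flagged as unknown, so they can be routed to an incremental-learning step. The threshold value and the PyTorch usage are assumptions, not the paper's exact mechanism.

```python
# Illustrative sketch: reject a prediction as "unknown" when the top softmax
# confidence falls below a threshold, so the example can be handled by an
# incremental-learning step instead. The threshold value is an assumption.
import torch
import torch.nn.functional as F

UNKNOWN = -1

def classify_with_threshold(model, x, threshold=0.7):
    with torch.no_grad():
        probs = F.softmax(model(x), dim=-1)      # (batch, num_classes)
    conf, pred = probs.max(dim=-1)
    pred[conf < threshold] = UNKNOWN             # below threshold -> unknown class
    return pred
```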
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
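The abstract above names a "slow start, fast decay" schedule but not its functional form; the sketch below assumes a linear warmup followed by exponential decay, with all constants chosen purely for illustration.

```python
# Illustrative "slow start, fast decay" schedule: linear warmup to a peak
# learning rate, then rapid exponential decay. The paper names the strategy,
# but the functional form and constants here are assumptions.
def slow_start_fast_decay(step, warmup_steps=500, peak_lr=0.01, decay_rate=0.95):
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps        # slow, linear ramp-up
    return peak_lr * decay_rate ** (step - warmup_steps)  # fast exponential decay
```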
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than the random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Deep Learning and Statistical Models for Time-Critical Pedestrian Behaviour Prediction [5.593571255686115]
We show that, though the neural network model achieves an accuracy of 80%, it requires long sequences (100 samples or more) to do so.
The SLDS has a lower accuracy of 74%, but it achieves this result with short sequences (10 samples).
The results provide a key intuition of the suitability of the models in time-critical problems.
arXiv Detail & Related papers (2020-02-26T00:05:19Z)