Stability for the Training of Deep Neural Networks and Other Classifiers
- URL: http://arxiv.org/abs/2002.04122v3
- Date: Thu, 1 Oct 2020 21:16:08 GMT
- Title: Stability for the Training of Deep Neural Networks and Other Classifiers
- Authors: Leonid Berlyand, Pierre-Emmanuel Jabin, C. Alex Safsten
- Abstract summary: We formalize the notion of stability, and provide examples of instability.
Our results do not depend on the algorithm used for training, as long as loss decreases with training.
- Score: 0.9558392439655015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We examine the stability of loss-minimizing training processes that are used
for deep neural networks (DNN) and other classifiers. While a classifier is
optimized during training through a so-called loss function, the performance of
classifiers is usually evaluated by some measure of accuracy, such as the
overall accuracy which quantifies the proportion of objects that are well
classified. This leads to the guiding question of stability: does decreasing
loss through training always result in increased accuracy? We formalize the
notion of stability, and provide examples of instability. Our main result
consists of two novel conditions on the classifier which, if either is
satisfied, ensure stability of training; that is, we derive tight bounds on
accuracy as loss decreases. We also derive a sufficient condition for stability
on the training set alone, identifying flat portions of the data manifold as
potential sources of instability. The latter condition is explicitly verifiable
on the training dataset. Our results do not depend on the algorithm used for
training, as long as loss decreases with training.
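The guiding question has a concrete negative answer in general: average cross-entropy can decrease while 0/1 accuracy drops, for instance when one example becomes very confident while another slips just across the decision boundary. A minimal numerical illustration (toy numbers, not taken from the paper):

```python
import numpy as np

def cross_entropy(p, y):
    # Mean binary cross-entropy of predicted probabilities p against labels y.
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def accuracy(p, y):
    # Fraction of examples whose thresholded prediction matches the label.
    return float(np.mean((p > 0.5) == (y == 1)))

y = np.array([1, 1])               # both examples belong to class 1
p_before = np.array([0.60, 0.60])  # both classified correctly, modest confidence
p_after = np.array([0.99, 0.45])   # first very confident, second now misclassified

print(f"before: loss={cross_entropy(p_before, y):.3f}, acc={accuracy(p_before, y):.0%}")
print(f"after:  loss={cross_entropy(p_after, y):.3f}, acc={accuracy(p_after, y):.0%}")
# before: loss=0.511, acc=100%
# after:  loss=0.404, acc=50%   -> loss decreased, yet accuracy dropped
```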
Related papers
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods.
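As a hedged illustration of what tuning an existing pipeline component can look like, here is a minimal PyTorch sketch that switches on label smoothing through the standard `nn.CrossEntropyLoss` argument; the model, data, and hyperparameter values are placeholders, not the paper's setup:

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 3)                           # stand-in classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # placeholder optimizer/lr

# Label smoothing is a one-argument change to a standard pipeline component;
# batch size and data augmentation are tuned analogously.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

x = torch.randn(64, 20)         # dummy batch
y = torch.randint(0, 3, (64,))  # dummy (possibly imbalanced) labels
opt.zero_grad()
criterion(model(x), y).backward()
opt.step()
```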
arXiv Detail & Related papers (2023-12-05T05:52:44Z)
- Measuring and Mitigating Local Instability in Deep Neural Networks [23.342675028217762]
We study how the predictions of a model change, even when it is retrained on the same data, as a consequence of stochasticity in the training process.
For Natural Language Understanding (NLU) tasks, we find instability in predictions for a significant fraction of queries.
We propose new data-centric methods that exploit our local stability estimates.
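A generic way to quantify such local instability (a sketch on assumed toy data, not the paper's method) is to retrain on identical data with different random seeds and measure per-example disagreement:

```python
import numpy as np
import torch
import torch.nn as nn

def train(seed, x, y, steps=200):
    # Stand-in training run; only the seed differs between runs.
    torch.manual_seed(seed)
    model = nn.Linear(x.shape[1], 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
    return model

torch.manual_seed(0)
x = torch.randn(200, 10)
y = (x[:, 0] > 0).long()  # synthetic labels

# Retrain on the *same* data with different seeds and compare predictions.
preds = np.stack([train(s, x, y)(x).argmax(1).numpy() for s in range(5)])
majority = np.round(preds.mean(0))
instability = (preds != majority).mean(0)  # per-example disagreement rate
print("most unstable examples:", np.argsort(-instability)[:5])
```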
arXiv Detail & Related papers (2023-05-18T00:34:15Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error in both in-domain and out-of-domain scenarios.
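The paper's exact formulation is not reproduced here; the sketch below only illustrates the general idea of a train-time auxiliary term that pulls predicted confidence toward empirical correctness:

```python
import torch
import torch.nn.functional as F

def calibration_aux_loss(logits, targets):
    # Penalize the gap between the winning-class confidence and a 0/1
    # correctness indicator (illustrative surrogate, not the paper's loss).
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    correct = (pred == targets).float()
    return ((conf - correct) ** 2).mean()

logits = torch.randn(8, 5, requires_grad=True)  # dummy class logits
targets = torch.randint(0, 5, (8,))
total = F.cross_entropy(logits, targets) + 0.5 * calibration_aux_loss(logits, targets)
total.backward()
```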
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Confidence-aware Training of Smoothed Classifiers for Certified Robustness [75.95332266383417]
We use "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input.
Our experiments show that the proposed method consistently exhibits improved certified robustness upon state-of-the-art training methods.
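The proxy is easy to compute because it needs only forward passes; a minimal Monte Carlo sketch (the model and noise level below are assumed placeholders):

```python
import torch

def accuracy_under_noise(model, x, y, sigma=0.25, n_samples=100):
    # Classify n_samples Gaussian-perturbed copies of x and report the
    # fraction that keep the correct label y.
    noise = sigma * torch.randn(n_samples, *x.shape)
    with torch.no_grad():
        preds = model(x.unsqueeze(0) + noise).argmax(dim=1)
    return (preds == y).float().mean().item()

model = torch.nn.Linear(10, 3)  # stand-in classifier
x, y = torch.randn(10), torch.tensor(1)
print(accuracy_under_noise(model, x, y))
```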
arXiv Detail & Related papers (2022-12-18T03:57:12Z)
- Stability of Accuracy for the Training of DNNs Via the Uniform Doubling Condition [0.0]
We study the stability of accuracy during the training of deep neural networks (DNNs).
The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training.
arXiv Detail & Related papers (2022-10-16T02:42:42Z)
- Utilizing Class Separation Distance for the Evaluation of Corruption Robustness of Machine Learning Classifiers [0.6882042556551611]
We propose a test data augmentation method that uses a robustness distance $\epsilon$ derived from the dataset's minimal class separation distance.
The resulting MSCR metric allows a dataset-specific comparison of different classifiers with respect to their corruption robustness.
Our results indicate that robustness training through simple data augmentation can already slightly improve accuracy.
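A sketch of the quantity this builds on: the minimal distance between training points of different classes. Deriving $\epsilon$ from it is the paper's contribution; the factor of 1/2 below is an assumption for illustration only:

```python
import numpy as np

def min_class_separation(X, y):
    # Smallest Euclidean distance between any two points of different classes.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    cross_class = y[:, None] != y[None, :]
    return d[cross_class].min()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # toy features
y = rng.integers(0, 2, size=100)        # toy labels
eps = 0.5 * min_class_separation(X, y)  # assumed choice of robustness distance
print(f"epsilon = {eps:.3f}")
```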
arXiv Detail & Related papers (2022-06-27T15:56:16Z)
- Training ReLU networks to high uniform accuracy is intractable [7.723983475351976]
We quantify the number of training samples needed for any conceivable training algorithm to guarantee a given uniform accuracy.
We conclude that the training of ReLU neural networks to high uniform accuracy is intractable.
arXiv Detail & Related papers (2022-05-26T17:50:55Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks offer improved accuracy and significant reductions in memory consumption. However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
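One standard way to guarantee well-posedness is to make the implicit map a contraction, so its fixed point is unique and plain iteration converges. The sketch below uses a crude Euclidean contraction (spectral-norm rescaling); the paper's non-Euclidean conditions are sharper:

```python
import torch

# Implicit layer: find z with z = tanh(W z + U x). Since tanh is 1-Lipschitz,
# the map is a contraction whenever ||W||_2 < 1.
torch.manual_seed(0)
n, m = 8, 4
W = torch.randn(n, n)
W = 0.9 * W / torch.linalg.matrix_norm(W, ord=2)  # rescale to spectral norm 0.9
U = torch.randn(n, m)
x = torch.randn(m)

z = torch.zeros(n)
for _ in range(50):  # fixed-point iteration, converges by contraction
    z = torch.tanh(W @ z + U @ x)
print(z)
```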
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability [94.4070247697549]
Full-batch gradient descent on neural network training objectives operates in a regime we call the Edge of Stability.
In this regime, the maximum eigenvalue of the training loss Hessian hovers just above the numerical value $2/\text{step size}$, and the training loss behaves non-monotonically over short timescales, yet consistently decreases over long timescales.
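This sharpness is measurable without forming the Hessian, via power iteration on Hessian-vector products; a minimal sketch on a toy least-squares loss (the model and step size are placeholders):

```python
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)        # toy model parameters
X, y = torch.randn(100, 10), torch.randn(100)
loss = ((X @ w - y) ** 2).mean()               # toy training loss

# Power iteration on Hessian-vector products to estimate lambda_max.
grad = torch.autograd.grad(loss, w, create_graph=True)[0]
v = torch.randn(10)
v = v / v.norm()
for _ in range(30):
    hv = torch.autograd.grad(grad @ v, w, retain_graph=True)[0]
    v = hv / hv.norm()
lam_max = (torch.autograd.grad(grad @ v, w, retain_graph=True)[0] @ v).item()

step_size = 0.01
print(f"sharpness ~ {lam_max:.2f} vs 2/step size = {2 / step_size:.1f}")
```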
arXiv Detail & Related papers (2021-02-26T22:08:19Z)