Stability of Accuracy for the Training of DNNs Via the Uniform Doubling
Condition
- URL: http://arxiv.org/abs/2210.08415v3
- Date: Sun, 24 Dec 2023 23:50:55 GMT
- Title: Stability of Accuracy for the Training of DNNs Via the Uniform Doubling
Condition
- Authors: Yitzchak Shmalo
- Abstract summary: We study the stability of accuracy during the training of deep neural networks (DNNs).
The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the stability of accuracy during the training of deep neural
networks (DNNs). In this context, the training of a DNN is performed via the
minimization of a cross-entropy loss function, and the performance metric is
accuracy (the proportion of objects that are classified correctly). While
training results in a decrease of loss, the accuracy does not necessarily
increase during the process and may sometimes even decrease. The goal of
achieving stability of accuracy is to ensure that if accuracy is high at some
initial time, it remains high throughout training.
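As a minimal illustration of this loss/accuracy mismatch (a toy sketch, not taken from the paper), the example below shows two binary predictions whose average cross-entropy loss decreases while the classification accuracy drops:

```python
import numpy as np

# Toy illustration (assumed example, not from the paper): average cross-entropy
# loss can decrease while classification accuracy drops.
y = np.array([1.0, 1.0])            # both true labels are the positive class

p_before = np.array([0.51, 0.90])   # both correct at threshold 0.5 -> accuracy 1.0
p_after  = np.array([0.49, 0.99])   # first prediction flips to wrong -> accuracy 0.5

def cross_entropy(p, y):
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def accuracy(p, y):
    return float(np.mean((p >= 0.5) == (y == 1.0)))

print(cross_entropy(p_before, y), accuracy(p_before, y))  # ~0.389, 1.0
print(cross_entropy(p_after, y),  accuracy(p_after, y))   # ~0.362, 0.5
```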
A recent result by Berlyand, Jabin, and Safsten introduces a doubling
condition on the training data, which ensures the stability of accuracy during
training for DNNs using the absolute value activation function. For training
data in $\mathbb{R}^n$, this doubling condition is formulated using slabs in
$\mathbb{R}^n$ and depends on the choice of the slabs. The goal of this paper
is twofold. First, to make the doubling condition uniform, that is, independent
of the choice of slabs. This leads to sufficient conditions for stability in
terms of training data only. In other words, for a training set $T$ that
satisfies the uniform doubling condition, there exists a family of DNNs such
that a DNN from this family with high accuracy on the training set at some
training time $t_0$ will have high accuracy for all time $t>t_0$. Moreover,
establishing uniformity is necessary for the numerical implementation of the
doubling condition.
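As a rough numerical sketch (the paper's precise constants, admissible slab widths, and uniformity requirement are not reproduced here), one can count training points of $T$ in a slab $\{x : |\langle w, x\rangle - c| \le \varepsilon\}$ and in the slab of doubled width, and compare the two counts via a schematic inequality of the form $\#(T \cap S_{2\varepsilon}) \le D\,\#(T \cap S_{\varepsilon})$. The direction $w$, center $c$, width $\varepsilon$, and constant $D$ below are hypothetical parameters chosen only for illustration.

```python
import numpy as np

# Hypothetical sketch of a slab-based doubling-type check on a finite set T in R^n.
# The paper's exact condition (constants, admissible widths, uniformity over slabs)
# is NOT reproduced here; this only illustrates the kind of quantity involved.

def points_in_slab(T, w, c, eps):
    """Count points of T in the slab {x : |<w, x> - c| <= eps}, with w a unit vector."""
    return int(np.sum(np.abs(T @ w - c) <= eps))

def doubling_ratio(T, w, c, eps):
    """Ratio of point counts between the slab of doubled width and the original slab."""
    inner = points_in_slab(T, w, c, eps)
    outer = points_in_slab(T, w, c, 2 * eps)
    if inner == 0:
        return 1.0 if outer == 0 else np.inf
    return outer / inner

# Example: 1000 Gaussian points in R^3, one slab direction, center, and width.
rng = np.random.default_rng(0)
T = rng.normal(size=(1000, 3))
w = np.array([1.0, 0.0, 0.0])        # unit normal defining the slab
print(doubling_ratio(T, w, c=0.0, eps=0.1))  # a doubling-type bound would require this <= D
```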
The second goal is to extend the original stability results from the absolute
value activation function to a broader class of piecewise linear activation
functions with finitely many critical points, such as the popular Leaky ReLU.
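For concreteness, Leaky ReLU with negative-slope parameter $a \in (0,1)$ (a standard definition, not specific to the paper) is piecewise linear with a single critical point at the origin:
$$
\sigma_a(x) = \begin{cases} x, & x \ge 0,\\ a\,x, & x < 0, \end{cases}
$$
and the absolute value activation corresponds to the same piecewise form with $a = -1$.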
Related papers
- Curse of Dimensionality in Neural Network Optimization [6.460951804337735]
The curse of dimensionality in neural network optimization under the mean-field regime is studied.
It is established that the curse of dimensionality persists when a locally Lipschitz continuous activation function is employed.
arXiv Detail & Related papers (2025-02-07T22:21:31Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while using no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Improved techniques for deterministic l2 robustness [63.34032156196848]
Training convolutional neural networks (CNNs) with a strict 1-Lipschitz constraint under the $l_2$ norm is useful for adversarial robustness, interpretable gradients and stable training.
We introduce a procedure to certify robustness of 1-Lipschitz CNNs by replacing the last linear layer with a 1-hidden-layer MLP.
We significantly advance the state-of-the-art for standard and provable robust accuracies on CIFAR-10 and CIFAR-100.
arXiv Detail & Related papers (2022-11-15T19:10:12Z)
- Stability Analysis and Generalization Bounds of Adversarial Training [31.50956388020211]
In adversarial machine learning, deep neural networks can fit the adversarial examples on the training dataset but have poor generalization on the test set.
This phenomenon is called robust overfitting, and it can be observed when adversarially training neural nets on common datasets.
arXiv Detail & Related papers (2022-10-03T14:21:46Z)
- Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability [40.17821914923602]
We show that gradient descent at the edge of stability implicitly follows projected gradient descent (PGD) under the constraint $S(\theta) \le 2/\eta$.
Our analysis provides precise predictions for the loss, sharpness, and deviation from the PGD trajectory throughout training.
arXiv Detail & Related papers (2022-09-30T17:15:12Z)
- Training Neural Networks in Single vs Double Precision [8.036150169408241]
Conjugate Gradient and RMSprop algorithms are optimized for mean square error.
Experiments show that single-precision can keep up with double-precision if line search finds an improvement.
For strongly nonlinear tasks, both algorithm classes find only solutions fairly poor in terms of mean square error.
arXiv Detail & Related papers (2022-09-15T11:20:53Z)
- Memorize to Generalize: on the Necessity of Interpolation in High Dimensional Linear Regression [6.594338220264161]
Achieving optimal predictive risk in machine learning problems requires interpolating the training data.
We characterize how prediction (test) error necessarily scales with training error in this setting.
Optimal performance requires fitting the training data to substantially higher accuracy than the inherent noise floor of the problem.
arXiv Detail & Related papers (2022-02-20T18:51:45Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
- Stability for the Training of Deep Neural Networks and Other Classifiers [0.9558392439655015]
We formalize the notion of stability, and provide examples of instability.
Our results do not depend on the algorithm used for training, as long as loss decreases with training.
arXiv Detail & Related papers (2020-02-10T22:48:13Z)