Pre-interpolation loss behaviour in neural networks
- URL: http://arxiv.org/abs/2103.07986v1
- Date: Sun, 14 Mar 2021 18:08:59 GMT
- Title: Pre-interpolation loss behaviour in neural networks
- Authors: Arthur E. W. Venter and Marthinus W. Theunissen and Marelie H. Davel
- Abstract summary: We show that test loss does not increase overall, but only for a small minority of samples.
This effect seems to be mainly caused by increased parameter values relating to the correctly processed sample features.
Our findings contribute to the practical understanding of a common behaviour of deep neural networks.
- Score: 3.8716601453641886
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: When training neural networks as classifiers, it is common to observe an
increase in average test loss while still maintaining or improving the overall
classification accuracy on the same dataset. In spite of the ubiquity of this
phenomenon, it has not been well studied and is often dismissively attributed
to an increase in borderline correct classifications. We present an empirical
investigation that shows how this phenomenon is actually a result of the
differential manner by which test samples are processed. In essence: test loss
does not increase overall, but only for a small minority of samples. Large
representational capacities allow losses to decrease for the vast majority of
test samples at the cost of extreme increases for others. This effect seems to
be mainly caused by increased parameter values relating to the correctly
processed sample features. Our findings contribute to the practical
understanding of a common behaviour of deep neural networks. We also discuss
the implications of this work for network optimisation and generalisation.
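The claimed decomposition is easy to state concretely. Below is a minimal numerical sketch (synthetic per-sample cross-entropy values, not the paper's measurements) of how the mean test loss can rise while losses fall for the vast majority of samples and accuracy is unaffected:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic per-sample cross-entropy losses at an early and a later
# checkpoint, shaped to mimic the pattern described above: most losses
# shrink, a small minority blow up.
early = rng.uniform(0.3, 0.7, size=n)
late = early * 0.5                                   # majority: loss halves
blowup = rng.choice(n, size=20, replace=False)
late[blowup] = rng.uniform(15.0, 40.0, size=20)      # minority: extreme increase

print(f"mean loss, early: {early.mean():.3f}")
print(f"mean loss, late:  {late.mean():.3f}")        # higher than early
print(f"samples whose loss increased: {(late > early).mean():.1%}")  # 2.0%
```

Because cross-entropy is unbounded above while classification accuracy is bounded, a handful of diverging samples can dominate the mean loss without changing the accuracy at all.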
Related papers
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound improves as the pruning fraction increases.
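For concreteness, a common reading of "pruning fraction" is the share of smallest-magnitude weights that are zeroed out; the sketch below uses that magnitude-based definition, which is an assumption and may differ from the paper's scheme:

```python
import numpy as np

def prune_by_magnitude(w: np.ndarray, fraction: float) -> np.ndarray:
    """Zero out the `fraction` of weights with smallest magnitude."""
    k = int(fraction * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.random.default_rng(1).normal(size=(64, 64))
for frac in (0.2, 0.5, 0.9):
    sparsity = (prune_by_magnitude(w, frac) == 0).mean()
    print(f"pruning fraction {frac:.1f} -> realised sparsity {sparsity:.3f}")
```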
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize the training risk and fail to generalize well on test or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Each subfunction has its own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
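That last identity can be checked directly on a toy piecewise-linear network: group test points by their ReLU activation pattern, and the overall empirical error equals the pattern-probability-weighted average of per-pattern errors. A minimal sketch (random network and data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random one-hidden-layer ReLU network on 2-D inputs (illustrative only).
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
w2 = rng.normal(size=3)

X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

pre = X @ W1.T + b1                       # hidden pre-activations
pred = (np.maximum(pre, 0) @ w2 > 0).astype(float)
err = (pred != y).astype(float)           # per-sample 0/1 error

# Each distinct ReLU on/off pattern indexes one linear subfunction.
keys = [tuple(row) for row in (pre > 0).astype(int)]

total = err.mean()
decomposed = 0.0
for key in set(keys):
    mask = np.array([k == key for k in keys])
    decomposed += mask.mean() * err[mask].mean()   # P(pattern) * E[err | pattern]

print(np.isclose(total, decomposed))      # True: error is an expectation over subfunctions
```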
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Uniform Convergence, Adversarial Spheres and a Simple Remedy [40.44709296304123]
Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks.
We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models.
We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin.
arXiv Detail & Related papers (2021-05-07T20:23:01Z)
- Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network.
Our analysis sheds light on why deep neural networks perform poorly under adversarial perturbations.
We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
arXiv Detail & Related papers (2020-08-01T00:58:54Z)
- Regularizing Class-wise Predictions via Self-knowledge Distillation [80.76254453115766]
We propose a new regularization method that penalizes divergence between the predictive distributions of similar samples.
This results in regularizing the dark knowledge (i.e., the knowledge on wrong predictions) of a single network.
Our experimental results on various image classification tasks demonstrate that this simple yet powerful method can significantly improve generalization.
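A plausible form of such a penalty (an assumption about the method's shape, not the authors' exact loss) is a temperature-scaled KL divergence between the predictive distributions of two samples that share a class label:

```python
import numpy as np

def softmax(z, T):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classwise_kd_penalty(logits_a, logits_b, T=4.0):
    """Mean KL(p_b || p_a) over pairs of samples from the same class.
    In a real implementation p_b would be detached (no gradient flows
    through the 'teacher'); the temperature T is an assumed hyperparameter."""
    p_a = softmax(logits_a, T)
    p_b = softmax(logits_b, T)
    return (p_b * (np.log(p_b) - np.log(p_a))).sum(axis=-1).mean()

rng = np.random.default_rng(0)
logits_a = rng.normal(size=(8, 10))   # predictions for 8 samples
logits_b = rng.normal(size=(8, 10))   # predictions for 8 same-class partners
print(classwise_kd_penalty(logits_a, logits_b))  # weighted and added to cross-entropy
```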
arXiv Detail & Related papers (2020-03-31T06:03:51Z)
- AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks [12.14537824884951]
We propose a novel regularization method that progressively penalizes the magnitude of activations during training.
Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations.
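A minimal sketch of a progressive activation penalty, assuming a linear ramp schedule and a squared-L2 norm (both are illustrative assumptions; the paper's exact schedule and norm may differ):

```python
import numpy as np

def progressive_activation_penalty(acts, epoch, total_epochs, max_weight=1e-3):
    """Activation-magnitude penalty whose weight ramps up over training.
    The linear ramp and squared-L2 norm are assumptions for illustration."""
    weight = max_weight * (epoch / total_epochs)   # grows from 0 to max_weight
    return weight * np.mean(np.square(acts))

acts = np.random.default_rng(0).normal(size=(32, 128))  # one layer's activations
for epoch in (0, 25, 50):
    penalty = progressive_activation_penalty(acts, epoch, total_epochs=50)
    print(f"epoch {epoch:2d}: penalty {penalty:.6f}")   # added to the task loss
```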
arXiv Detail & Related papers (2020-03-07T18:38:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.