Robustness to Pruning Predicts Generalization in Deep Neural Networks
- URL: http://arxiv.org/abs/2103.06002v1
- Date: Wed, 10 Mar 2021 11:39:14 GMT
- Title: Robustness to Pruning Predicts Generalization in Deep Neural Networks
- Authors: Lorenz Kuhn, Clare Lyle, Aidan N. Gomez, Jonas Rothfuss, Yarin Gal
- Abstract summary: We introduce prunability: the smallest fraction of a network's parameters that can be kept while pruning without adversely affecting its training loss.
We show that this measure is highly predictive of a model's generalization performance across a large set of convolutional networks trained on CIFAR-10.
- Score: 29.660568281957072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing generalization measures that aim to capture a model's simplicity
based on parameter counts or norms fail to explain generalization in
overparameterized deep neural networks. In this paper, we introduce a new,
theoretically motivated measure of a network's simplicity which we call
prunability: the smallest \emph{fraction} of the network's parameters that can
be kept while pruning without adversely affecting its training loss. We show
that this measure is highly predictive of a model's generalization performance
across a large set of convolutional networks trained on CIFAR-10, does not grow
with network size unlike existing pruning-based measures, and exhibits high
correlation with test set loss even in a particularly challenging double
descent setting. Lastly, we show that the success of prunability cannot be
explained by its relation to known complexity measures based on models' margin,
flatness of minima and optimization speed, finding that our new measure is
similar to -- but more predictive than -- existing flatness-based measures, and
that its predictions exhibit low mutual information with those of other
baselines.
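To make the definition above concrete, here is a minimal sketch of how prunability could be estimated for a trained PyTorch model: sweep over candidate kept-fractions under global magnitude pruning and return the smallest fraction whose training loss stays within a tolerance of the unpruned loss. The pruning scheme, the candidate fractions, and the tolerance `tol` are illustrative assumptions; the abstract defines the quantity but does not specify the exact pruning procedure used in the paper.

```python
# Minimal sketch of the prunability measure described in the abstract: the
# smallest fraction of parameters that can be kept under pruning without
# adversely affecting training loss. Global magnitude pruning, no fine-tuning,
# the candidate fractions, and the tolerance `tol` are assumptions, not the
# paper's exact procedure. Assumes `model` already lives on `device`.
import copy
import torch

def training_loss(model, loader, loss_fn, device="cpu"):
    """Average training loss of `model` over `loader` (loss_fn returns a mean)."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            total += loss_fn(model(x), y).item() * x.size(0)
            n += x.size(0)
    return total / n

def prune_to_fraction(model, keep_fraction):
    """Copy `model` and zero all but the largest-magnitude `keep_fraction`
    of its weights (global magnitude pruning)."""
    pruned = copy.deepcopy(model)
    all_weights = torch.cat([p.detach().abs().flatten() for p in pruned.parameters()])
    k = max(1, int(keep_fraction * all_weights.numel()))
    threshold = torch.topk(all_weights, k, largest=True).values.min()
    with torch.no_grad():
        for p in pruned.parameters():
            p.mul_((p.abs() >= threshold).to(p.dtype))  # keep only weights above the threshold
    return pruned

def prunability(model, loader, loss_fn, fractions=None, tol=0.01, device="cpu"):
    """Smallest kept fraction whose training loss stays within `tol` of the
    unpruned model's training loss (hypothetical tolerance rule)."""
    fractions = sorted(fractions or [f / 20 for f in range(1, 21)])  # 0.05 .. 1.0
    base = training_loss(model, loader, loss_fn, device)
    for frac in fractions:
        if training_loss(prune_to_fraction(model, frac), loader, loss_fn, device) <= base + tol:
            return frac
    return 1.0
```

In this sketch, a smaller returned value simply means the network can be pruned more aggressively without hurting its training loss.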
Related papers
- Network reconstruction via the minimum description length principle [0.0]
We propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization.
Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data.
We demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks.
arXiv Detail & Related papers (2024-05-02T05:35:09Z)
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z)
- Sparsity-aware generalization theory for deep neural networks [12.525959293825318]
We present a new approach to analyzing generalization for deep feed-forward ReLU networks.
We show fundamental trade-offs between sparsity and generalization.
arXiv Detail & Related papers (2023-07-01T20:59:05Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow the construction of model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Robust Neural Regression via Uncertainty Learning [5.654198773446211]
Deep neural networks tend to underestimate uncertainty and produce overly confident predictions.
We propose a simple solution by extending the time-tested iteratively reweighted least squares (IRLS) method from generalised linear regression.
We use two sub-networks to parametrise the prediction and the uncertainty estimate, enabling easy handling of complex inputs and nonlinear responses.
arXiv Detail & Related papers (2021-10-12T23:19:13Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that the likelihood ratio loss with interarrival time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge limiting the widespread adoption of deep learning has been the fragility of deep neural networks to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
- Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy [42.15969584135412]
Neural network pruning is a popular technique used to reduce the inference costs of modern networks.
We evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well.
We find that pruned networks effectively approximate the unpruned model; however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks.
arXiv Detail & Related papers (2021-03-04T13:22:16Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)