On margin-based generalization prediction in deep neural networks
- URL: http://arxiv.org/abs/2405.17445v1
- Date: Mon, 20 May 2024 13:26:57 GMT
- Title: On margin-based generalization prediction in deep neural networks
- Authors: Coenraad Mouton
- Abstract summary: We investigate margin-based generalization prediction methods in different settings.
We find that different types of sample noise can have a very different effect on the overall margin of a network.
We introduce a new margin-based measure that incorporates an approximation of the underlying data manifold.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample or that sample's representation internal to the network. Margin-based complexity measures have been shown to be correlated with the generalization ability of deep neural networks in some circumstances but not others. The reasons behind the success or failure of these metrics are currently unclear. In this study, we examine margin-based generalization prediction methods in different settings. We motivate why these metrics sometimes fail to accurately predict generalization and how they can be improved. First, we analyze the relationship between margins measured in the input space and sample noise. We find that different types of sample noise can have a very different effect on the overall margin of a network that has modeled noisy data. Following this, we empirically evaluate how robust margins measured at different representational spaces are at predicting generalization. We find that these metrics have several limitations and that a large margin does not exhibit a strong correlation with empirical risk in many cases. Finally, we introduce a new margin-based measure that incorporates an approximation of the underlying data manifold. It is empirically demonstrated that this measure is generally more predictive of generalization than all other margin-based measures. Furthermore, we find that this measurement also outperforms other contemporary complexity measures on a well-known generalization prediction benchmark. In addition, we analyze the utility and limitations of this approach and find that this metric is well aligned with intuitions expressed in prior work.
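To make the measured quantity concrete: below is a minimal sketch, assuming a PyTorch classifier, of the standard first-order approximation of a sample's input-space margin, i.e. the gap between the true-class logit and the strongest competing logit, normalised by the gradient norm of that gap with respect to the input. The names `model`, `x`, and `y` are illustrative assumptions; this is not the paper's own implementation.

```python
import torch

def approx_input_margin(model, x, y):
    """First-order estimate of the distance from sample x (true label y)
    to the decision boundary: (f_y - f_j) / ||grad_x (f_y - f_j)||_2,
    where j is the strongest competing class."""
    x = x.clone().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)  # shape: (num_classes,)
    competitors = logits.clone()
    competitors[y] = float("-inf")             # mask out the true class
    j = competitors.argmax()
    gap = logits[y] - logits[j]                # positive iff correctly classified
    (grad,) = torch.autograd.grad(gap, x)
    return (gap / grad.norm()).item()          # signed distance estimate
```

Margins at internal representational spaces are obtained analogously, by differentiating the logit gap with respect to a hidden layer's activations rather than the raw input.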
Related papers
- Input margins can predict generalization too [1.2430809884830318]
We show that while input margins are not generally predictive of generalization, they can be if the search space is appropriately constrained.
We develop such a measure based on input margins, which we refer to as 'constrained margins'.
We find that constrained margins achieve highly competitive scores and outperform other margin measurements in general.
arXiv Detail & Related papers (2023-08-29T17:47:42Z)
- The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs [2.65558931169264]
We show that some types of training samples are modelled with consistently small margins while affecting generalization in different ways.
We support our findings with an analysis of fully-connected networks trained on noise-corrupted MNIST data, as well as convolutional networks trained on noise-corrupted CIFAR10 data.
arXiv Detail & Related papers (2023-02-14T09:25:50Z)
- Bounding generalization error with input compression: An empirical study with infinite-width networks [16.17600110257266]
Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on the availability of held-out data.
In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations.
An existing input compression-based GE bound is used to link MI and GE.
arXiv Detail & Related papers (2022-07-19T17:05:02Z)
- Assaying Out-Of-Distribution Generalization in Transfer Learning [103.57862972967273]
We take a unified view of previous work, highlighting message discrepancies that we address empirically.
We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting.
arXiv Detail & Related papers (2022-07-19T12:52:33Z)
- Local Evaluation of Time Series Anomaly Detection Algorithms [9.717823994163277]
We show that an adversary algorithm can reach high precision and recall on almost any dataset under weak assumptions.
We propose a theoretically grounded, robust, parameter-free and interpretable extension to precision/recall metrics.
arXiv Detail & Related papers (2022-06-27T10:18:41Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
- Measuring Generalization with Optimal Transport [111.29415509046886]
We develop margin-based generalization bounds, where the margins are normalized with optimal transport costs.
Our bounds robustly predict the generalization error, given training data and network parameters, on large scale datasets.
arXiv Detail & Related papers (2021-06-07T03:04:59Z)
- Robustness to Pruning Predicts Generalization in Deep Neural Networks [29.660568281957072]
We introduce prunability: the smallest fraction of a network's parameters that can be kept while pruning without adversely affecting its training loss (a minimal sketch appears after this list).
We show that this measure is highly predictive of a model's generalization performance across a large set of convolutional networks trained on CIFAR-10.
arXiv Detail & Related papers (2021-03-10T11:39:14Z)
- In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z)
- Generalized Sliced Distances for Probability Distributions [47.543990188697734]
We introduce a broad family of probability metrics, coined as Generalized Sliced Probability Metrics (GSPMs).
GSPMs are rooted in the generalized Radon transform and come with a unique geometric interpretation.
We consider GSPM-based gradient flows for generative modeling applications and show that under mild assumptions, the gradient flow converges to the global optimum.
arXiv Detail & Related papers (2020-02-28T04:18:00Z)
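As referenced in the prunability entry above, here is a minimal sketch of how such a measure could be computed via global magnitude pruning. The single-batch evaluation, tolerance `tol`, and coarse sparsity grid are assumptions for illustration, not the cited paper's exact procedure.

```python
import copy
import torch

@torch.no_grad()
def prunability(model, loss_fn, data, target, tol=0.01):
    """Smallest fraction of parameters kept (largest weights by global
    magnitude) such that the training loss rises by at most `tol`."""
    base = loss_fn(model(data), target).item()  # unpruned training loss
    magnitudes = torch.cat([p.abs().flatten() for p in model.parameters()])
    for keep in (0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0):  # fractions kept, ascending
        k = max(1, int(keep * magnitudes.numel()))
        threshold = torch.topk(magnitudes, k).values.min()  # k-th largest magnitude
        pruned = copy.deepcopy(model)
        for p in pruned.parameters():
            p.mul_((p.abs() >= threshold).float())  # zero out small weights
        if loss_fn(pruned(data), target).item() <= base + tol:
            return keep
    return 1.0
```

A finer search (e.g. bisection over the kept fraction) and evaluation over the full training set would give a tighter estimate; the coarse grid here only illustrates the definition.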