Predicting Unreliable Predictions by Shattering a Neural Network
- URL: http://arxiv.org/abs/2106.08365v1
- Date: Tue, 15 Jun 2021 18:34:41 GMT
- Title: Predicting Unreliable Predictions by Shattering a Neural Network
- Authors: Xu Ji, Razvan Pascanu, Devon Hjelm, Andrea Vedaldi, Balaji
Lakshminarayanan, Yoshua Bengio
- Abstract summary: Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over the empirical error of subfunctions.
- Score: 145.3823991041987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Piecewise linear neural networks can be split into subfunctions, each with
its own activation pattern, domain, and empirical error. Empirical error for
the full network can be written as an expectation over empirical error of
subfunctions. Constructing a generalization bound on subfunction empirical
error indicates that the more densely a subfunction is surrounded by training
samples in representation space, the more reliable its predictions are.
Further, it suggests that models with fewer activation regions generalize
better, and models that abstract knowledge to a greater degree generalize
better, all else equal. We propose not only a theoretical framework to reason
about subfunction error bounds but also a pragmatic way of approximately
evaluating it, which we apply to predicting which samples the network will not
successfully generalize to. We test our method on detection of
misclassification and out-of-distribution samples, finding that it performs
competitively in both cases. In short, some network activation patterns are
associated with higher reliability than others, and these can be identified
using subfunction error bounds.
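As a rough illustration of the idea, the sketch below (not the authors' implementation) reads off the ReLU activation pattern that identifies an input's subfunction, then scores reliability by counting how many training points have a nearby pattern in activation space; the network architecture, layer sizes, and Hamming threshold `tau` are illustrative assumptions.
```python
# Minimal sketch, assuming a small ReLU MLP: the binary activation pattern
# identifies the linear region (subfunction) an input falls into, and inputs
# whose pattern is densely surrounded by training patterns are treated as
# more reliable. This is a proxy for the paper's bound, not its exact method.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d_in=20, d_hidden=64, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_hidden)
        self.out = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.out(torch.relu(self.fc2(torch.relu(self.fc1(x)))))

    def activation_pattern(self, x):
        # Binary code of which ReLU units fire in each hidden layer.
        h1 = torch.relu(self.fc1(x))
        h2 = torch.relu(self.fc2(h1))
        return torch.cat([(h1 > 0), (h2 > 0)], dim=1).float()

def support_score(model, x_query, x_train, tau=8):
    # Count training points whose activation pattern lies within Hamming
    # distance `tau` of the query's pattern; larger counts suggest the
    # query's subfunction is better supported by training data.
    with torch.no_grad():
        p_query = model.activation_pattern(x_query)   # (n_query, n_units)
        p_train = model.activation_pattern(x_train)   # (n_train, n_units)
        hamming = torch.cdist(p_query, p_train, p=1)  # L1 on binary codes
        return (hamming <= tau).float().sum(dim=1)

if __name__ == "__main__":
    torch.manual_seed(0)
    model, x_train = MLP(), torch.randn(1000, 20)
    print(support_score(model, torch.randn(5, 20), x_train))
```
Low support scores would flag inputs whose predictions are less trustworthy under this reading, e.g. candidate misclassifications or out-of-distribution samples.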
Related papers
- Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
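The following is a loose, hypothetical sketch of the multiple-hypotheses idea in the entry above: a Gaussian-RBF feature layer with one linear head per hypothesis, trained with a winner-takes-all loss. The structured model in the paper is more elaborate; every name and hyperparameter here is an illustrative assumption.
```python
# Hypothetical multi-hypothesis RBF regressor (illustrative only).
import torch
import torch.nn as nn

class MultiHypothesisRBF(nn.Module):
    def __init__(self, d_in=1, n_centers=32, n_hypotheses=4):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_centers, d_in))
        self.log_gamma = nn.Parameter(torch.zeros(n_centers))
        self.heads = nn.Linear(n_centers, n_hypotheses)  # one output per hypothesis

    def forward(self, x):
        dist2 = torch.cdist(x, self.centers).pow(2)     # (batch, n_centers)
        phi = torch.exp(-self.log_gamma.exp() * dist2)  # Gaussian RBF features
        return self.heads(phi)                          # (batch, n_hypotheses)

def wta_loss(preds, y):
    # Winner-takes-all: only the best hypothesis per sample is penalized,
    # so different heads can specialize on different modes (an assumption
    # about the training objective, not taken from the paper).
    return (preds - y).pow(2).min(dim=1).values.mean()

x = torch.randn(128, 1)
y = torch.sin(3.0 * x) + 0.1 * torch.randn(128, 1)
loss = wta_loss(MultiHypothesisRBF()(x), y)
```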
- GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks.
GIT combines gradient information with invariance transformations.
Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z)
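A rough sketch of the two ingredients named in the GIT summary above, gradient information and invariance transformations, is given below; the horizontal-flip transform, the symmetric-KL disagreement, and the way the two scores are combined are assumptions for illustration, not GIT's actual formulation.
```python
# Illustrative detector combining a gradient-based score with a consistency
# check under an invariance transformation (here: a horizontal flip).
import torch
import torch.nn.functional as F

def gradient_score(model, x):
    # Per-sample norm of the input gradient of the loss at the predicted label.
    x = x.clone().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1), reduction="sum")
    grad, = torch.autograd.grad(loss, x)
    return grad.flatten(1).norm(dim=1)

def invariance_score(model, x):
    # Symmetric KL between predictions on the input and on a flipped copy.
    with torch.no_grad():
        p = F.log_softmax(model(x), dim=1)
        q = F.log_softmax(model(torch.flip(x, dims=[-1])), dim=1)
        kl_pq = F.kl_div(q, p, log_target=True, reduction="none").sum(dim=1)
        kl_qp = F.kl_div(p, q, log_target=True, reduction="none").sum(dim=1)
        return kl_pq + kl_qp

def error_score(model, x, alpha=1.0):
    # Higher values flag likely generalization errors (assumed combination).
    return gradient_score(model, x) + alpha * invariance_score(model, x)
```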
- Do highly over-parameterized neural networks generalize since bad solutions are rare? [0.0]
In the highly over-parameterized regime, Empirical Risk Minimization (ERM) drives the training error to zero.
We show that under certain conditions the fraction of "bad" global minima with a true error larger than epsilon decays to zero exponentially fast with the number of training data n.
arXiv Detail & Related papers (2022-11-07T14:02:07Z)
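In symbols, the claim above says roughly the following; the constants and the precise conditions are in the paper, so this is only a schematic restatement:
```latex
\[
  \frac{\big|\{\text{global minima with true error} > \varepsilon\}\big|}
       {\big|\{\text{all global minima}\}\big|}
  \;\le\; C(\varepsilon)\, e^{-c(\varepsilon)\, n},
  \qquad c(\varepsilon) > 0,
\]
```
which suggests that, as the number of training samples $n$ grows, an interpolating solution is overwhelmingly unlikely to be one of the "bad" minima.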
- Learning Distributions by Generative Adversarial Networks: Approximation and Generalization [0.6768558752130311]
We study how well generative adversarial networks learn from finite samples by analyzing the convergence rates of these models.
Our analysis is based on a new oracle inequality that decomposes the estimation error of the GAN into the discriminator and generator approximation errors.
For the generator approximation error, we show that neural networks can approximately transform a low-dimensional source distribution into a high-dimensional target distribution.
arXiv Detail & Related papers (2022-05-25T09:26:17Z)
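Schematically, the oracle inequality mentioned above bounds the estimation error by separate approximation terms; the notation below is illustrative (including the statistical term), not the paper's exact statement:
```latex
\[
  d\!\left(\hat{\mu}_n, \mu\right)
  \;\lesssim\;
  \underbrace{\epsilon_{\mathrm{disc}}}_{\text{discriminator approximation}}
  \;+\;
  \underbrace{\epsilon_{\mathrm{gen}}}_{\text{generator approximation}}
  \;+\;
  \underbrace{\epsilon_{\mathrm{stat}}(n)}_{\text{finite-sample error}},
\]
```
where $\hat{\mu}_n$ denotes the distribution learned from $n$ samples, $\mu$ the target distribution, and $d$ the evaluation metric.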
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Generalization and Memorization: The Bias Potential Model [9.975163460952045]
Generative models and density estimators behave quite differently from models for learning functions.
For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted.
In the long term, the model either memorizes the samples or diverges.
arXiv Detail & Related papers (2020-11-29T04:04:54Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.