Predicting Unreliable Predictions by Shattering a Neural Network
- URL: http://arxiv.org/abs/2106.08365v1
- Date: Tue, 15 Jun 2021 18:34:41 GMT
- Title: Predicting Unreliable Predictions by Shattering a Neural Network
- Authors: Xu Ji, Razvan Pascanu, Devon Hjelm, Andrea Vedaldi, Balaji
Lakshminarayanan, Yoshua Bengio
- Abstract summary: Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over the empirical error of subfunctions.
- Score: 145.3823991041987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Piecewise linear neural networks can be split into subfunctions, each with
its own activation pattern, domain, and empirical error. Empirical error for
the full network can be written as an expectation over empirical error of
subfunctions. Constructing a generalization bound on subfunction empirical
error indicates that the more densely a subfunction is surrounded by training
samples in representation space, the more reliable its predictions are.
Further, it suggests that models with fewer activation regions generalize
better, and models that abstract knowledge to a greater degree generalize
better, all else equal. We propose not only a theoretical framework to reason
about subfunction error bounds but also a pragmatic way of approximately
evaluating it, which we apply to predicting which samples the network will not
successfully generalize to. We test our method on detection of
misclassification and out-of-distribution samples, finding that it performs
competitively in both cases. In short, some network activation patterns are
associated with higher reliability than others, and these can be identified
using subfunction error bounds.
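As a rough illustration of the idea, the sketch below (not the authors' implementation) reads off the ReLU activation pattern that identifies an input's subfunction, then scores reliability by counting how many training points have a nearby pattern in activation space; the network architecture, layer sizes, and Hamming threshold `tau` are illustrative assumptions.
```python
# Minimal sketch, assuming a small ReLU MLP: the binary activation pattern
# identifies the linear region (subfunction) an input falls into, and inputs
# whose pattern is densely surrounded by training patterns are treated as
# more reliable. This is a proxy for the paper's bound, not its exact method.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d_in=20, d_hidden=64, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_hidden)
        self.out = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.out(torch.relu(self.fc2(torch.relu(self.fc1(x)))))

    def activation_pattern(self, x):
        # Binary code of which ReLU units fire in each hidden layer.
        h1 = torch.relu(self.fc1(x))
        h2 = torch.relu(self.fc2(h1))
        return torch.cat([(h1 > 0), (h2 > 0)], dim=1).float()

def support_score(model, x_query, x_train, tau=8):
    # Count training points whose activation pattern lies within Hamming
    # distance `tau` of the query's pattern; larger counts suggest the
    # query's subfunction is better supported by training data.
    with torch.no_grad():
        p_query = model.activation_pattern(x_query)   # (n_query, n_units)
        p_train = model.activation_pattern(x_train)   # (n_train, n_units)
        hamming = torch.cdist(p_query, p_train, p=1)  # L1 on binary codes
        return (hamming <= tau).float().sum(dim=1)

if __name__ == "__main__":
    torch.manual_seed(0)
    model, x_train = MLP(), torch.randn(1000, 20)
    print(support_score(model, torch.randn(5, 20), x_train))
```
Low support scores would flag inputs whose predictions are less trustworthy under this reading, e.g. candidate misclassifications or out-of-distribution samples.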
Related papers
- Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
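The following is a loose, hypothetical sketch of the multiple-hypotheses idea in the entry above: a Gaussian-RBF feature layer with one linear head per hypothesis, trained with a winner-takes-all loss. The structured model in the paper is more elaborate; every name and hyperparameter here is an illustrative assumption.
```python
# Hypothetical multi-hypothesis RBF regressor (illustrative only).
import torch
import torch.nn as nn

class MultiHypothesisRBF(nn.Module):
    def __init__(self, d_in=1, n_centers=32, n_hypotheses=4):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_centers, d_in))
        self.log_gamma = nn.Parameter(torch.zeros(n_centers))
        self.heads = nn.Linear(n_centers, n_hypotheses)  # one output per hypothesis

    def forward(self, x):
        dist2 = torch.cdist(x, self.centers).pow(2)     # (batch, n_centers)
        phi = torch.exp(-self.log_gamma.exp() * dist2)  # Gaussian RBF features
        return self.heads(phi)                          # (batch, n_hypotheses)

def wta_loss(preds, y):
    # Winner-takes-all: only the best hypothesis per sample is penalized,
    # so different heads can specialize on different modes (an assumption
    # about the training objective, not taken from the paper).
    return (preds - y).pow(2).min(dim=1).values.mean()

x = torch.randn(128, 1)
y = torch.sin(3.0 * x) + 0.1 * torch.randn(128, 1)
loss = wta_loss(MultiHypothesisRBF()(x), y)
```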
- GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks.
GIT combines gradient information with invariance transformations.
Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z)
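A rough sketch of the two ingredients named in the GIT summary above, gradient information and invariance transformations, is given below; the horizontal-flip transform, the symmetric-KL disagreement, and the way the two scores are combined are assumptions for illustration, not GIT's actual formulation.
```python
# Illustrative detector combining a gradient-based score with a consistency
# check under an invariance transformation (here: a horizontal flip).
import torch
import torch.nn.functional as F

def gradient_score(model, x):
    # Per-sample norm of the input gradient of the loss at the predicted label.
    x = x.clone().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1), reduction="sum")
    grad, = torch.autograd.grad(loss, x)
    return grad.flatten(1).norm(dim=1)

def invariance_score(model, x):
    # Symmetric KL between predictions on the input and on a flipped copy.
    with torch.no_grad():
        p = F.log_softmax(model(x), dim=1)
        q = F.log_softmax(model(torch.flip(x, dims=[-1])), dim=1)
        kl_pq = F.kl_div(q, p, log_target=True, reduction="none").sum(dim=1)
        kl_qp = F.kl_div(p, q, log_target=True, reduction="none").sum(dim=1)
        return kl_pq + kl_qp

def error_score(model, x, alpha=1.0):
    # Higher values flag likely generalization errors (assumed combination).
    return gradient_score(model, x) + alpha * invariance_score(model, x)
```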
- Do highly over-parameterized neural networks generalize since bad solutions are rare? [0.0]
In the highly over-parameterized regime, Empirical Risk Minimization (ERM) drives the training error to zero.
We show that under certain conditions the fraction of "bad" global minima with a true error larger than epsilon decays to zero exponentially fast with the number of training data n.
arXiv Detail & Related papers (2022-11-07T14:02:07Z)
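In symbols, the claim above says roughly the following; the constants and the precise conditions are in the paper, so this is only a schematic restatement:
```latex
\[
  \frac{\big|\{\text{global minima with true error} > \varepsilon\}\big|}
       {\big|\{\text{all global minima}\}\big|}
  \;\le\; C(\varepsilon)\, e^{-c(\varepsilon)\, n},
  \qquad c(\varepsilon) > 0,
\]
```
which suggests that, as the number of training samples $n$ grows, an interpolating solution is overwhelmingly unlikely to be one of the "bad" minima.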
- Learning Distributions by Generative Adversarial Networks: Approximation and Generalization [0.6768558752130311]
We study how well generative adversarial networks learn from finite samples by analyzing the convergence rates of these models.
Our analysis is based on a new oracle inequality that decomposes the estimation error of the GAN into the discriminator and generator approximation errors.
For the generator approximation error, we show that neural networks can approximately transform a low-dimensional source distribution into a high-dimensional target distribution.
arXiv Detail & Related papers (2022-05-25T09:26:17Z)
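Schematically, the oracle inequality mentioned above bounds the estimation error by separate approximation terms; the notation below is illustrative (including the statistical term), not the paper's exact statement:
```latex
\[
  d\!\left(\hat{\mu}_n, \mu\right)
  \;\lesssim\;
  \underbrace{\epsilon_{\mathrm{disc}}}_{\text{discriminator approximation}}
  \;+\;
  \underbrace{\epsilon_{\mathrm{gen}}}_{\text{generator approximation}}
  \;+\;
  \underbrace{\epsilon_{\mathrm{stat}}(n)}_{\text{finite-sample error}},
\]
```
where $\hat{\mu}_n$ denotes the distribution learned from $n$ samples, $\mu$ the target distribution, and $d$ the evaluation metric.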
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Generalization and Memorization: The Bias Potential Model [9.975163460952045]
Generative models and density estimators behave quite differently from models for learning functions.
For the bias potential model, we show that dimension-independent generalization accuracy is achievable if early stopping is adopted.
In the long term, the model either memorizes the samples or diverges.
arXiv Detail & Related papers (2020-11-29T04:04:54Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.