Tighter risk certificates for neural networks
- URL: http://arxiv.org/abs/2007.12911v3
- Date: Wed, 22 Sep 2021 14:27:19 GMT
- Title: Tighter risk certificates for neural networks
- Authors: María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, and Csaba Szepesvári
- Abstract summary: We present two training objectives, used here for the first time in connection with training neural networks.
We also re-implement a previously used training objective based on a classical PAC-Bayes bound.
We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors.
- Score: 10.462889461373226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an empirical study regarding training probabilistic
neural networks using training objectives derived from PAC-Bayes bounds. In the
context of probabilistic neural networks, the output of training is a
probability distribution over network weights. We present two training
objectives, used here for the first time in connection with training neural
networks. These two training objectives are derived from tight PAC-Bayes
bounds. We also re-implement a previously used training objective based on a
classical PAC-Bayes bound, to compare the properties of the predictors learned
using the different training objectives. We compute risk certificates for the
learnt predictors, based on part of the data used to learn the predictors. We
further experiment with different types of priors on the weights (both
data-free and data-dependent priors) and neural network architectures. Our
experiments on MNIST and CIFAR-10 show that our training methods produce
competitive test set errors and non-vacuous risk bounds with much tighter
values than previous results in the literature, showing promise not only for
guiding the learning algorithm by bounding the risk but also for model
selection. These observations suggest that the methods studied here might be
good candidates for self-certified learning, in the sense of using the whole
data set for learning a predictor and certifying its risk on any unseen data
(from the same distribution as the training data) potentially without the need
for holding out test data.
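
As background for the training objectives and certificates described above, a standard form of the classical PAC-Bayes-kl bound is the following (a hedged sketch of a well-known result, not quoted from the paper). For a prior $P$ over network weights chosen independently of the $n$ examples used to evaluate the bound, with probability at least $1-\delta$ over the sample, simultaneously for all posteriors $Q$,

\[
  \mathrm{kl}\!\left(\hat{L}(Q)\,\middle\|\,L(Q)\right) \;\le\; \frac{\mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta}}{n},
  \qquad
  \mathrm{kl}(q\|p) = q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p},
\]

where $\hat{L}(Q)$ and $L(Q)$ denote the empirical and true risks of the randomized predictor drawn from $Q$. A numerical risk certificate follows by inverting the binary KL term,

\[
  L(Q) \;\le\; \mathrm{kl}^{-1}\!\left(\hat{L}(Q),\; \frac{\mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta}}{n}\right),
  \qquad
  \mathrm{kl}^{-1}(q,c) := \sup\{\,p\in[0,1] : \mathrm{kl}(q\|p)\le c\,\},
\]

and a training objective is obtained by replacing the 0-1 empirical risk with a differentiable surrogate and minimizing a relaxation of the right-hand side over the parameters of $Q$. Per the abstract, the paper's two new objectives are derived from tight PAC-Bayes bounds, while the re-implemented baseline objective is based on a classical bound of the kind sketched here.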
Related papers
- Uncertainty Quantification for Deep Learning [0.0]
A complete and statistically consistent uncertainty quantification for deep learning is provided.
We demonstrate how each uncertainty source can be systematically quantified.
We also introduce a fast and practical way to incorporate and combine all sources of errors for the first time.
arXiv Detail & Related papers (2024-05-31T00:20:19Z) - Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - Fundamental limits of overparametrized shallow neural networks for
supervised learning [11.136777922498355]
We study a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture.
Our results come in the form of bounds involving i) the mutual information between the training data and the network weights and ii) the Bayes-optimal generalization error.
arXiv Detail & Related papers (2023-07-11T08:30:50Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - DCLP: Neural Architecture Predictor with Curriculum Contrastive Learning [5.2319020651074215]
We propose a Curriculum-guided Contrastive Learning framework for neural Predictor (DCLP).
Our method simplifies the contrastive task by designing a novel curriculum that enhances the stability of the unlabeled training data distribution.
We experimentally demonstrate that DCLP has high accuracy and efficiency compared with existing predictors.
arXiv Detail & Related papers (2023-02-25T08:16:21Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - Progress in Self-Certified Neural Networks [13.434562713466246]
A learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality.
Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead to accurate predictors.
We show that, in data-starvation regimes, holding out data to compute test-set bounds adversely affects generalisation performance.
arXiv Detail & Related papers (2021-11-15T13:39:44Z) - Learning PAC-Bayes Priors for Probabilistic Neural Networks [32.01506699213665]
Recent works have investigated deep learning models trained by optimising PAC-Bayes bounds, with priors that are learnt on subsets of the data.
We ask how much data should be allocated to building the prior and show that the optimum may be dataset-dependent (a sketch of this prior/certificate data split appears after this list).
arXiv Detail & Related papers (2021-09-21T16:27:42Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Statistical model-based evaluation of neural networks [74.10854783437351]
We develop an experimental setup for the evaluation of neural networks (NNs).
The setup helps to benchmark a set of NNs vis-a-vis minimum-mean-square-error (MMSE) performance bounds.
This allows us to test the effects of training data size, data dimension, data geometry, noise, and mismatch between training and testing conditions.
arXiv Detail & Related papers (2020-11-18T00:33:24Z) - Learning from Failure: Training Debiased Classifier from Biased
Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.