Tighter risk certificates for neural networks
- URL: http://arxiv.org/abs/2007.12911v3
- Date: Wed, 22 Sep 2021 14:27:19 GMT
- Title: Tighter risk certificates for neural networks
- Authors: María Pérez-Ortiz, Omar Rivasplata, John Shawe-Taylor, and Csaba Szepesvári
- Abstract summary: We present two training objectives, used here for the first time in connection with training neural networks.
We also re-implement a previously used training objective based on a classical PAC-Bayes bound.
We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors.
- Score: 10.462889461373226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an empirical study regarding training probabilistic
neural networks using training objectives derived from PAC-Bayes bounds. In the
context of probabilistic neural networks, the output of training is a
probability distribution over network weights. We present two training
objectives, used here for the first time in connection with training neural
networks. These two training objectives are derived from tight PAC-Bayes
bounds. We also re-implement a previously used training objective based on a
classical PAC-Bayes bound, to compare the properties of the predictors learned
using the different training objectives. We compute risk certificates for the
learnt predictors, based on part of the data used to learn the predictors. We
further experiment with different types of priors on the weights (both
data-free and data-dependent priors) and neural network architectures. Our
experiments on MNIST and CIFAR-10 show that our training methods produce
competitive test set errors and non-vacuous risk bounds with much tighter
values than previous results in the literature, showing promise not only for
guiding the learning algorithm by bounding the risk but also for model
selection. These observations suggest that the methods studied here might be
good candidates for self-certified learning, in the sense of using the whole
data set for learning a predictor and certifying its risk on any unseen data
(from the same distribution as the training data) potentially without the need
for holding out test data.
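
As background for the training objectives and certificates described above, a standard form of the classical PAC-Bayes-kl bound is the following (a hedged sketch of a well-known result, not quoted from the paper). For a prior $P$ over network weights chosen independently of the $n$ examples used to evaluate the bound, with probability at least $1-\delta$ over the sample, simultaneously for all posteriors $Q$,

\[
  \mathrm{kl}\!\left(\hat{L}(Q)\,\middle\|\,L(Q)\right) \;\le\; \frac{\mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta}}{n},
  \qquad
  \mathrm{kl}(q\|p) = q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p},
\]

where $\hat{L}(Q)$ and $L(Q)$ denote the empirical and true risks of the randomized predictor drawn from $Q$. A numerical risk certificate follows by inverting the binary KL term,

\[
  L(Q) \;\le\; \mathrm{kl}^{-1}\!\left(\hat{L}(Q),\; \frac{\mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{n}}{\delta}}{n}\right),
  \qquad
  \mathrm{kl}^{-1}(q,c) := \sup\{\,p\in[0,1] : \mathrm{kl}(q\|p)\le c\,\},
\]

and a training objective is obtained by replacing the 0-1 empirical risk with a differentiable surrogate and minimizing a relaxation of the right-hand side over the parameters of $Q$. Per the abstract, the paper's two new objectives are derived from tight PAC-Bayes bounds, while the re-implemented baseline objective is based on a classical bound of the kind sketched here.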
Related papers
- Uncertainty Quantification for Deep Learning [0.0]
A complete and statistically consistent uncertainty quantification for deep learning is provided.
We demonstrate how each uncertainty source can be systematically quantified.
We also introduce a fast and practical way to incorporate and combine all sources of errors for the first time.
arXiv Detail & Related papers (2024-05-31T00:20:19Z) - Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z) - Fundamental limits of overparametrized shallow neural networks for
supervised learning [11.136777922498355]
We study a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture.
Our results come in the form of bounds involving i) the mutual information between the training data and the network weights and ii) the Bayes-optimal generalization error.
arXiv Detail & Related papers (2023-07-11T08:30:50Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - DCLP: Neural Architecture Predictor with Curriculum Contrastive Learning [5.2319020651074215]
We propose a Curriculum-guided Contrastive Learning framework for neural Predictor (DCLP).
Our method simplifies the contrastive task by designing a novel curriculum that enhances the stability of the unlabeled training data distribution.
We experimentally demonstrate that DCLP has high accuracy and efficiency compared with existing predictors.
arXiv Detail & Related papers (2023-02-25T08:16:21Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - Progress in Self-Certified Neural Networks [13.434562713466246]
A learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality.
Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead to accurate predictors.
We show that, in data-starvation regimes, holding out data to compute test-set bounds adversely affects generalisation performance.
arXiv Detail & Related papers (2021-11-15T13:39:44Z) - Learning PAC-Bayes Priors for Probabilistic Neural Networks [32.01506699213665]
Recent works have investigated deep learning models trained by optimising PAC-Bayes bounds, with priors that are learnt on subsets of the data.
We ask how much data should be allocated to building the prior and show that the optimum may be dataset-dependent (a sketch of this prior/certificate data split appears after this list).
arXiv Detail & Related papers (2021-09-21T16:27:42Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Statistical model-based evaluation of neural networks [74.10854783437351]
We develop an experimental setup for the evaluation of neural networks (NNs).
The setup helps to benchmark a set of NNs vis-a-vis minimum-mean-square-error (MMSE) performance bounds.
This allows us to test the effects of training data size, data dimension, data geometry, noise, and mismatch between training and testing conditions.
arXiv Detail & Related papers (2020-11-18T00:33:24Z) - Learning from Failure: Training Debiased Classifier from Biased
Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.