Learning PAC-Bayes Priors for Probabilistic Neural Networks
- URL: http://arxiv.org/abs/2109.10304v1
- Date: Tue, 21 Sep 2021 16:27:42 GMT
- Title: Learning PAC-Bayes Priors for Probabilistic Neural Networks
- Authors: Maria Perez-Ortiz and Omar Rivasplata and Benjamin Guedj and Matthew
Gleeson and Jingyu Zhang and John Shawe-Taylor and Miroslaw Bober and Josef
Kittler
- Abstract summary: Recent works have investigated deep learning models trained by optimising PAC-Bayes bounds, with priors that are learnt on subsets of the data.
We ask how much data should be allocated for building the prior and show that the optimum may be dataset dependent.
- Score: 32.01506699213665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works have investigated deep learning models trained by optimising
PAC-Bayes bounds, with priors that are learnt on subsets of the data. This
combination has been shown to lead not only to accurate classifiers, but also
to remarkably tight risk certificates, bearing promise towards self-certified
learning (i.e. use all the data to learn a predictor and certify its quality).
In this work, we empirically investigate the role of the prior. We experiment
on 6 datasets with different strategies and amounts of data to learn
data-dependent PAC-Bayes priors, and we compare them in terms of their effect
on test performance of the learnt predictors and tightness of their risk
certificate. We ask how much data should be allocated for building the prior
and show that the optimum may be dataset dependent. We demonstrate that using a
small percentage of the prior-building
data for validation of the prior leads to promising results. We include a
comparison of underparameterised and overparameterised models, along with an
empirical study of different training objectives and regularisation strategies
to learn the prior distribution.
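To make the data-allocation scheme above concrete, the sketch below splits a training set into a prior-building portion, a small validation subset carved out of that portion, and the remaining data reserved for evaluating the risk certificate. This is a minimal illustration under stated assumptions: the split fractions, the placeholder risk and KL values, the helper names (allocate_data, mcallester_certificate) and the McAllester-style bound are all hypothetical choices for exposition, not the paper's exact objectives or numbers.

```python
import math
import numpy as np

def allocate_data(n_total, prior_fraction=0.5, val_fraction=0.05, seed=0):
    """Split indices into prior-building, prior-validation and certificate sets.

    prior_fraction and val_fraction are illustrative; the paper studies how this
    allocation should be chosen and finds the optimum may be dataset dependent.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    n_prior = int(prior_fraction * n_total)
    n_val = int(val_fraction * n_prior)       # small % of prior data kept for validating the prior
    prior_idx = idx[:n_prior - n_val]         # learn the data-dependent prior P here
    val_idx = idx[n_prior - n_val:n_prior]    # select prior hyperparameters here
    cert_idx = idx[n_prior:]                  # data not used by the prior: evaluates the certificate
    return prior_idx, val_idx, cert_idx

def mcallester_certificate(emp_risk, kl_posterior_prior, n_cert, delta=0.025):
    """Illustrative McAllester-style PAC-Bayes certificate:
    R(Q) <= R_hat(Q) + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n)).
    Tighter kl-inversion bounds are typically used in practice."""
    slack = (kl_posterior_prior + math.log(2.0 * math.sqrt(n_cert) / delta)) / (2.0 * n_cert)
    return emp_risk + math.sqrt(slack)

if __name__ == "__main__":
    prior_idx, val_idx, cert_idx = allocate_data(n_total=60_000, prior_fraction=0.5)
    # Placeholder numbers: empirical risk of the posterior Q on the certificate
    # split, and KL(Q||P) between the posterior and the learnt prior.
    cert = mcallester_certificate(emp_risk=0.08, kl_posterior_prior=2500.0,
                                  n_cert=len(cert_idx))
    print(f"prior={len(prior_idx)}, val={len(val_idx)}, cert={len(cert_idx)}, "
          f"risk certificate ~ {cert:.3f}")
```

In this setup the posterior may be trained on all the data (the self-certified regime), but the certificate is only valid on data the prior has not seen, so enlarging the prior-building split shrinks the certificate set; the paper studies this trade-off empirically.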
Related papers
- The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes [30.30769701138665]
We introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data.
Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem.
We introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point.
arXiv Detail & Related papers (2024-02-14T03:43:05Z)
- A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series [15.218841180577135]
We introduce a novel pretraining procedure that leverages supervised contrastive learning to distinguish features within each pretraining dataset.
We then propose a fine-tuning procedure designed to enhance the accurate prediction of the target data by aligning it more closely with the learned dynamics of the pretraining datasets.
arXiv Detail & Related papers (2023-11-21T02:06:52Z)
- An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning [36.619804184427245]
Class-Incremental Learning (CIL) aims to build classification models from data streams.
Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored.
Use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum.
arXiv Detail & Related papers (2023-08-22T14:06:40Z)
- Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning [73.75282761503581]
We propose DiffTPT, which leverages pre-trained diffusion models to generate diverse and informative new data.
Our experiments on test datasets with distribution shifts and unseen categories demonstrate that DiffTPT improves the zero-shot accuracy by an average of 5.13%.
arXiv Detail & Related papers (2023-08-11T09:36:31Z)
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining.
Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
arXiv Detail & Related papers (2023-05-22T15:57:53Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Progress in Self-Certified Neural Networks [13.434562713466246]
A learning method is self-certified if it uses all available data to simultaneously learn a predictor and certify its quality.
Recent work has shown that neural network models trained by optimising PAC-Bayes bounds lead to accurate predictors.
We show that in data starvation regimes, holding out data for the test-set bound adversely affects generalisation performance.
arXiv Detail & Related papers (2021-11-15T13:39:44Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Tighter risk certificates for neural networks [10.462889461373226]
We present two training objectives, used here for the first time in connection with training neural networks.
We also re-implement a previously used training objective based on a classical PAC-Bayes bound.
We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors (a standard form of such a certificate is sketched after this list).
arXiv Detail & Related papers (2020-07-25T11:02:16Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the resulting large dataset tractable, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
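For reference, as pointed to from the "Tighter risk certificates for neural networks" entry above, a standard form of PAC-Bayes risk certificate is the PAC-Bayes-kl bound below. This is the textbook statement (Langford-Seeger / Maurer form), given here as a hedged illustration rather than a reproduction of that paper's exact training objectives.

```latex
% PAC-Bayes-kl bound: with probability at least 1 - \delta over an i.i.d.
% sample of size n not used to build the prior P, simultaneously for all
% posteriors Q,
\[
\mathrm{kl}\!\left(\hat{R}_n(Q) \,\middle\|\, R(Q)\right)
\;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\frac{2\sqrt{n}}{\delta}}{n},
\qquad
\mathrm{kl}(q \,\|\, p) = q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}.
\]
% A numerical certificate on the true risk R(Q) is obtained by inverting
% kl(. || .) in its second argument.
```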
This list is automatically generated from the titles and abstracts of the papers on this site.