Discriminative calibration: Check Bayesian computation from simulations
and flexible classifier
- URL: http://arxiv.org/abs/2305.14593v2
- Date: Fri, 27 Oct 2023 21:55:39 GMT
- Title: Discriminative calibration: Check Bayesian computation from simulations
and flexible classifier
- Authors: Yuling Yao, Justin Domke
- Abstract summary: We propose to replace the marginal rank test with a flexible classification approach that learns test statistics from data.
We illustrate an automated implementation using neural networks and statistically-inspired features, and validate the method with numerical and real data experiments.
- Score: 23.91355980551754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To check the accuracy of Bayesian computations, it is common to use
rank-based simulation-based calibration (SBC). However, SBC has drawbacks: The
test statistic is somewhat ad-hoc, interactions are difficult to examine,
multiple testing is a challenge, and the resulting p-value is not a divergence
metric. We propose to replace the marginal rank test with a flexible
classification approach that learns test statistics from data. This measure
typically has a higher statistical power than the SBC rank test and returns an
interpretable divergence measure of miscalibration, computed from
classification accuracy. This approach can be used with different data
generating processes to address likelihood-free inference or traditional
inference methods like Markov chain Monte Carlo or variational inference. We
illustrate an automated implementation using neural networks and
statistically-inspired features, and validate the method with numerical and
real data experiments.
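The core idea in the abstract (label prior draws versus approximate posterior draws, train a classifier on parameter-and-data features, and read a divergence off its predictive performance) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: `simulate_joint` and `sample_posterior` are hypothetical user-supplied functions, and a plain logistic regression stands in for the neural-network classifier and statistically-inspired features mentioned above.

```python
# Minimal sketch of classifier-based calibration checking; not the authors'
# reference implementation. `simulate_joint` and `sample_posterior` are
# hypothetical stand-ins for the user's generative model and inference code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_predict


def classifier_divergence(simulate_joint, sample_posterior, n_sim=500, m_draws=5):
    """Estimate a calibration divergence from classification performance.

    simulate_joint()       -> (theta, y): one draw from the prior predictive,
                              both returned as 1-D numpy arrays.
    sample_posterior(y, m) -> array of m approximate posterior draws of theta.
    """
    features, labels = [], []
    for _ in range(n_sim):
        theta, y = simulate_joint()
        features.append(np.concatenate([theta, y]))       # label 1: prior draw paired with its data
        labels.append(1)
        for theta_q in sample_posterior(y, m_draws):      # label 0: approximate posterior draws
            features.append(np.concatenate([theta_q, y]))
            labels.append(0)
    X, z = np.asarray(features), np.asarray(labels)

    # A logistic regression stands in for the flexible (e.g. neural) classifier.
    # (In practice, rows from the same simulation should stay in the same CV fold.)
    clf = LogisticRegression(max_iter=1000)
    p = cross_val_predict(clf, X, z, cv=5, method="predict_proba")[:, 1]

    # If inference is calibrated, the classifier cannot beat the label
    # frequencies, so the cross-entropy gap (a divergence-like quantity)
    # should be near zero; larger values indicate miscalibration.
    baseline = log_loss(z, np.full_like(p, z.mean()))
    return max(baseline - log_loss(z, p), 0.0)
```

A value near zero is consistent with calibrated inference; the gap between the baseline and achieved cross-entropy is one way of turning classification performance into an interpretable divergence measure, as the abstract describes.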
Related papers
- Modelling Sampling Distributions of Test Statistics with Autograd [0.0]
We explore whether this approach to modeling conditional 1-dimensional sampling distributions is a viable alternative to the probability density-ratio method.
Relatively simple, yet effective, neural network models are used whose predictive uncertainty is quantified through a variety of methods.
arXiv Detail & Related papers (2024-05-03T21:34:12Z)
- Is K-fold cross validation the best model selection method for Machine Learning? [0.0]
K-fold cross-validation is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance.
A novel test based on K-fold CV and the upper bound of the actual error (K-fold CUBV) is proposed.
arXiv Detail & Related papers (2024-01-29T18:46:53Z)
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model.
By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation.
It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
- Model-agnostic out-of-distribution detection using combined statistical tests [15.27980070479021]
We present simple methods for out-of-distribution detection using a trained generative model.
We combine a classical parametric test (Rao's score test) with the recently introduced typicality test.
Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms.
arXiv Detail & Related papers (2022-03-02T13:32:09Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Approximate Bayesian Computation via Classification [0.966840768820136]
Approximate Bayesian Computation (ABC) enables statistical inference in complex models whose likelihoods are difficult to calculate but easy to simulate from.
ABC constructs a kernel-type approximation to the posterior distribution through an accept/reject mechanism that compares summary statistics of real and simulated data (see the accept/reject sketch after this list).
We consider the traditional accept/reject kernel as well as an exponential weighting scheme that does not require the ABC acceptance threshold.
arXiv Detail & Related papers (2021-11-22T20:07:55Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
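The "Approximate Bayesian Computation via Classification" entry above refers to the traditional accept/reject kernel. For reference, a minimal sketch of plain accept/reject ABC with summary statistics follows. It illustrates only the classical mechanism, not that paper's classification-based or exponentially weighted variants; `sample_prior`, `simulate_data`, and `summary` are hypothetical placeholders for the user's model.

```python
# Minimal accept/reject ABC sketch (classical mechanism only, not the
# classification-based variant discussed above). `sample_prior`,
# `simulate_data`, and `summary` are hypothetical user-supplied functions.
import numpy as np


def abc_reject(y_obs, sample_prior, simulate_data, summary, n_draws=10000, epsilon=0.1):
    """Return prior draws whose simulated summary statistics fall within
    distance epsilon of the observed summaries."""
    s_obs = summary(y_obs)
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior()            # theta ~ p(theta)
        y_sim = simulate_data(theta)      # y_sim ~ p(y | theta)
        if np.linalg.norm(summary(y_sim) - s_obs) <= epsilon:
            accepted.append(theta)        # keep draws whose simulations resemble the data
    return np.asarray(accepted)           # approximate posterior sample
```

The accepted draws approximate the posterior only up to the information lost in the summaries and the tolerance epsilon, which is why that paper also studies an exponential weighting scheme that avoids the hard acceptance threshold.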
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.