Bias and Priors in Machine Learning Calibrations for High Energy Physics
- URL: http://arxiv.org/abs/2205.05084v1
- Date: Tue, 10 May 2022 18:00:00 GMT
- Title: Bias and Priors in Machine Learning Calibrations for High Energy Physics
- Authors: Rikab Gambhir, Benjamin Nachman, and Jesse Thaler
- Abstract summary: We highlight the prior dependence of some machine learning-based calibration strategies.
Recent proposals for both simulation-based and data-based calibrations inherit properties of the sample used for training.
In the case of simulation-based calibration, we argue that our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls of prior dependence.
- Score: 1.5675763601034223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning offers an exciting opportunity to improve the calibration of
nearly all reconstructed objects in high-energy physics detectors. However,
machine learning approaches often depend on the spectra of examples used during
training, an issue known as prior dependence. This is an undesirable property
of a calibration, which needs to be applicable in a variety of environments.
The purpose of this paper is to explicitly highlight the prior dependence of
some machine learning-based calibration strategies. We demonstrate how some
recent proposals for both simulation-based and data-based calibrations inherit
properties of the sample used for training, which can result in biases for
downstream analyses. In the case of simulation-based calibration, we argue that
our recently proposed Gaussian Ansatz approach can avoid some of the pitfalls
of prior dependence, whereas prior-independent data-based calibration remains
an open problem.
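A minimal toy sketch of the prior dependence described in the abstract, not taken from the paper. It assumes a one-dimensional Gaussian setup in which a detector measures x = z + noise, and a linear calibration is fit by least squares on samples drawn from a chosen training prior for the true value z. All function names, parameters, and numerical settings here are hypothetical choices for illustration only.

```python
import numpy as np

def fit_mse_calibration(prior_mean, prior_std, noise_std=1.0, n=200_000, seed=0):
    """Fit a linear calibration z ~ slope * x + intercept by least squares.

    Training pairs are drawn from an assumed Gaussian prior on the true
    value z, with a measured value x = z + Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(prior_mean, prior_std, size=n)   # true values from the training prior
    x = z + rng.normal(0.0, noise_std, size=n)      # smeared detector response
    slope, intercept = np.polyfit(x, z, deg=1)      # MSE-optimal linear map x -> z
    return slope, intercept

def bias_at(z_true, slope, intercept):
    """Average calibrated value minus truth for a fixed true value.

    Since E[x | z] = z in this toy model, E[f(x) | z] = slope * z + intercept.
    """
    return slope * z_true + intercept - z_true

# Same detector response, two different training priors.
cal_a = fit_mse_calibration(prior_mean=0.0, prior_std=1.0)
cal_b = fit_mse_calibration(prior_mean=5.0, prior_std=1.0)

# The fitted maps differ, so the bias at the same true value differs too.
bias_a = bias_at(2.0, *cal_a)
bias_b = bias_at(2.0, *cal_b)
print(f"prior A: slope={cal_a[0]:.3f}, intercept={cal_a[1]:.3f}, bias at z=2: {bias_a:.3f}")
print(f"prior B: slope={cal_b[0]:.3f}, intercept={cal_b[1]:.3f}, bias at z=2: {bias_b:.3f}")
```

In this toy model the least-squares fit approximates the posterior mean E[z | x], which pulls estimates toward the mean of the training prior. The detector response is identical in both fits, yet the learned calibration and its bias at a fixed true value change with the training sample.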
Related papers
- What Really Matters for Learning-based LiDAR-Camera Calibration [50.2608502974106]
This paper revisits the development of learning-based LiDAR-Camera calibration.
We identify the critical limitations of regression-based methods with the widely used data generation pipeline.
We also investigate how the input data format and preprocessing operations impact network performance.
arXiv Detail & Related papers (2025-01-28T14:12:32Z) - Beware of Calibration Data for Pruning Large Language Models [41.1689082093302]
Post-training pruning is a promising method that does not require resource-intensive iterative training.
We show that the choice of calibration data can matter even more than designing advanced pruning strategies.
Our preliminary exploration also discloses that using calibration data similar to the training data can yield better performance.
arXiv Detail & Related papers (2024-10-23T09:36:21Z) - Reassessing How to Compare and Improve the Calibration of Machine Learning Models [7.183341902583164]
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction.
We show that there exist trivial recalibration approaches that can appear seemingly state-of-the-art unless calibration and prediction metrics are accompanied by additional generalization metrics.
arXiv Detail & Related papers (2024-06-06T13:33:45Z) - Distribution-Free Model-Agnostic Regression Calibration via Nonparametric Methods [9.662269016653296]
We consider an individual calibration objective for characterizing the quantiles of the prediction model.
Existing methods have been largely heuristic and lack statistical guarantees in terms of individual calibration.
We propose simple nonparametric calibration methods that are agnostic of the underlying prediction model.
arXiv Detail & Related papers (2023-05-20T21:31:51Z) - Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance.
We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance.
Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z) - Variable-Based Calibration for Machine Learning Classifiers [11.9995808096481]
We introduce the notion of variable-based calibration to characterize calibration properties of a model.
We find that models with near-perfect expected calibration error can exhibit significant miscalibration as a function of features of the data.
arXiv Detail & Related papers (2022-09-30T00:49:31Z) - Calibrate: Interactive Analysis of Probabilistic Model Output [5.444048397001003]
We present Calibrate, an interactive reliability diagram that is resistant to drawbacks in traditional approaches.
We demonstrate the utility of Calibrate through use cases on both real-world and synthetic data.
arXiv Detail & Related papers (2022-07-27T20:01:27Z) - Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems [66.61691401921296]
This paper presents an investigation over several methods of score calibration for deep speaker embedding extractors.
An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system.
arXiv Detail & Related papers (2022-03-28T21:22:22Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)