Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs
- URL: http://arxiv.org/abs/2402.08733v2
- Date: Mon, 27 May 2024 19:40:04 GMT
- Title: Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs
- Authors: Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris J. Maddison
- Abstract summary: We propose a strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between ${\widehat{p}}_{\theta}(Y|X)$ and $p(Y|X)$.
We demonstrate that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks.
- Score: 35.92045337126979
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying how much a model ${\widehat{p}}_{\theta}(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between ${\widehat{p}}_{\theta}(Y|X)$ and $p(Y|X)$: train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for $p(Y|X)$ and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.
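As a rough, hedged illustration of the pair-prediction idea (not the paper's actual estimator or confidence-interval construction): assume a classification setting in which a trained pair model already yields a joint table `p_pair[y1, y2]` that behaves like an idealized second-order-calibrated predictor, i.e. approximately $E[p(y_1|x)\,p(y_2|x)]$ under the model's remaining uncertainty about $p(\cdot|X)$. The function names and the KL-based "cheating gap" below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def epistemic_from_pair(p_pair):
    """Marginal prediction and a per-class epistemic-variance estimate from a pair model.

    p_pair[y1, y2] is assumed to approximate E[p(y1|x) p(y2|x)]; the diagonal then
    approximates the second moment E[p(y|x)^2], so diag - marginal^2 estimates the
    epistemic variance Var[p(y|x)]."""
    p_pair = p_pair / p_pair.sum()
    marginal = p_pair.sum(axis=0)      # p_hat(y2|x); for an exchangeable pair model both marginals agree
    second_moment = np.diag(p_pair)
    epistemic_var = np.maximum(second_moment - marginal ** 2, 0.0)
    return marginal, epistemic_var

def cheating_gap(p_pair):
    """How much the model 'cheats': mutual information between the observed response y1
    and the predicted response y2 under the pair model (zero iff cheating never helps)."""
    p_pair = p_pair / p_pair.sum()
    marginal = p_pair.sum(axis=0)
    cond = p_pair / (p_pair.sum(axis=1, keepdims=True) + 1e-12)   # p_hat(y2 | x, y1)
    return float(np.sum(p_pair * (np.log(cond + 1e-12) - np.log(marginal + 1e-12)[None, :])))
```

A factorized joint (cheating never improves the prediction) gives a zero gap and zero estimated epistemic variance; a non-factorized joint signals that the model is still unsure about $p(Y|X)$ itself.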
Related papers
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z)
- Relabeling Minimal Training Subset to Flip a Prediction [20.708004593740004]
We find that relabeling fewer than 2% of the training points can always flip a prediction.
We show that $|\mathcal{S}_t|$ is highly related to the noise ratio in the training set and $|\mathcal{S}_t|$ is correlated with but complementary to predicted probabilities.
arXiv Detail & Related papers (2023-05-22T08:10:43Z)
- Faster online calibration without randomization: interval forecasts and the power of two choices [43.17917448937131]
We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature.
Inspired by the works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem.
arXiv Detail & Related papers (2022-04-27T17:33:23Z)
- Thought Flow Nets: From Single Predictions to Trains of Model Thought [39.619001911390804]
When humans solve complex problems, they rarely come up with a decision right away.
Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions, and jump between different hypotheses.
arXiv Detail & Related papers (2021-07-26T13:56:37Z)
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
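The calibration methods referenced in the entry above are that paper's own; as a generic, hedged sketch of post-hoc confidence calibration for QA-style models, the snippet below fits a single temperature against binary correctness labels on a held-out set and reports expected calibration error. The function names and the correctness-based objective are assumptions for illustration, not the paper's method.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, correct):
    """Fit one temperature T so that the rescaled top-answer confidence better matches
    the 0/1 correctness labels (a variant of standard temperature scaling)."""
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        conf = probs.max(axis=1)                  # confidence in the top answer
        eps = 1e-12
        return -np.mean(correct * np.log(conf + eps) + (1 - correct) * np.log(1 - conf + eps))
    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE: bin predictions by confidence and average |accuracy - mean confidence| per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```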
- Almost Tight L0-norm Certified Robustness of Top-k Predictions against Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web searches.
Our work is based on randomized smoothing, which builds a provably robust classifier via randomizing an input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
arXiv Detail & Related papers (2020-11-15T21:34:44Z)
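The entry above builds on randomized smoothing; below is a minimal, hedged sketch of the generic Monte Carlo smoothed top-k prediction, assuming user-supplied `base_classifier` and `randomize` callables (for L0 robustness the randomization acts on pixels). The certification step, which requires confidence bounds on the vote frequencies, is deliberately omitted.

```python
import numpy as np

def smoothed_topk(base_classifier, x, randomize, k=3, n_samples=1000, rng=None):
    """Top-k prediction of the smoothed classifier: the k labels the base classifier
    outputs most often when the input is repeatedly randomized.

    base_classifier(x) -> class id; randomize(x, rng) -> randomized copy of x."""
    rng = rng or np.random.default_rng(0)
    counts = {}
    for _ in range(n_samples):
        label = base_classifier(randomize(x, rng))
        counts[label] = counts.get(label, 0) + 1
    ranked = sorted(counts, key=counts.get, reverse=True)
    return ranked[:k], counts
```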
- A Note on High-Probability versus In-Expectation Guarantees of Generalization Bounds in Machine Learning [95.48744259567837]
Statistical learning theory often aims to provide generalization guarantees for machine learning models.
Statements about a model's performance must take the sampling process into account.
We show how one may transform a statement of one type (high-probability or in-expectation) into the other.
arXiv Detail & Related papers (2020-10-06T09:41:35Z)
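As one standard, hedged illustration of converting one kind of statement into the other (not necessarily the note's own argument): if the generalization gap satisfies $G_S \ge 0$ and the in-expectation bound $\mathbb{E}_S[G_S] \le B$, then Markov's inequality gives, for any $\delta \in (0,1)$, $\Pr_S[G_S \ge B/\delta] \le \mathbb{E}_S[G_S]/(B/\delta) \le \delta$; that is, with probability at least $1-\delta$ over the sample $S$, the gap is at most $B/\delta$.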
- Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions [121.10450359856242]
We develop a frequentist procedure that utilizes influence functions of a model's loss functional to construct a jackknife (or leave-one-out) estimator of predictive confidence intervals.
The resulting discriminative jackknife (DJ) is applicable to a wide range of deep learning models, is easy to implement, and can be applied in a post-hoc fashion without interfering with model training or compromising its accuracy.
arXiv Detail & Related papers (2020-06-29T13:36:52Z)
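As a hedged sketch of the underlying leave-one-out construction (the discriminative jackknife replaces the explicit retraining loop below with higher-order influence-function approximations so it can run post hoc), a naive jackknife predictive interval for regression looks like this; `fit_predict` and the symmetric interval are illustrative assumptions.

```python
import numpy as np

def jackknife_interval(fit_predict, X_train, y_train, x_new, alpha=0.1):
    """Naive leave-one-out jackknife predictive interval.

    fit_predict(X, y, x_query) -> scalar prediction from a model trained on (X, y).
    Collect leave-one-out absolute residuals, then center an interval of half-width
    equal to their (1 - alpha) quantile on the full-data prediction."""
    n = len(X_train)
    residuals = []
    for i in range(n):
        keep = np.arange(n) != i
        pred_i = fit_predict(X_train[keep], y_train[keep], X_train[i])
        residuals.append(abs(y_train[i] - pred_i))
    half_width = np.quantile(residuals, 1 - alpha)
    center = fit_predict(X_train, y_train, x_new)
    return center - half_width, center + half_width
```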
- Estimation of Accurate and Calibrated Uncertainties in Deterministic models [0.8702432681310401]
We devise a method to transform a deterministic prediction into a probabilistic one.
We show that, in doing so, one has to compromise between the accuracy and the reliability (calibration) of such a model.
We show several examples both with synthetic data, where the underlying hidden noise can accurately be recovered, and with large real-world datasets.
arXiv Detail & Related papers (2020-03-11T04:02:56Z)
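As a minimal, hedged sketch of turning a deterministic prediction into a probabilistic one (the simplest possible wrapper, not the paper's method), one can attach a Gaussian noise model whose scale is estimated from held-out residuals; the names below are illustrative.

```python
import numpy as np
from scipy.stats import norm

def gaussian_wrapper(predict, X_val, y_val):
    """Wrap a deterministic regressor `predict` as a Gaussian predictive distribution
    with a single (homoscedastic) noise scale fit on held-out residuals."""
    sigma = np.std(y_val - predict(X_val), ddof=1)
    def predictive(x):
        return norm(loc=predict(x), scale=sigma)   # frozen scipy normal; has .cdf, .ppf, .logpdf
    return predictive

# Usage sketch: dist = gaussian_wrapper(model.predict, X_val, y_val)(x_new)
# interval = dist.ppf(0.05), dist.ppf(0.95)
```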
This list is automatically generated from the titles and abstracts of the papers in this site.