When Does Confidence-Based Cascade Deferral Suffice?
- URL: http://arxiv.org/abs/2307.02764v2
- Date: Tue, 23 Jan 2024 16:01:02 GMT
- Title: When Does Confidence-Based Cascade Deferral Suffice?
- Authors: Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna
Narasimhan, Ankit Singh Rawat, Sanjiv Kumar
- Abstract summary: Cascades are a classical strategy to enable inference cost to vary adaptively across samples.
A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction.
Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
- Score: 69.28314307469381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cascades are a classical strategy to enable inference cost to vary adaptively
across samples, wherein a sequence of classifiers are invoked in turn. A
deferral rule determines whether to invoke the next classifier in the sequence,
or to terminate prediction. One simple deferral rule employs the confidence of
the current classifier, e.g., based on the maximum predicted softmax
probability. Despite being oblivious to the structure of the cascade -- e.g.,
not modelling the errors of downstream models -- such confidence-based deferral
often works remarkably well in practice. In this paper, we seek to better
understand the conditions under which confidence-based deferral may fail, and
when alternate deferral strategies can perform better. We first present a
theoretical characterisation of the optimal deferral rule, which precisely
characterises settings under which confidence-based deferral may suffer. We
then study post-hoc deferral mechanisms, and demonstrate they can significantly
improve upon confidence-based deferral in settings where (i) downstream models
are specialists that only work well on a subset of inputs, (ii) samples are
subject to label noise, and (iii) there is distribution shift between the train
and test set.
Related papers
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to there is a discrepancy between the predicted confidence and performance.
We introduce Dynamic Regularization (DReg) which aims to learn what should be learned during training thereby circumventing the confidence adjusting trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z) - Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z) - Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z) - Approximate Conditional Coverage via Neural Model Approximations [0.030458514384586396]
We analyze a data-driven procedure for obtaining empirically reliable approximate conditional coverage.
We demonstrate the potential for substantial (and otherwise unknowable) under-coverage with split-conformal alternatives with marginal coverage guarantees.
arXiv Detail & Related papers (2022-05-28T02:59:05Z) - Multi-label Chaining with Imprecise Probabilities [0.0]
We present two different strategies to extend the classical multi-label chaining approach to handle imprecise probability estimates.
The main reasons one could have for using such estimations are (1) to make cautious predictions when a high uncertainty is detected in the chaining and (2) to make better precise predictions by avoiding biases caused in early decisions in the chaining.
Our experimental results on missing labels, which investigate how reliable these predictions are in both approaches, indicate that our approaches produce relevant cautiousness on those hard-to-predict instances where the precise models fail.
arXiv Detail & Related papers (2021-07-15T16:43:31Z) - Don't Just Blame Over-parametrization for Over-confidence: Theoretical
Analysis of Calibration in Binary Classification [58.03725169462616]
We show theoretically that over-parametrization is not the only reason for over-confidence.
We prove that logistic regression is inherently over-confident, in the realizable, under-parametrized setting.
Perhaps surprisingly, we also show that over-confidence is not always the case.
arXiv Detail & Related papers (2021-02-15T21:38:09Z) - Estimation and Applications of Quantiles in Deep Binary Classification [0.0]
Quantile regression, based on check loss, is a widely used inferential paradigm in Statistics.
We consider the analogue of check loss in the binary classification setting.
We develop individualized confidence scores that can be used to decide whether a prediction is reliable.
arXiv Detail & Related papers (2021-02-09T07:07:42Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.