Did You Mean...? Confidence-based Trade-offs in Semantic Parsing
- URL: http://arxiv.org/abs/2303.16857v3
- Date: Fri, 20 Oct 2023 12:54:30 GMT
- Title: Did You Mean...? Confidence-based Trade-offs in Semantic Parsing
- Authors: Elias Stengel-Eskin and Benjamin Van Durme
- Abstract summary: We show how a calibrated model can help balance common trade-offs in task-oriented parsing.
We then examine how confidence scores can help optimize the trade-off between usability and safety.
- Score: 52.28988386710333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We illustrate how a calibrated model can help balance common trade-offs in
task-oriented parsing. In a simulated annotator-in-the-loop experiment, we show
that well-calibrated confidence scores allow us to balance cost with annotator
load, improving accuracy with a small number of interactions. We then examine
how confidence scores can help optimize the trade-off between usability and
safety. We show that confidence-based thresholding can substantially reduce the
number of incorrect low-confidence programs executed; however, this comes at a
cost to usability. We propose the DidYouMean system, which better balances
usability and safety.
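Concretely, the thresholding trade-off can be sketched as a gate on a calibrated parser score. A minimal sketch follows; the function names, dummy values, and the 0.7 threshold are our own illustration, not the authors' system:

```python
from typing import Optional, Tuple

def parse_with_confidence(utterance: str) -> Tuple[str, float]:
    """Stand-in for a calibrated parser: returns (program, confidence in [0, 1])."""
    # A real system would decode a program and score it with calibrated probabilities.
    return f"Plan(query={utterance!r})", 0.42  # dummy values for illustration

def maybe_execute(utterance: str, threshold: float = 0.7) -> Optional[str]:
    """Execute only high-confidence parses; defer low-confidence ones."""
    program, confidence = parse_with_confidence(utterance)
    if confidence >= threshold:
        return program  # confident enough to execute
    return None         # low confidence: confirm with the user ("Did you mean...?") instead

print(maybe_execute("set an alarm for 7am"))  # None here: 0.42 < 0.7, so we defer
```

Raising the threshold trades usability (more deferred requests) for safety (fewer incorrect programs executed), which is the trade-off the abstract describes.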
Related papers
- Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators [6.403926452181712]
Large Language Models (LLMs) tend to be unreliable in the factuality of their answers.
We present a survey and empirical comparison of estimators of factual confidence.
Our experiments indicate that trained hidden-state probes provide the most reliable confidence estimates.
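As a rough sketch of what a trained hidden-state probe looks like in practice (our illustration with random stand-in data, not the paper's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins: hidden-state vectors for 1000 model answers (dim 768) and
# binary labels marking whether each answer was factually correct.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))
is_correct = rng.integers(0, 2, size=1000)

# The probe: a small classifier trained on frozen LLM hidden states.
probe = LogisticRegression(max_iter=1000).fit(hidden_states, is_correct)

# Its predicted probability acts as the factual-confidence estimate.
print(probe.predict_proba(hidden_states[:3])[:, 1])
```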
arXiv Detail & Related papers (2024-06-19T10:11:37Z)
- Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
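One simple way to quantify such alignment is a rank correlation between internal token probabilities and verbalized confidences; a sketch with made-up numbers (this is our illustration of the idea, not the paper's exact metric):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical paired measurements over a batch of questions: the probability
# the model assigned to its own answer tokens (internal confidence) and the
# confidence it verbalized when asked to rate itself on a 0-1 scale.
internal_prob = np.array([0.91, 0.55, 0.88, 0.40, 0.73])
expressed_conf = np.array([0.90, 0.60, 0.80, 0.50, 0.70])

# Rank correlation as one simple alignment measure.
rho, _ = spearmanr(internal_prob, expressed_conf)
print(f"confidence-probability alignment (Spearman rho): {rho:.2f}")
```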
arXiv Detail & Related papers (2024-05-25T15:42:04Z)
- When to Trust LLMs: Aligning Confidence with Response Quality [49.371218210305656]
We propose a CONfidence-Quality-ORDer-preserving alignment approach (CONQORD).
It integrates quality reward and order-preserving alignment reward functions.
Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy.
arXiv Detail & Related papers (2024-04-26T09:42:46Z)
- U-Trustworthy Models. Reliability, Competence, and Confidence in Decision-Making [0.21756081703275998]
We present a precise mathematical definition of trustworthiness, termed $\mathcal{U}$-trustworthiness.
Within the context of $\mathcal{U}$-trustworthiness, we prove that properly-ranked models are inherently $\mathcal{U}$-trustworthy.
We advocate for the adoption of the AUC metric as the preferred measure of trustworthiness.
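Using AUC as the trustworthiness measure is straightforward to operationalize; a minimal example on toy data (our illustration, using scikit-learn):

```python
from sklearn.metrics import roc_auc_score

# Toy labels and model scores. A properly-ranked model places positives above
# negatives, which is exactly what AUC measures (1.0 = perfect ranking).
y_true = [0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.90]

print(f"AUC: {roc_auc_score(y_true, scores):.2f}")  # 0.83 for this toy data
```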
arXiv Detail & Related papers (2024-01-04T04:58:02Z)
- A Diachronic Perspective on User Trust in AI under Uncertainty [52.44939679369428]
Modern NLP systems are often uncalibrated, resulting in confidently incorrect predictions that undermine user trust.
We study the evolution of user trust in response to trust-eroding events using a betting game.
arXiv Detail & Related papers (2023-10-20T14:41:46Z)
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness [29.320691367586004]
We introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model.
We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner.
arXiv Detail & Related papers (2023-02-06T08:57:20Z)
- Confidence-Calibrated Face and Kinship Verification [8.570969129199467]
We introduce an effective confidence measure that allows verification models to convert a similarity score into a confidence score for any given face pair.
We also propose a confidence-calibrated approach, termed Angular Scaling (ASC), which is easy to implement and can be readily applied to existing verification models.
To the best of our knowledge, our work presents the first comprehensive confidence-calibrated solution for modern face and kinship verification tasks.
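The abstract does not spell out the ASC formulation; purely as a hedged illustration of converting a similarity score into a confidence score, one might use a calibrated sigmoid (parameters hypothetical; this is NOT the paper's Angular Scaling):

```python
import numpy as np

def similarity_to_confidence(cos_sim: float, scale: float = 8.0, shift: float = 0.3) -> float:
    """Map a verification similarity score into a confidence in [0, 1].

    A generic scaled-sigmoid sketch; `scale` and `shift` stand in for
    parameters that would be calibrated on held-out pairs. This is not the
    paper's Angular Scaling (ASC) formulation.
    """
    return float(1.0 / (1.0 + np.exp(-scale * (cos_sim - shift))))

print(similarity_to_confidence(0.65))  # higher similarity -> higher confidence
```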
arXiv Detail & Related papers (2022-10-25T10:43:46Z)
- Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
- Binary Classification from Positive Data with Skewed Confidence [85.18941440826309]
Positive-confidence (Pconf) classification is a promising weakly-supervised learning method.
In practice, the confidence may be skewed by bias arising in an annotation process.
We introduce a parameterized model of the skewed confidence and propose a method for selecting the hyperparameter.
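As a hedged sketch of what a parameterized skew correction could look like (a power transform chosen purely for illustration; the paper's actual parameterization and selection method may differ):

```python
import numpy as np

def debias_confidence(r: np.ndarray, gamma: float) -> np.ndarray:
    """Hypothetical parameterized model of annotation skew.

    A power transform is one simple parameterization of skewed confidence;
    gamma plays the role of the hyperparameter to be selected.
    """
    return np.clip(r, 1e-6, 1.0) ** gamma

raw = np.array([0.95, 0.90, 0.99, 0.85])  # optimistically skewed confidences
for gamma in (1.0, 1.5, 2.0):             # candidate hyperparameter values
    print(gamma, debias_confidence(raw, gamma).round(3))
```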
arXiv Detail & Related papers (2020-01-29T00:04:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.