Llamas Know What GPTs Don't Show: Surrogate Models for Confidence
Estimation
- URL: http://arxiv.org/abs/2311.08877v1
- Date: Wed, 15 Nov 2023 11:27:44 GMT
- Title: Llamas Know What GPTs Don't Show: Surrogate Models for Confidence
Estimation
- Authors: Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
- Abstract summary: Large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user.
As of November 2023, state-of-the-art LLMs do not provide access to these probabilities.
Our best method composing linguistic confidences and surrogate model probabilities gives state-of-the-art confidence estimates on all 12 datasets.
- Score: 70.27452774899189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To maintain user trust, large language models (LLMs) should signal low
confidence on examples where they are incorrect, instead of misleading the
user. The standard approach of estimating confidence is to use the softmax
probabilities of these models, but as of November 2023, state-of-the-art LLMs
such as GPT-4 and Claude-v1.3 do not provide access to these probabilities. We
first study eliciting confidence linguistically -- asking an LLM for its
confidence in its answer -- which performs reasonably (80.5% AUC on GPT-4
averaged across 12 question-answering datasets -- 7% above a random baseline)
but leaves room for improvement. We then explore using a surrogate confidence
model -- using a model where we do have probabilities to evaluate the original
model's confidence in a given question. Surprisingly, even though these
probabilities come from a different and often weaker model, this method leads
to higher AUC than linguistic confidences on 9 out of 12 datasets. Our best
method composing linguistic confidences and surrogate model probabilities gives
state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on
GPT-4).
Related papers
- Confidence Aware Learning for Reliable Face Anti-spoofing [52.23271636362843]
We propose a Confidence Aware Face Anti-spoofing model, which is aware of its capability boundary.
We estimate its confidence during the prediction of each sample.
Experiments show that the proposed CA-FAS can effectively recognize samples with low prediction confidence.
arXiv Detail & Related papers (2024-11-02T14:29:02Z) - Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead.
We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
arXiv Detail & Related papers (2024-06-12T16:41:31Z) - Large Language Model Confidence Estimation via Black-Box Access [30.490207799344333]
We propose a simple framework where, we engineer novel features and train a (interpretable) model to estimate the confidence.
We empirically demonstrate that our framework is effective in estimating confidence of Flan-ul2,-13b and Mistral-7b on four benchmark Q&A tasks.
Our interpretable approach provides insight into features that are predictive of confidence, leading to the interesting and useful discovery.
arXiv Detail & Related papers (2024-06-01T02:08:44Z) - Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment.
We probe the alignment between models' internal and expressed confidence.
Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
arXiv Detail & Related papers (2024-05-25T15:42:04Z) - Calibrating the Confidence of Large Language Models by Eliciting Fidelity [52.47397325111864]
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless.
Post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate.
We propose a plug-and-play method to estimate the confidence of language models.
arXiv Detail & Related papers (2024-04-03T11:36:12Z) - Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence
Scores from Language Models Fine-Tuned with Human Feedback [91.22679548111127]
A trustworthy real-world prediction system should produce well-calibrated confidence scores.
We show that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities.
arXiv Detail & Related papers (2023-05-24T10:12:33Z) - A Confidence-based Partial Label Learning Model for Crowd-Annotated
Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z) - Learning Confidence for Transformer-based Neural Machine Translation [38.679505127679846]
We propose an unsupervised confidence estimate learning jointly with the training of the neural machine translation (NMT) model.
We explain confidence as how many hints the NMT model needs to make a correct prediction, and more hints indicate low confidence.
We demonstrate that our learned confidence estimate achieves high accuracy on extensive sentence/word-level quality estimation tasks.
arXiv Detail & Related papers (2022-03-22T01:51:58Z) - MACEst: The reliable and trustworthy Model Agnostic Confidence Estimator [0.17188280334580192]
We argue that any confidence estimates based upon standard machine learning point prediction algorithms are fundamentally flawed.
We present MACEst, a Model Agnostic Confidence Estimator, which provides reliable and trustworthy confidence estimates.
arXiv Detail & Related papers (2021-09-02T14:34:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.