Investigating Uncertainty Calibration of Aligned Language Models under
the Multiple-Choice Setting
- URL: http://arxiv.org/abs/2310.11732v2
- Date: Sun, 19 Nov 2023 12:40:41 GMT
- Title: Investigating Uncertainty Calibration of Aligned Language Models under
the Multiple-Choice Setting
- Authors: Guande He, Peng Cui, Jianfei Chen, Wenbo Hu, Jun Zhu
- Abstract summary: This work systematically evaluates the impact of the alignment process on logit-based uncertainty calibration of LMs under the multiple-choice setting.
We find two distinct uncertainties in LMs under the multiple-choice setting, which are responsible for the answer decision and the format preference of the LMs.
We propose an easy-to-implement and sample-efficient method to calibrate aligned LMs.
- Score: 31.386229001853817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the significant progress made in practical applications of aligned
language models (LMs), they tend to be overconfident in output answers compared
to the corresponding pre-trained LMs. In this work, we systematically evaluate
the impact of the alignment process on logit-based uncertainty calibration of
LMs under the multiple-choice setting. We first conduct a thorough empirical
study on how aligned LMs differ in calibration from their pre-trained
counterparts. Experimental results reveal that there are two distinct
uncertainties in LMs under the multiple-choice setting, which are responsible
for the answer decision and the format preference of the LMs, respectively.
Then, we investigate the role of these two uncertainties in aligned LMs'
calibration through fine-tuning in simple synthetic alignment schemes and
conclude that one reason for aligned LMs' overconfidence is the conflation of
these two types of uncertainty. Furthermore, we examine the utility of common
post-hoc calibration methods for aligned LMs and propose an easy-to-implement
and sample-efficient method to calibrate aligned LMs. We hope our findings
could provide insights into the design of more reliable alignment processes for
LMs.
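As a concrete illustration of the setting studied here, the sketch below shows logit-based confidence extraction for a multiple-choice answer and temperature scaling as one common post-hoc calibration baseline. It is a minimal sketch assuming a Hugging Face-style causal LM interface; the model name, the answer-letter choices, and the use of temperature scaling are illustrative assumptions, not the authors' proposed method.

```python
# Minimal sketch: logit-based multiple-choice confidence + temperature scaling.
# Model name and answer letters are placeholders, not taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # placeholder aligned LM
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
lm.eval()

def choice_logits(prompt: str, choices=("A", "B", "C", "D")) -> torch.Tensor:
    """Next-token logits restricted to the answer-letter tokens."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits[0, -1]  # logits for the next token
    choice_ids = [tok(c, add_special_tokens=False).input_ids[-1] for c in choices]
    return logits[choice_ids]

def calibrated_confidence(prompt: str, temperature: float = 2.0) -> torch.Tensor:
    """Temperature scaling: divide choice logits by a temperature fitted on a
    small held-out set, then softmax. A common post-hoc baseline, not the
    paper's proposed method."""
    return torch.softmax(choice_logits(prompt) / temperature, dim=-1)
```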
Related papers
- Does Alignment Tuning Really Break LLMs' Internal Confidence? [5.893124686141782]
Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration.
This study conducts a comprehensive analysis of calibration degradation of LLMs across four dimensions: models, calibration metrics, tasks, and confidence extraction methods.
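Since this analysis compares calibration metrics and confidence extraction methods, a minimal sketch of Expected Calibration Error (ECE), a metric such studies commonly report, is shown below; the equal-width binning and bin count are illustrative assumptions.

```python
# Minimal sketch of Expected Calibration Error (ECE); binning is illustrative.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Confidence-weighted gap between accuracy and mean confidence per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```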
arXiv Detail & Related papers (2024-08-31T05:12:36Z)
- Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding [7.855485779946983]
Calibrating language models (LMs) aligns their generation confidence with the actual likelihood of answer correctness.
We propose an activation-based calibration method, ActCab, which trains a linear layer on top of the LM's last-layer activations.
We also propose CoDec, a confidence-guided decoding strategy, to elicit truthful answers with high confidence from LMs.
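A minimal sketch of what a linear calibration head over frozen last-layer activations could look like, in the spirit of the ActCab description above; the training objective, data collection, and layer choice here are assumptions, not details from the paper.

```python
# Minimal sketch: linear probe on frozen last-layer activations predicting
# answer correctness. Training details are assumptions.
import torch
import torch.nn as nn

class LinearConfidenceHead(nn.Module):
    """Linear layer mapping a hidden state to a correctness probability."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.probe = nn.Linear(hidden_size, 1)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        # last_hidden_state: (batch, hidden_size) activation of the final token
        return torch.sigmoid(self.probe(last_hidden_state)).squeeze(-1)

def fit_probe(head, activations, labels, epochs: int = 50, lr: float = 1e-3):
    """Train the probe with binary cross-entropy on (activation, correct?) pairs;
    the frozen LM itself is never updated."""
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(head(activations), labels.float())
        loss.backward()
        opt.step()
    return head
```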
arXiv Detail & Related papers (2024-06-19T05:33:34Z)
- Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models [84.94220787791389]
We propose Fact-and-Reflection (FaR) prompting, which improves the LLM calibration in two steps.
Experiments show that FaR achieves significantly better calibration; it lowers the Expected Calibration Error by 23.5%.
FaR even elicits the capability of verbally expressing concerns in less confident scenarios.
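A minimal sketch of a two-step fact-then-reflect prompt in the spirit of the description above; the exact prompt wording used by FaR is an assumption here.

```python
# Minimal sketch of a two-step prompt: elicit facts, then reflect and answer.
# The wording is illustrative, not the paper's actual prompts.
def far_prompts(question: str):
    elicit_facts = f"List the facts you know that are relevant to answering: {question}"
    reflect_and_answer = (
        "Considering the facts above, reflect on anything you are unsure about, "
        f"then give your final answer to: {question}"
    )
    return [elicit_facts, reflect_and_answer]
```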
arXiv Detail & Related papers (2024-02-27T01:37:23Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
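A minimal sketch of one consistency-based confidence measure (agreement rate of the majority answer across stochastic samples); `generate_answer` is a hypothetical sampling call, and the paper's three specific measures may differ from this simple agreement rate.

```python
# Minimal sketch: confidence from agreement across sampled generations.
# `generate_answer` is a hypothetical stochastic decoding call.
from collections import Counter

def consistency_confidence(generate_answer, question: str, n_samples: int = 10):
    """Sample n answers; use the majority answer's frequency as confidence."""
    answers = [generate_answer(question) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n_samples
```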
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
- Language Models with Conformal Factuality Guarantees [44.767328168194815]
Conformal factuality is a framework that can ensure high probability correctness guarantees for language model (LM) outputs.
We show that conformal prediction in language models corresponds to a back-off algorithm that provides high probability correctness guarantees.
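A minimal sketch of the split-calibration idea behind such guarantees: pick a confidence threshold on a held-out calibration set and back off from claims below it. The scoring function, the lack of a finite-sample correction, and the drop-claims form of back-off are simplifying assumptions, not the paper's exact algorithm.

```python
# Minimal sketch: threshold chosen on a calibration set so that kept claims
# meet a target accuracy; simplified relative to full conformal prediction.
import numpy as np

def conformal_threshold(cal_scores, cal_correct, alpha: float = 0.1) -> float:
    """Smallest score threshold whose kept calibration claims are >= 1 - alpha correct."""
    order = np.argsort(cal_scores)[::-1]            # most confident first
    scores = np.asarray(cal_scores, dtype=float)[order]
    correct = np.asarray(cal_correct, dtype=float)[order]
    running_acc = np.cumsum(correct) / np.arange(1, len(correct) + 1)
    keep = np.where(running_acc >= 1 - alpha)[0]
    return scores[keep[-1]] if len(keep) else float("inf")

def back_off(claims, scores, threshold):
    """Drop (back off from) claims whose confidence falls below the threshold."""
    return [c for c, s in zip(claims, scores) if s >= threshold]
```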
arXiv Detail & Related papers (2024-02-15T18:31:53Z)
- RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training [105.02614392553198]
We propose Robustifying LMs via Adversarial perturbation with Selective Training (RoAST)
RoAST incorporates two important sources of model robustness: robustness to perturbed inputs and generalizable knowledge in pre-trained LMs.
We demonstrate the effectiveness of RoAST compared to state-of-the-art fine-tuning methods on six different types of LMs.
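A minimal sketch of adversarial perturbation on input embeddings during fine-tuning, one ingredient the summary attributes to RoAST; the FGSM-style perturbation rule and the omission of RoAST's selective parameter-update step are simplifying assumptions.

```python
# Minimal sketch: one fine-tuning step on adversarially perturbed embeddings.
# Perturbation rule and hyperparameters are illustrative assumptions.
import torch

def adversarial_finetune_step(model, embeds, labels, loss_fn, optimizer, eps=1e-2):
    """Perturb input embeddings along the loss gradient, then train on them."""
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = loss_fn(model(inputs_embeds=embeds).logits, labels)
    grad, = torch.autograd.grad(clean_loss, embeds)
    perturbed = (embeds + eps * grad.sign()).detach()   # FGSM-style perturbation

    optimizer.zero_grad()
    adv_loss = loss_fn(model(inputs_embeds=perturbed).logits, labels)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```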
arXiv Detail & Related papers (2023-12-07T04:23:36Z)
- The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning [61.68787689234622]
A recent study, LIMA, shows that using merely 1K examples for alignment tuning can already achieve strong alignment performance.
This raises questions about how exactly alignment tuning transforms a base LLM.
We show that the gap between tuning-free and tuning-based alignment methods can be significantly reduced through strategic prompting.
arXiv Detail & Related papers (2023-12-04T00:46:11Z)
- LitCab: Lightweight Language Model Calibration over Short- and Long-form Responses [14.77013588561901]
We present LitCab, a lightweight calibration mechanism consisting of a single linear layer that takes the input text representation and predicts a bias term.
For evaluation, we construct CaT, a benchmark consisting of eight text generation tasks, covering responses ranging from short phrases to paragraphs.
We test LitCab with Llama2-7B, where it improves calibration across all tasks, reducing the average ECE score by as much as 30%.
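A minimal sketch of what a single-linear-layer bias calibrator in the spirit of LitCab might look like; the exact inputs, the shape of the predicted bias, and how it enters the confidence score are assumptions here.

```python
# Minimal sketch: a single linear layer predicting a bias term that is added
# to the LM's logits before scoring. Interface details are assumptions.
import torch
import torch.nn as nn

class LogitBiasCalibrator(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.bias_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, text_repr: torch.Tensor, lm_logits: torch.Tensor) -> torch.Tensor:
        # text_repr: (batch, hidden_size) pooled input representation
        # lm_logits: (batch, vocab_size) original LM logits for the next token
        return lm_logits + self.bias_head(text_repr)
```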
arXiv Detail & Related papers (2023-10-30T00:30:34Z)
- LM vs LM: Detecting Factual Errors via Cross Examination [22.50837561382647]
We propose a factuality evaluation framework for language models (LMs).
Our key idea is that an incorrect claim is likely to result in inconsistency with other claims that the model generates.
We empirically evaluate our method on factual claims made by multiple recent LMs on four benchmarks.
arXiv Detail & Related papers (2023-05-22T17:42:14Z)
- A Close Look into the Calibration of Pre-trained Language Models [56.998539510508515]
Pre-trained language models (PLMs) may fail in giving reliable estimates of their predictive uncertainty.
We study the dynamic change in PLMs' calibration performance during training.
We extend two recently proposed learnable methods that directly collect data to train models to produce reasonable confidence estimates.
arXiv Detail & Related papers (2022-10-31T21:31:07Z)