Language Models with Conformal Factuality Guarantees
- URL: http://arxiv.org/abs/2402.10978v1
- Date: Thu, 15 Feb 2024 18:31:53 GMT
- Title: Language Models with Conformal Factuality Guarantees
- Authors: Christopher Mohri, Tatsunori Hashimoto
- Abstract summary: Conformal factuality is a framework that can ensure high probability correctness guarantees for language model (LM) outputs.
We show that conformal prediction in language models corresponds to a back-off algorithm that provides high probability correctness guarantees.
- Score: 44.767328168194815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Guaranteeing the correctness and factuality of language model (LM) outputs is
a major open problem. In this work, we propose conformal factuality, a
framework that can ensure high probability correctness guarantees for LMs by
connecting language modeling and conformal prediction. We observe that the
correctness of an LM output is equivalent to an uncertainty quantification
problem, where the uncertainty sets are defined as the entailment set of an
LM's output. Using this connection, we show that conformal prediction in
language models corresponds to a back-off algorithm that provides high
probability correctness guarantees by progressively making LM outputs less
specific (and expanding the associated uncertainty sets). This approach applies
to any black-box LM and requires very few human-annotated samples. Evaluations
of our approach on closed book QA (FActScore, NaturalQuestions) and reasoning
tasks (MATH) show that our approach can provide 80-90% correctness guarantees
while retaining the majority of the LM's original output.
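As a rough illustration of the back-off idea described above, the sketch below calibrates a sub-claim confidence threshold with split conformal prediction and then filters a new output's sub-claims. The decomposition into scored sub-claims, the scores, and the correctness labels are illustrative assumptions, not the paper's exact scoring or annotation setup.

```python
import math

# Calibration data: for each calibration prompt, (confidence_score, is_correct)
# pairs for the sub-claims of the LM's output. Values are illustrative placeholders.
calibration = [
    [(0.92, True), (0.75, True), (0.40, False)],
    [(0.88, True), (0.61, False)],
    [(0.95, True), (0.83, True), (0.79, True)],
    [(0.70, True), (0.55, False), (0.30, False)],
]

def conformity_score(claims):
    """Highest confidence assigned to any *incorrect* sub-claim: keeping only
    claims scored strictly above this value leaves the output fully correct."""
    wrong = [s for s, ok in claims if not ok]
    return max(wrong) if wrong else 0.0

def calibrate_threshold(calibration, alpha=0.2):
    """Split-conformal quantile: with probability >= 1 - alpha, a new output
    filtered at this threshold keeps no incorrect sub-claims (assuming exchangeability)."""
    scores = sorted(conformity_score(c) for c in calibration)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha)) - 1  # conformal quantile index
    return scores[min(k, n - 1)]

def back_off(claims, tau):
    """Drop sub-claims at or below the threshold, i.e. make the output less
    specific while (with high probability) keeping only correct content."""
    return [(s, ok) for s, ok in claims if s > tau]

tau = calibrate_threshold(calibration, alpha=0.2)
test_output = [(0.90, True), (0.72, True), (0.45, False)]
print("threshold:", tau)
print("retained sub-claims:", back_off(test_output, tau))
```

In practice the confidence scores would come from the LM (or an auxiliary scorer) and the correctness labels from the small human-annotated calibration set the abstract mentions.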
Related papers
- ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
We introduce a novel uncertainty measure based on self-consistency theory.
We then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm.
Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods.
arXiv Detail & Related papers (2024-06-29T17:33:07Z)
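The entry above mentions a self-consistency-based uncertainty measure; as a rough sketch (not ConU's exact measure or its conformal criterion), one could score uncertainty by the agreement frequency across repeated samples, with the sampled answers below as placeholders.

```python
from collections import Counter

def self_consistency_uncertainty(samples):
    """1 minus the empirical frequency of the most common sampled answer:
    if repeated sampling keeps producing the same answer, uncertainty is low."""
    counts = Counter(samples)
    top = counts.most_common(1)[0][1]
    return 1.0 - top / len(samples)

# Illustrative answers from repeated generations for one question.
samples = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
print(round(self_consistency_uncertainty(samples), 2))  # -> 0.2
```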
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
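A toy sketch of scoring an answer through the explanations that support it: each sampled explanation contributes weight to the answer it leads to, and the entropy of the resulting distribution serves as the uncertainty. The plausibility weights and the aggregation rule are illustrative assumptions, not the paper's exact procedure.

```python
import math
from collections import defaultdict

def explanation_weighted_entropy(explanations):
    """Entropy over answers, where each sampled explanation contributes weight
    proportional to an (illustrative) plausibility score rather than a flat count."""
    mass = defaultdict(float)
    for answer, weight in explanations:
        mass[answer] += weight
    total = sum(mass.values())
    probs = [m / total for m in mass.values()]
    return -sum(p * math.log(p) for p in probs)

# (answer, plausibility) pairs for five sampled explanations -- toy values.
explanations = [("B", 0.9), ("B", 0.8), ("C", 0.2), ("B", 0.7), ("B", 0.6)]
print(explanation_weighted_entropy(explanations))
```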
- Uncertainty in Language Models: Assessment through Rank-Calibration [65.10149293133846]
Language Models (LMs) have shown promising performance in natural language generation.
It is crucial to correctly quantify their uncertainty in responding to given inputs.
We develop a novel and practical framework, termed Rank-Calibration, to assess uncertainty and confidence measures for LMs.
arXiv Detail & Related papers (2024-04-04T02:31:05Z)
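The snippet below is an illustrative monotonicity check in the spirit of rank-calibration (lower uncertainty should, on average, go with higher generation quality); it is not the paper's Rank-Calibration metric, and the scores are toy values.

```python
def rank_disagreement(uncertainties, qualities):
    """Fraction of example pairs where uncertainty and quality move in the same
    direction (more uncertain yet higher quality, or vice versa) -- a crude
    check that a well-behaved uncertainty measure should keep small."""
    n = len(uncertainties)
    bad = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            if uncertainties[i] == uncertainties[j] or qualities[i] == qualities[j]:
                continue
            total += 1
            if (uncertainties[i] < uncertainties[j]) == (qualities[i] < qualities[j]):
                bad += 1
    return bad / total if total else 0.0

# Toy uncertainty scores and quality scores (e.g. ROUGE) for five generations.
u = [0.1, 0.4, 0.2, 0.8, 0.5]
q = [0.9, 0.6, 0.8, 0.2, 0.5]
print(rank_disagreement(u, q))
```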
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
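A minimal sketch of the ensembling step described above: generate clarified versions of an ambiguous input, score each with the model, and average the resulting answer distributions. The fake_answer_distribution function is a hypothetical stand-in for an LLM call, and the numbers are toy values.

```python
# Placeholder for an LLM call: maps a clarified prompt to a probability
# distribution over candidate answers (toy values, not model outputs).
def fake_answer_distribution(prompt):
    table = {
        "How tall is the Eiffel Tower, including antennas?": {"330 m": 0.85, "300 m": 0.15},
        "How tall is the Eiffel Tower, excluding antennas?": {"330 m": 0.2, "300 m": 0.8},
    }
    return table[prompt]

def ensemble(clarifications):
    """Average the per-clarification answer distributions (the ensembling step);
    disagreement *between* clarifications signals ambiguity in the original input."""
    combined = {}
    for c in clarifications:
        for ans, p in fake_answer_distribution(c).items():
            combined[ans] = combined.get(ans, 0.0) + p / len(clarifications)
    return combined

clarifications = [
    "How tall is the Eiffel Tower, including antennas?",
    "How tall is the Eiffel Tower, excluding antennas?",
]
print(ensemble(clarifications))
```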
- Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting [31.386229001853817]
This work systematically evaluates the impact of the alignment process on the logit-based uncertainty calibration of LMs in the multiple-choice setting.
We find two distinct kinds of uncertainty in LMs in this setting: one responsible for the answer decision and one for the LM's format preference.
We propose an easy-to-implement and sample-efficient method to calibrate aligned LMs.
arXiv Detail & Related papers (2023-10-18T06:07:28Z)
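Since the entry above centers on logit-based calibration, the sketch below shows plain temperature scaling of multiple-choice logits fitted on a small calibration set; this is a generic baseline for illustration, not the sample-efficient method the paper proposes, and the logits are toy values.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert per-option logits to probabilities at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def fit_temperature(logit_sets, correct_idx, grid=None):
    """Pick the temperature minimizing the negative log-likelihood of the
    correct options on a small calibration set (plain grid search)."""
    grid = grid or [0.5 + 0.1 * i for i in range(31)]  # 0.5 .. 3.5
    def nll(t):
        return -sum(math.log(softmax(ls, t)[i]) for ls, i in zip(logit_sets, correct_idx))
    return min(grid, key=nll)

# Toy per-question logits over options A-D and the index of the correct option.
logit_sets = [[2.0, 0.5, 0.1, -0.3], [1.2, 1.1, 0.9, -1.0], [3.0, -0.5, 0.0, 0.2]]
correct_idx = [0, 2, 0]
t = fit_temperature(logit_sets, correct_idx)
print("fitted temperature:", t, "calibrated probs:", softmax(logit_sets[0], t))
```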
- Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous, statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
arXiv Detail & Related papers (2023-06-16T21:55:08Z)
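A rough sketch of the sample-until-confident idea behind conformal generation: keep adding sampled generations to the prediction set until a stopping score clears a threshold. Here the threshold lam is hard-coded and the candidate scores are placeholders; in the paper's setting the stopping rule would be calibrated to deliver the statistical guarantee.

```python
def sample_until_confident(candidates_with_scores, lam, k_max=10):
    """Grow a set of sampled generations until some candidate's confidence
    score clears the threshold `lam`, or the sampling budget `k_max` is hit."""
    prediction_set = []
    for text, score in candidates_with_scores[:k_max]:
        prediction_set.append(text)
        if score >= lam:
            break  # stopping rule satisfied
    return prediction_set

# Illustrative sampled generations with placeholder confidence scores.
candidates = [("answer draft 1", 0.42), ("answer draft 2", 0.67), ("answer draft 3", 0.91)]
print(sample_until_confident(candidates, lam=0.8))
```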
- Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models [37.63939774027709]
Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities.
We propose and compare several confidence/uncertainty measures, applying them to *selective NLG* where unreliable results could either be ignored or yielded for further assessment.
Results reveal that a simple measure for the semantic dispersion can be a reliable predictor of the quality of LLM responses.
arXiv Detail & Related papers (2023-05-30T16:31:26Z)
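A minimal sketch of a dispersion-based confidence signal for selective NLG, in the spirit of the entry above: sample several responses, measure how much they disagree, and abstain when dispersion is high. Token-overlap (Jaccard) similarity is a cheap lexical stand-in for the semantic similarity measures the paper studies, and the abstention threshold is arbitrary.

```python
def jaccard(a, b):
    """Token-overlap similarity -- a crude lexical stand-in for semantic similarity."""
    sa = set(a.lower().replace(".", "").split())
    sb = set(b.lower().replace(".", "").split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def dispersion(responses):
    """Average pairwise dissimilarity among sampled responses: high dispersion
    suggests the model is unsure what to say."""
    pairs = [(i, j) for i in range(len(responses)) for j in range(i + 1, len(responses))]
    return sum(1 - jaccard(responses[i], responses[j]) for i, j in pairs) / len(pairs)

# Illustrative sampled responses for one question.
responses = [
    "The capital of Australia is Canberra.",
    "Canberra is the capital of Australia.",
    "The capital of Australia is Sydney.",
]
d = dispersion(responses)
print("dispersion:", round(d, 3), "->", "abstain" if d > 0.5 else "answer")
```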