ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
- URL: http://arxiv.org/abs/2407.00499v3
- Date: Mon, 18 Nov 2024 08:33:35 GMT
- Title: ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
- Authors: Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Xiaoshuang Shi, Kaidi Xu, Hengtao Shen, Xiaofeng Zhu
- Abstract summary: We introduce a novel uncertainty measure based on self-consistency theory.
We then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm.
Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods.
- Score: 68.33498595506941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the closed-source nature of the latest large language models (LLMs). This study investigates applying conformal prediction (CP), which can transform any heuristic notion of uncertainty into rigorous prediction sets, to black-box LLMs in open-ended NLG tasks. We introduce a novel uncertainty measure based on self-consistency theory, and then develop a conformal uncertainty criterion by integrating an uncertainty condition aligned with correctness into the CP algorithm. Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods. Furthermore, we achieve strict control over the correctness coverage rate using 7 popular LLMs on 4 free-form NLG datasets, spanning general-purpose and medical scenarios. Additionally, the small size of the calibrated prediction sets further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.
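The abstract describes the method only at a high level, so the following is a minimal sketch of the general recipe it refers to, not the authors' exact algorithm: sample several responses per prompt, treat each candidate's self-consistency frequency as a heuristic confidence score, and calibrate a split-conformal threshold so that the resulting prediction sets cover a correct answer at the target rate under exchangeability of calibration and test data. All function names and the toy calibration data below are hypothetical.

```python
import numpy as np

def self_consistency_confidence(samples):
    """Heuristic confidence: relative frequency of each distinct sampled answer."""
    counts = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    total = len(samples)
    return {ans: c / total for ans, c in counts.items()}

def conformal_threshold(cal_scores, alpha):
    """Standard split-conformal quantile over nonconformity scores of the
    correct answer on held-out calibration prompts (lower = more conforming)."""
    n = len(cal_scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    return np.quantile(cal_scores, q_level, method="higher")

def prediction_set(samples, threshold):
    """Keep every candidate whose nonconformity (1 - self-consistency frequency)
    does not exceed the calibrated threshold."""
    conf = self_consistency_confidence(samples)
    return {ans for ans, p in conf.items() if 1.0 - p <= threshold}

# --- Hypothetical usage with toy calibration data ---------------------------
# Each calibration item: (responses sampled from the LLM, a reference answer).
calibration = [
    (["Paris", "Paris", "Lyon", "Paris"], "Paris"),
    (["4", "4", "5", "4"], "4"),
    (["Everest", "K2", "Everest", "Everest"], "Everest"),
]
cal_scores = []
for samples, reference in calibration:
    conf = self_consistency_confidence(samples)
    # Nonconformity of the correct answer: 1 minus its self-consistency frequency.
    cal_scores.append(1.0 - conf.get(reference, 0.0))

tau = conformal_threshold(np.array(cal_scores), alpha=0.2)  # target 80% coverage
print(prediction_set(["Paris", "Paris", "Lyon", "Paris"], tau))  # e.g. {'Paris'}
```

Under exchangeability, a threshold calibrated this way guarantees that, with probability at least 1 - alpha, the prediction set for a new prompt contains a response judged correct, which is the style of correctness coverage control the abstract refers to.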
Related papers
- Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction [0.0]
We propose a model-agnostic uncertainty quantification method that integrates dynamic threshold calibration and cross-modal consistency verification.
We show that the framework achieves stable performance across varying calibration-to-test split ratios, underscoring its robustness for real-world deployment in healthcare, autonomous systems, and other safety-sensitive domains.
This work bridges the gap between theoretical reliability and practical applicability in multi-modal AI systems, offering a scalable solution for hallucination detection and uncertainty-aware decision-making.
arXiv Detail & Related papers (2025-04-24T15:39:46Z) - SConU: Selective Conformal Uncertainty in Large Language Models [59.25881667640868]
We propose a novel approach termed Selective Conformal Uncertainty (SConU).
We develop two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level.
Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions.
arXiv Detail & Related papers (2025-04-19T03:01:45Z) - Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey [11.737403011836532]
Large Language Models (LLMs) excel in text generation, reasoning, and decision-making in high-stakes domains such as healthcare, law, and transportation.
Uncertainty quantification (UQ) enhances trustworthiness by estimating confidence in outputs, enabling risk mitigation and selective prediction.
We introduce a new taxonomy that categorizes UQ methods based on computational efficiency and uncertainty dimensions.
arXiv Detail & Related papers (2025-03-20T05:04:29Z) - COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation [14.461333001997449]
Uncertainty Quantification (UQ) for Natural Language Generation (NLG) is crucial for assessing the performance of Large Language Models (LLMs).
We propose COPU, a method that explicitly adds the ground truth to the candidate outputs and uses logit scores to measure nonconformity.
arXiv Detail & Related papers (2025-02-18T07:25:12Z) - Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation [0.0]
We explore uncertainty estimation as a proxy for correctness in LLM-generated code.
We adapt two state-of-the-art techniques from natural language generation to the domain of code generation.
Our findings indicate a strong correlation between the uncertainty computed through these techniques and correctness.
arXiv Detail & Related papers (2025-02-17T10:03:01Z) - Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI [47.64301863399763]
We present a dynamic semantic clustering approach inspired by the Chinese Restaurant Process.
We quantify the uncertainty of Large Language Models (LLMs) on a given query by calculating the entropy of the generated semantic clusters.
We propose leveraging the (negative) likelihood of these clusters as the (non)conformity score within the conformal prediction framework (a rough sketch of this pattern appears after this list).
arXiv Detail & Related papers (2024-11-04T18:49:46Z) - Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
arXiv Detail & Related papers (2024-05-30T12:42:05Z) - Conformal Prediction for Natural Language Processing: A Survey [23.638214012459425]
Conformal prediction is emerging as a theoretically sound and practically useful framework.
Its model-agnostic and distribution-free nature makes it particularly promising to address the current shortcomings of NLP systems.
This paper provides a comprehensive survey of conformal prediction techniques, their guarantees, and existing applications in NLP.
arXiv Detail & Related papers (2024-05-03T10:00:45Z) - Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond [52.246494389096654]
This paper introduces Word-Sequence Entropy (WSE), a method that calibrates uncertainty at both the word and sequence levels.
We compare WSE with six baseline methods on five free-form medical QA datasets, utilizing seven popular large language models (LLMs).
arXiv Detail & Related papers (2024-02-22T03:46:08Z) - Language Models with Conformal Factuality Guarantees [44.767328168194815]
Conformal factuality is a framework that can ensure high probability correctness guarantees for language model (LM) outputs.
We show that conformal prediction in language models corresponds to a back-off algorithm that provides high probability correctness guarantees.
arXiv Detail & Related papers (2024-02-15T18:31:53Z) - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z) - Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models [37.63939774027709]
Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities.
We propose and compare several confidence/uncertainty measures, applying them to *selective NLG* where unreliable results could either be ignored or yielded for further assessment.
Results reveal that a simple measure for the semantic dispersion can be a reliable predictor of the quality of LLM responses.
arXiv Detail & Related papers (2023-05-30T16:31:26Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
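Several of the entries above (for example, the dynamic semantic clustering and kernel language entropy papers) estimate uncertainty by grouping sampled responses into semantic clusters and scoring candidates by their cluster's probability mass. The snippet below is a rough, hypothetical illustration of that general pattern, not any specific paper's implementation: the semantic-equivalence check is a stand-in (in practice an NLI/entailment model would be used), cluster probabilities are approximated by sample frequencies, uncertainty is the entropy over clusters, and the nonconformity of a candidate is the negative log-probability of its cluster.

```python
import math

def cluster_by_meaning(samples, equivalent):
    """Greedy semantic clustering: each sampled response joins the first cluster
    whose representative it is judged equivalent to, else it starts a new cluster.
    `equivalent` is a placeholder for a real entailment/NLI-based check."""
    clusters = []
    for s in samples:
        for cluster in clusters:
            if equivalent(s, cluster[0]):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def cluster_entropy(clusters, n_samples):
    """Uncertainty estimate: Shannon entropy of the empirical cluster distribution."""
    probs = [len(c) / n_samples for c in clusters]
    return -sum(p * math.log(p) for p in probs)

def nonconformity(answer, clusters, n_samples, equivalent):
    """Nonconformity of a candidate: negative log-probability of the semantic
    cluster it falls into (infinite if it matches no observed cluster)."""
    for cluster in clusters:
        if equivalent(answer, cluster[0]):
            return -math.log(len(cluster) / n_samples)
    return float("inf")

# Hypothetical usage: a crude string check stands in for a semantic-equivalence model.
same = lambda a, b: a.lower().strip() in b.lower() or b.lower().strip() in a.lower()
samples = ["Paris", "paris", "Lyon", "Paris", "The capital is Paris"]
clusters = cluster_by_meaning(samples, same)
print(cluster_entropy(clusters, len(samples)))           # low entropy -> low uncertainty
print(nonconformity("Paris", clusters, len(samples), same))
```

Plugging such a nonconformity score into the same split-conformal calibration sketched after the abstract would yield prediction sets with the same style of coverage guarantee.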