COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation
- URL: http://arxiv.org/abs/2502.12601v1
- Date: Tue, 18 Feb 2025 07:25:12 GMT
- Title: COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation
- Authors: Sean Wang, Yicheng Jiang, Yuxin Tang, Lu Cheng, Hanjie Chen
- Abstract summary: Uncertainty Quantification (UQ) for Natural Language Generation (NLG) is crucial for assessing the performance of Large Language Models (LLMs).
We propose COPU, a method that explicitly adds the ground truth to the candidate outputs and uses logit scores to measure nonconformity.
- Score: 14.461333001997449
- Abstract: Uncertainty Quantification (UQ) for Natural Language Generation (NLG) is crucial for assessing the performance of Large Language Models (LLMs), as it reveals confidence in predictions, identifies failure modes, and gauges output reliability. Conformal Prediction (CP), a model-agnostic method that generates prediction sets with a specified error rate, has been adopted for UQ in classification tasks, where the size of the prediction set indicates the model's uncertainty. However, when adapting CP to NLG, the sampling-based method for generating candidate outputs cannot guarantee the inclusion of the ground truth, limiting its applicability across a wide range of error rates. To address this, we propose COPU, a method that explicitly adds the ground truth to the candidate outputs and uses logit scores to measure nonconformity. Our experiments with six LLMs on four NLG tasks show that COPU outperforms baseline methods in calibrating error rates and empirical coverage rates, offering accurate UQ across a wide range of user-specified error rates.
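The abstract outlines a split-conformal recipe: sample candidate outputs, explicitly append the ground truth during calibration, score nonconformity from the model's logits, and keep candidates below the calibrated threshold at test time. Below is a minimal Python sketch of that recipe under stated assumptions; the exact nonconformity definition and the helper `score_with_model` are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def nonconformity(logit_scores, candidates, answer):
    """Nonconformity of `answer` among `candidates` from model logit scores.
    Here: one minus the softmax probability of the answer (an assumption; the
    paper only states that logit scores are used)."""
    z = np.array(logit_scores, dtype=float)
    probs = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    return 1.0 - probs[candidates.index(answer)]

def calibrate(cal_examples, alpha):
    """Split-conformal calibration on held-out examples, each supplying sampled
    candidates, their logit scores, and the ground truth. The key step from the
    abstract: the ground truth is appended if sampling missed it."""
    scores = []
    for cands, logits, truth in cal_examples:
        if truth not in cands:
            cands = cands + [truth]
            logits = logits + [score_with_model(truth)]  # hypothetical helper
        scores.append(nonconformity(logits, cands, truth))
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(candidates, logits, q_hat):
    """Keep every candidate whose nonconformity is within the calibrated threshold."""
    return [c for c in candidates if nonconformity(logits, candidates, c) <= q_hat]
```

The size of the returned prediction set then serves as the uncertainty signal, as in CP for classification.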
Related papers
- Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation [0.0]
We explore uncertainty estimation as a proxy for correctness in LLM-generated code.
We adapt two state-of-the-art techniques from natural language generation.
We develop an abstention policy that prevents the model from making predictions when uncertainty is high.
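As a rough illustration of the abstention policy described above (the model interface, uncertainty estimator, and threshold are placeholders, not the paper's):

```python
def generate_or_abstain(model, prompt, uncertainty_fn, threshold=0.5):
    """Return the model's code suggestion only when estimated uncertainty is low;
    otherwise abstain so a human can review. `uncertainty_fn` stands in for one of
    the adapted NLG uncertainty estimators, assumed to return a value in [0, 1]."""
    output = model.generate(prompt)
    u = uncertainty_fn(model, prompt, output)
    return output if u < threshold else None  # None signals abstention
```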
arXiv Detail & Related papers (2025-02-17T10:03:01Z)
- Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI [47.64301863399763]
We present a dynamic semantic clustering approach inspired by the Chinese Restaurant Process.
We quantify uncertainty of Large Language Models (LLMs) on a given query by calculating entropy of the generated semantic clusters.
We propose leveraging the (negative) likelihood of these clusters as the (non)conformity score within the Conformal Prediction framework.
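A minimal sketch of the cluster-entropy and cluster-likelihood scores described above; the dynamic (CRP-inspired) semantic clustering itself is assumed to have already produced the cluster labels and is not reproduced here.

```python
import math
from collections import Counter

def cluster_entropy(cluster_labels):
    """Entropy over semantic clusters of sampled responses: the more evenly the
    samples scatter across distinct meanings, the higher the model's uncertainty."""
    counts = Counter(cluster_labels)
    n = len(cluster_labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def cluster_nonconformity(label, cluster_labels):
    """Negative empirical likelihood of the cluster containing a candidate response,
    used as the (non)conformity score inside conformal prediction (per the summary)."""
    counts = Counter(cluster_labels)
    return -(counts[label] / len(cluster_labels))
```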
arXiv Detail & Related papers (2024-11-04T18:49:46Z)
- Generative Conformal Prediction with Vectorized Non-Conformity Scores [6.059745771017814]
Conformal prediction provides model-agnostic uncertainty quantification with guaranteed coverage.
We propose a generative conformal prediction framework with vectorized non-conformity scores.
We construct adaptive uncertainty sets using density-ranked uncertainty balls.
arXiv Detail & Related papers (2024-10-17T16:37:03Z)
- ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
We introduce a novel uncertainty measure based on self-consistency theory.
We then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm.
Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods.
arXiv Detail & Related papers (2024-06-29T17:33:07Z)
- Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification [116.77055746066375]
Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output.
We propose a novel fact-checking and hallucination detection pipeline based on token-level uncertainty quantification.
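A heavily simplified sketch of the token-level idea: score each claim by the confidence of the tokens that express it and flag low-confidence claims for fact-checking. The span extraction, aggregation, and threshold here are assumptions for illustration, not the paper's pipeline.

```python
import math

def claim_confidence(token_logprobs, claim_spans):
    """Geometric-mean token probability over each claim's token span
    (exp of the mean log-probability)."""
    return {
        claim: math.exp(sum(token_logprobs[i] for i in span) / len(span))
        for claim, span in claim_spans.items()
    }

def flag_unreliable_claims(token_logprobs, claim_spans, threshold=0.5):
    """Return claims whose token-level confidence falls below the threshold,
    as candidates for fact-checking / hallucination review."""
    conf = claim_confidence(token_logprobs, claim_spans)
    return [claim for claim, p in conf.items() if p < threshold]
```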
arXiv Detail & Related papers (2024-03-07T17:44:17Z)
- Non-Exchangeable Conformal Language Generation with Nearest Neighbors [12.790082627386482]
Non-exchangeable conformal nucleus sampling is a novel extension of the conformal prediction framework to generation based on nearest neighbors.
Our method can be used post-hoc for an arbitrary model without extra training and supplies token-level, calibrated prediction sets equipped with statistical guarantees.
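A hedged sketch of the non-exchangeable conformal step: calibration scores receive weights (e.g. by similarity to nearest-neighbor contexts retrieved from a datastore, which is not shown here), and a weighted quantile thresholds the token-level prediction set. The nonconformity definition below is an assumption for illustration.

```python
import numpy as np

def weighted_quantile(scores, weights, alpha):
    """Non-exchangeable conformal quantile: sort calibration scores, accumulate
    normalized weights (with an implicit unit mass at +inf for the test point),
    and return the smallest score whose cumulative weight reaches 1 - alpha."""
    order = np.argsort(scores)
    s, w = np.asarray(scores, dtype=float)[order], np.asarray(weights, dtype=float)[order]
    w = w / (w.sum() + 1.0)  # +1 accounts for the test point's mass at +inf
    cum = np.cumsum(w)
    idx = np.searchsorted(cum, 1.0 - alpha)
    return s[idx] if idx < len(s) else np.inf

def token_prediction_set(next_token_probs, q_hat):
    """Token-level prediction set: keep tokens whose nonconformity (here, one
    minus the model's probability -- an assumption) is within the threshold."""
    return [t for t, p in enumerate(next_token_probs) if (1.0 - p) <= q_hat]
```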
arXiv Detail & Related papers (2024-02-01T16:04:04Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
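A minimal sketch of the ensembling step under stated assumptions (`llm` and `clarify` are placeholder callables; the paper's full uncertainty decomposition is not reproduced here).

```python
from collections import Counter

def clarification_ensemble(llm, query, clarify, n_clarifications=5):
    """Generate clarified versions of the input, answer each with the LLM, and
    ensemble the predictions. Disagreement across clarifications is read as
    uncertainty stemming from input ambiguity; this sketch only returns a
    majority answer and an agreement score."""
    clarified_inputs = clarify(query, n_clarifications)  # e.g. rephrasings that pin down intent
    answers = [llm(c) for c in clarified_inputs]
    answer, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)  # low agreement suggests an ambiguous input
    return answer, agreement
```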
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
arXiv Detail & Related papers (2023-06-16T21:55:08Z)
- Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)