Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models
- URL: http://arxiv.org/abs/2506.05497v1
- Date: Thu, 05 Jun 2025 18:26:14 GMT
- Title: Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models
- Authors: Sima Noorani, Shayan Kiyani, George Pappas, Hamed Hassani
- Abstract summary: Conformal Prediction with Query Oracle (CPQ) is a framework characterizing the optimal interplay between coverage, test-time query budget, and informativeness.
Our algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets.
- Score: 20.810300785340072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uncertainty quantification (UQ) is essential for the safe deployment of generative AI models such as large language models (LLMs), especially in high-stakes applications. Conformal prediction (CP) offers a principled UQ framework, but classical methods focus on regression and classification, relying on geometric distances or softmax scores: tools that presuppose structured outputs. We depart from this paradigm by studying CP in a query-only setting, where prediction sets must be constructed solely from finite queries to a black-box generative model, introducing a new trade-off between coverage, test-time query budget, and informativeness. We introduce Conformal Prediction with Query Oracle (CPQ), a framework characterizing the optimal interplay between these objectives. Our finite-sample algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets. Remarkably, both are rooted in the classical missing-mass problem in statistics. Specifically, the optimal query policy depends on the rate of decay, or the derivative, of the missing mass, for which we develop a novel estimator; the optimal mapping hinges on the missing mass itself, which we estimate using Good-Turing estimators. We then turn to implementing our method for language models, whose outputs are vast, variable, and often underspecified. Fine-grained experiments on three real-world open-ended tasks and two LLMs show CPQ's applicability to any black-box LLM and highlight (1) the individual contribution of each principle to CPQ's performance, and (2) CPQ's ability to yield significantly more informative prediction sets than existing conformal methods for language uncertainty quantification.
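The Good-Turing missing-mass estimator named in the abstract is classical; below is a minimal sketch of how it might be computed from repeated queries to a black-box model (the function name and example are illustrative, not the paper's implementation).

```python
from collections import Counter

def good_turing_missing_mass(samples):
    """Good-Turing estimate of the missing mass: the total probability of
    outcomes never observed in `samples`. The estimate is N1/n, where N1
    counts outcomes seen exactly once and n is the sample size."""
    counts = Counter(samples)
    n = len(samples)
    if n == 0:
        return 1.0  # nothing observed yet: all mass is missing
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n

# Example: repeated queries to a generative model for one prompt.
responses = ["Paris", "Paris", "Paris", "Lyon", "Paris", "Marseille"]
print(good_turing_missing_mass(responses))  # 2/6 ~ 0.33 (two singleton answers)
```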
Related papers
- Conformal Information Pursuit for Interactively Guiding Large Language Models [64.39770942422288]
This paper explores sequential querying strategies that aim to minimize the expected number of queries.
One such strategy is Information Pursuit (IP), a greedy algorithm that at each iteration selects the query that maximizes information gain or, equivalently, minimizes uncertainty.
We propose Conformal Information Pursuit (C-IP), an alternative approach to sequential information gain based on conformal prediction sets.
arXiv Detail & Related papers (2025-07-04T03:55:39Z)
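The greedy Information Pursuit step is standard; here is a minimal sketch for a discrete hypothesis space, where `likelihoods[q][a, h]` is the probability of answer a to query q under hypothesis h (all names are illustrative, and this is the plain IP step, not the conformal variant).

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def select_query(posterior, likelihoods):
    """Greedy Information Pursuit: pick the query whose expected posterior
    entropy is lowest, i.e., whose expected information gain is highest."""
    best_q, best_h = None, np.inf
    for q, lik in enumerate(likelihoods):
        p_ans = lik @ posterior  # marginal probability of each answer
        exp_h = 0.0
        for a, pa in enumerate(p_ans):
            if pa > 0:
                post_a = lik[a] * posterior / pa  # Bayes update for answer a
                exp_h += pa * entropy(post_a)
        if exp_h < best_h:
            best_q, best_h = q, exp_h
    return best_q
```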
- A Principled Approach to Randomized Selection under Uncertainty: Applications to Peer Review and Grant Funding [68.43987626137512]
We propose a principled framework for randomized decision-making based on interval estimates of the quality of each item.
We introduce MERIT, an optimization-based method that maximizes the worst-case expected number of top candidates selected.
We prove that MERIT satisfies desirable axiomatic properties not guaranteed by existing approaches.
arXiv Detail & Related papers (2025-06-23T19:59:30Z)
- Random-Set Large Language Models [4.308457163593758]
Large Language Models (LLMs) are known to produce very high-quality texts and responses to our queries.
But how much can we trust this generated text?
We propose a novel Random-Set Large Language Model (RSLLM) approach which predicts finite random sets (belief functions) over the token space.
arXiv Detail & Related papers (2025-04-25T05:25:27Z)
- Optimal Transport-based Conformal Prediction [8.302146576157497]
Conformal Prediction (CP) is a principled framework for uncertainty quantification in black-box learning models.
We introduce a novel CP procedure handling prediction score functions through an optimal transport lens.
We then adapt our method to uncertainty quantification in multi-output regression and multiclass classification.
arXiv Detail & Related papers (2025-01-31T09:48:28Z)
- Query Performance Prediction using Relevance Judgments Generated by Large Language Models [53.97064615557883]
We propose a new query performance prediction (QPP) framework using automatically generated relevance judgments (QPP-GenRE).
QPP-GenRE decomposes QPP into independent subtasks of predicting the relevance of each item in a ranked list to a given query.
We predict an item's relevance by using open-source large language models (LLMs) to ensure scientific reproducibility.
arXiv Detail & Related papers (2024-04-01T09:33:05Z)
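QPP-GenRE's decomposition lends itself to a short sketch: judge each ranked item's relevance independently, then aggregate into a performance estimate. `judge_relevance` is a hypothetical stand-in for the paper's LLM-based judge.

```python
from typing import Callable, List

def predict_query_performance(
    query: str,
    ranked_items: List[str],
    judge_relevance: Callable[[str, str], int],
    k: int = 10,
) -> float:
    """Decompose QPP into per-item relevance prediction, then score the
    ranking with a precision-style metric over the top-k items.
    `judge_relevance(query, item)` returns 1 if judged relevant, else 0."""
    top_k = ranked_items[:k]
    if not top_k:
        return 0.0
    judgments = [judge_relevance(query, item) for item in top_k]
    return sum(judgments) / len(top_k)

# Toy judge standing in for an LLM-based one (illustrative only).
toy_judge = lambda q, item: int(q.lower() in item.lower())
print(predict_query_performance("conformal", ["Conformal CP", "OT maps"], toy_judge, k=2))  # 0.5
```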
- Learning-Based Approaches to Predictive Monitoring with Conformal Statistical Guarantees [2.1684857243537334]
This tutorial focuses on efficient methods for predictive monitoring (PM).
PM is the problem of detecting future violations of a given requirement from the current state of a system.
We present a general and comprehensive framework summarizing our approach to the predictive monitoring of cyber-physical systems (CPSs).
arXiv Detail & Related papers (2023-12-04T15:16:42Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
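The paper's construction is sequential, but the core idea of inverting a likelihood ratio is easy to illustrate for a Bernoulli mean. This is a minimal, non-sequential sketch; the paper's anytime-valid version builds the reference likelihood from a running sequence of estimators rather than the in-sample MLE.

```python
import numpy as np

def lr_confidence_set(successes, n, alpha=0.05):
    """Classical likelihood-ratio set for a Bernoulli parameter: keep every
    theta whose likelihood is within a factor alpha of the maximum-likelihood
    value, i.e., L(theta) / L(theta_hat) >= alpha."""
    grid = np.linspace(1e-4, 1 - 1e-4, 2000)
    loglik = successes * np.log(grid) + (n - successes) * np.log(1 - grid)
    mle = successes / n
    loglik_mle = (successes * np.log(max(mle, 1e-12))
                  + (n - successes) * np.log(max(1 - mle, 1e-12)))
    keep = loglik >= loglik_mle + np.log(alpha)
    return grid[keep].min(), grid[keep].max()

print(lr_confidence_set(successes=30, n=100))  # an interval around 0.3
```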
- Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis [120.9545643534454]
It is crucial for an uncertainty quantification pipeline to minimize calibration error, especially in safety-critical applications.
There are various considerations behind the pipeline: (1) the choice and (2) the size of the PLM, (3) the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and many more.
In response, we recommend the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal Loss for fine-tuning.
arXiv Detail & Related papers (2022-10-10T14:16:01Z)
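Temperature scaling (recommendation (3) above) is a well-known post-hoc calibrator; a minimal sketch of fitting it on held-out validation logits follows (function name is illustrative).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Temperature scaling: learn one scalar T > 0 so that
    softmax(logits / T) is better calibrated, by minimizing the negative
    log-likelihood on a held-out validation set."""
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        logprobs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logprobs[np.arange(len(labels)), labels].mean()
    res = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
    return res.x

# Usage: calibrated probabilities are softmax(test_logits / T),
# with T fitted on validation data only.
```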
- Efficient Conformal Prediction via Cascaded Inference with Expanded Admission [43.596058175459746]
We present a novel approach for conformal prediction (CP).
We aim to identify a set of promising prediction candidates in place of a single prediction.
This set is guaranteed to contain a correct answer with high probability.
arXiv Detail & Related papers (2020-07-06T23:13:07Z)
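The coverage guarantee above is the standard split-conformal one; a minimal sketch of forming such a candidate set from nonconformity scores (names and scores are illustrative, not the paper's cascaded procedure).

```python
import numpy as np

def split_conformal_set(cal_scores, test_scores, alpha=0.1):
    """Split conformal prediction: using nonconformity scores from labeled
    calibration points, admit every candidate whose score falls below the
    conformal quantile, so the set covers the truth with prob >= 1 - alpha."""
    n = len(cal_scores)
    # Finite-sample-corrected quantile level ceil((n+1)(1-alpha))/n.
    q = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return test_scores <= q

cal = np.random.rand(200)            # e.g., 1 - model confidence on calibration data
cands = np.array([0.05, 0.4, 0.93])  # candidate nonconformity scores at test time
print(split_conformal_set(cal, cands, alpha=0.1))  # e.g., [ True  True False]
```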
- AutoCP: Automated Pipelines for Accurate Prediction Intervals [84.16181066107984]
This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP).
Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate.
We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
arXiv Detail & Related papers (2020-06-24T23:13:11Z)