Related papers: Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs

URL: http://arxiv.org/abs/2501.00555v1
Date: Tue, 31 Dec 2024 17:33:12 GMT
Title: Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs
Authors: Harit Vishwakarma, Alan Mishler, Thomas Cook, Niccolò Dalmasso, Natraj Raman, Sumitra Ganesh
Abstract summary: Conformal prediction (CP) is a model-agnostic framework for distribution-free uncertainty quantification. We introduce CP-OPT, an optimization framework to learn scores that minimize set sizes while maintaining coverage. We also propose conformal revision of questions (CROQ) to revise the problem by narrowing down the available choices to those in the prediction set.
Score: 7.843594672029363
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are empowering decision-making in several applications, including tool or API usage and answering multiple-choice questions (MCQs). However, they often make overconfident, incorrect predictions, which can be risky in high-stakes settings like healthcare and finance. To mitigate these risks, recent works have used conformal prediction (CP), a model-agnostic framework for distribution-free uncertainty quantification. CP transforms a score function into prediction sets that contain the true answer with high probability. While CP provides this coverage guarantee for arbitrary scores, the score quality significantly impacts prediction set sizes. Prior works have relied on LLM logits or other heuristic scores, lacking quality guarantees. We address this limitation by introducing CP-OPT, an optimization framework to learn scores that minimize set sizes while maintaining coverage. Furthermore, inspired by the Monty Hall problem, we extend CP's utility beyond uncertainty quantification to improve accuracy. We propose conformal revision of questions (CROQ) to revise the problem by narrowing down the available choices to those in the prediction set. The coverage guarantee of CP ensures that the correct choice is in the revised question prompt with high probability, while the smaller number of choices increases the LLM's chances of answering it correctly. Experiments on MMLU, ToolAlpaca, and TruthfulQA datasets with Gemma-2, Llama-3 and Phi-3 models show that CP-OPT significantly reduces set sizes while maintaining coverage, and CROQ improves accuracy over the standard inference, especially when paired with CP-OPT scores. Together, CP-OPT and CROQ offer a robust framework for improving both the safety and accuracy of LLM-driven decision-making.
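
Illustrative sketch (not the authors' code): the abstract describes two steps, turning a score function into prediction sets and then re-asking the question with only the options in that set. The Python below shows one common way to do this with split conformal prediction. It assumes the score is 1 minus the model's probability for an option, which is a heuristic score of the kind the abstract attributes to prior work, not the learned CP-OPT score. The function names, the prompt layout, and all numbers are hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold on nonconformity scores 1 - p(true answer)."""
    # One score per calibration question: low when the model favors the truth.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return float("inf")  # too few calibration points: keep every option
    return scores[k - 1]

def prediction_set(probs, threshold):
    """Indices of options whose nonconformity score is within the threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= threshold]

def revise_question(question, options, keep):
    """CROQ-style revision: re-ask with only the options in the prediction set."""
    letters = "ABCDEFGHIJ"  # assumed labeling scheme, at most ten options
    lines = [question] + [f"{letters[j]}. {options[i]}" for j, i in enumerate(keep)]
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical calibration data: per-option probabilities and true answers.
    cal_probs = [
        [0.7, 0.1, 0.1, 0.1],
        [0.2, 0.6, 0.1, 0.1],
        [0.25, 0.25, 0.4, 0.1],
        [0.1, 0.1, 0.1, 0.7],
        [0.5, 0.3, 0.1, 0.1],
    ]
    cal_labels = [0, 1, 2, 3, 1]
    threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

    # Hypothetical test question with the model's option probabilities.
    options = ["Paris", "Lyon", "Nice", "Lille"]
    keep = prediction_set([0.5, 0.3, 0.15, 0.05], threshold)
    print(revise_question("What is the capital of France?", options, keep))
```

The revised prompt would then be sent back to the LLM. Under the usual exchangeability assumption, the prediction set contains the true answer with probability at least 1 - alpha, so the revision rarely discards the correct option while the model chooses among fewer distractors.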

Related papers

Conformal Information Pursuit for Interactively Guiding Large Language Models [64.39770942422288]
This paper explores sequential querying strategies that aim to minimize the expected number of queries. One such strategy is Information Pursuit (IP), a greedy algorithm that at each iteration selects the query that maximizes information gain or equivalently minimizes uncertainty. We propose Conformal Information Pursuit (C-IP), an alternative approach to sequential information gain based on conformal prediction sets.
arXiv Detail & Related papers (2025-07-04T03:55:39Z)
Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models [20.810300785340072]
Conformal Prediction with Query Oracle (CPQ) is a framework characterizing the optimal interplay between these objectives. Our algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets.
arXiv Detail & Related papers (2025-06-05T18:26:14Z)
Self-ensemble: Mitigating Confidence Distortion for Large Language Models [89.03110940871765]
Large Language Models exhibit a confidence distortion problem on multi-choice question-answering. We propose Self-ensemble to solve this problem. Experimental results on three LLMs and datasets demonstrate that Self-ensemble comprehensively addresses the confidence distortion problem.
arXiv Detail & Related papers (2025-06-02T17:59:29Z)
Online Conformal Probabilistic Numerics via Adaptive Edge-Cloud Offloading [52.499838151272016]
This work introduces a new method to calibrate the HPD sets produced by PLS with the aim of guaranteeing long-term coverage requirements. The proposed method, referred to as online conformal prediction-PLS (OCP-PLS), assumes sporadic feedback from cloud to edge. The validity of OCP-PLS is verified via experiments that bring insights into trade-offs between coverage, prediction set size, and cloud usage.
arXiv Detail & Related papers (2025-03-18T17:30:26Z)
Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering [0.0]
Large language models (LLMs) are increasingly deployed in real-world question-answering (QA) applications. LLMs have been shown to generate hallucinations and nonfactual information, undermining their trustworthiness in high-stakes medical tasks. In this work, we adapt the CP framework to medical multiple-choice question-answering (MCQA) tasks for the first time.
arXiv Detail & Related papers (2025-03-07T15:22:10Z)
Robust Conformal Prediction with a Single Binary Certificate [58.450154976190795]
Conformal prediction (CP) converts any model's output to prediction sets with a guarantee to cover the true label with (adjustable) high probability. We propose a robust conformal prediction that produces smaller sets even with significantly lower MC samples.
arXiv Detail & Related papers (2025-03-07T08:41:53Z)
Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models [3.958317527488534]
Large Language and Vision-Language Models (LLMs/VLMs) are increasingly used in safety-critical applications. Uncertainty quantification helps assess prediction confidence and enables abstention when uncertainty is high. We propose learnable abstention, integrating reinforcement learning (RL) with Conformal Prediction (CP) to optimize abstention thresholds.
arXiv Detail & Related papers (2025-02-08T21:30:41Z)
Online scalable Gaussian processes with conformal prediction for guaranteed coverage [32.21093722162573]
The consistency of the resulting uncertainty values hinges on the premise that the learning function conforms to the properties specified by the GP model. We propose to wed the GP with the prevailing conformal prediction (CP), a distribution-free post-processing framework that produces prediction sets with provably valid coverage.
arXiv Detail & Related papers (2024-10-07T19:22:15Z)
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
We introduce a novel uncertainty measure based on self-consistency theory. We then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm. Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods.
arXiv Detail & Related papers (2024-06-29T17:33:07Z)
Verifiably Robust Conformal Prediction [1.391198481393699]
This paper introduces VRCP (Verifiably Robust Conformal Prediction), a new framework that leverages neural network verification methods to recover coverage guarantees under adversarial attacks. Our method is the first to support perturbations bounded by arbitrary norms including $\ell_1$, $\ell_2$, and $\ell_\infty$, as well as regression tasks. In every case, VRCP achieves above nominal coverage and yields significantly more efficient and informative prediction regions than the SotA.
arXiv Detail & Related papers (2024-05-29T09:50:43Z)
Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks. We instruct an LLM to self-evaluate its answers. We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)
Equal Opportunity of Coverage in Fair Regression [50.76908018786335]
We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making. We propose Equal Opportunity of Coverage (EOC) that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level.
arXiv Detail & Related papers (2023-11-03T21:19:59Z)
Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs [56.526095828316386]
We propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of large language models (LLMs). We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods.
arXiv Detail & Related papers (2023-10-18T03:34:59Z)
RR-CP: Reliable-Region-Based Conformal Prediction for Trustworthy Medical Image Classification [24.52922162675259]
Conformal prediction (CP) generates a set of predictions for a given test sample. The size of the set indicates how certain the predictions are. We propose a new method called Reliable-Region-Based Conformal Prediction (RR-CP).
arXiv Detail & Related papers (2023-09-09T11:14:04Z)
LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text. This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion. We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis [120.9545643534454]
It is crucial for the pipeline to minimize the calibration error, especially in safety-critical applications. There are various considerations behind the pipeline: (1) the choice and (2) the size of PLM, (3) the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and many more. In response, we recommend the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal Loss for fine-tuning.
arXiv Detail & Related papers (2022-10-10T14:16:01Z)
Few-Shot Calibration of Set Predictors via Meta-Learned Cross-Validation-Based Conformal Prediction [33.33774397643919]
This paper introduces a novel meta-learning solution that aims at reducing the set prediction size. It builds on cross-validation-based CP, rather than the less efficient validation-based CP. It preserves formal per-task calibration guarantees, rather than less stringent task-marginal guarantees.
arXiv Detail & Related papers (2022-10-06T17:21:03Z)
Efficient Conformal Prediction via Cascaded Inference with Expanded Admission [43.596058175459746]
We present a novel approach for conformal prediction (CP). We aim to identify a set of promising prediction candidates in place of a single prediction. This set is guaranteed to contain a correct answer with high probability.
arXiv Detail & Related papers (2020-07-06T23:13:07Z)
AutoCP: Automated Pipelines for Accurate Prediction Intervals [84.16181066107984]
This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP). Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate. We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
arXiv Detail & Related papers (2020-06-24T23:13:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.