Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs
- URL: http://arxiv.org/abs/2501.00555v1
- Date: Tue, 31 Dec 2024 17:33:12 GMT
- Title: Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs
- Authors: Harit Vishwakarma, Alan Mishler, Thomas Cook, Niccolò Dalmasso, Natraj Raman, Sumitra Ganesh,
- Abstract summary: Con conformal prediction (CP) is a model-agnostic framework for distribution-free uncertainty quantification.<n>We introduce CP-OPT, an optimization framework to learn scores that minimize set sizes while maintaining coverage.<n>We also propose emphconformal revision of questions (CROQ) to revise the problem by narrowing down the available choices to those in the prediction set.
- Score: 7.843594672029363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are empowering decision-making in several applications, including tool or API usage and answering multiple-choice questions (MCQs). However, they often make overconfident, incorrect predictions, which can be risky in high-stakes settings like healthcare and finance. To mitigate these risks, recent works have used conformal prediction (CP), a model-agnostic framework for distribution-free uncertainty quantification. CP transforms a \emph{score function} into prediction sets that contain the true answer with high probability. While CP provides this coverage guarantee for arbitrary scores, the score quality significantly impacts prediction set sizes. Prior works have relied on LLM logits or other heuristic scores, lacking quality guarantees. We address this limitation by introducing CP-OPT, an optimization framework to learn scores that minimize set sizes while maintaining coverage. Furthermore, inspired by the Monty Hall problem, we extend CP's utility beyond uncertainty quantification to improve accuracy. We propose \emph{conformal revision of questions} (CROQ) to revise the problem by narrowing down the available choices to those in the prediction set. The coverage guarantee of CP ensures that the correct choice is in the revised question prompt with high probability, while the smaller number of choices increases the LLM's chances of answering it correctly. Experiments on MMLU, ToolAlpaca, and TruthfulQA datasets with Gemma-2, Llama-3 and Phi-3 models show that CP-OPT significantly reduces set sizes while maintaining coverage, and CROQ improves accuracy over the standard inference, especially when paired with CP-OPT scores. Together, CP-OPT and CROQ offer a robust framework for improving both the safety and accuracy of LLM-driven decision-making.
Related papers
- Online Conformal Probabilistic Numerics via Adaptive Edge-Cloud Offloading [52.499838151272016]
This work introduces a new method to calibrate the HPD sets produced by PLS with the aim of guaranteeing long-term coverage requirements.
The proposed method, referred to as online conformal prediction-PLS (OCP-PLS), assumes sporadic feedback from cloud to edge.
The validity of OCP-PLS is verified via experiments that bring insights into trade-offs between coverage, prediction set size, and cloud usage.
arXiv Detail & Related papers (2025-03-18T17:30:26Z) - Statistical Guarantees of Correctness Coverage for Medical Multiple-Choice Question Answering [0.0]
Large language models (LLMs) are increasingly deployed in real-world question-answering (QA) applications.
LLMs have been proven to generate hallucinations and nonfactual information, undermining their trustworthiness in high-stakes medical tasks.
In this work, we for the first time adapt the CP framework to medical multiple-choice question-answering (MCQA) tasks.
arXiv Detail & Related papers (2025-03-07T15:22:10Z) - Robust Conformal Prediction with a Single Binary Certificate [58.450154976190795]
Conformal prediction (CP) converts any model's output to prediction sets with a guarantee to cover the true label with (adjustable) high probability.
We propose a robust conformal prediction that produces smaller sets even with significantly lower MC samples.
arXiv Detail & Related papers (2025-03-07T08:41:53Z) - Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models [3.958317527488534]
Large Language and Vision-Language Models (LLMs/VLMs) are increasingly used in safety-critical applications.
Uncertainty quantification helps assess prediction confidence and enables abstention when uncertainty is high.
We propose learnable abstention, integrating reinforcement learning (RL) with Conformal Prediction (CP) to optimize abstention thresholds.
arXiv Detail & Related papers (2025-02-08T21:30:41Z) - Online scalable Gaussian processes with conformal prediction for guaranteed coverage [32.21093722162573]
The consistency of the resulting uncertainty values hinges on the premise that the learning function conforms to the properties specified by the GP model.
We propose to wed the GP with the prevailing conformal prediction (CP), a distribution-free post-processing framework that produces it prediction sets with a provably valid coverage.
arXiv Detail & Related papers (2024-10-07T19:22:15Z) - ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees [68.33498595506941]
We introduce a novel uncertainty measure based on self-consistency theory.
We then develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the CP algorithm.
Empirical evaluations indicate that our uncertainty measure outperforms prior state-of-the-art methods.
arXiv Detail & Related papers (2024-06-29T17:33:07Z) - Verifiably Robust Conformal Prediction [1.391198481393699]
This paper introduces VRCP (Verifiably Robust Conformal Prediction), a new framework that leverages neural network verification methods to recover coverage guarantees under adversarial attacks.
Our method is the first to support perturbations bounded by arbitrary norms including $ell1$, $ell2$, and $ellinfty$, as well as regression tasks.
In every case, VRCP achieves above nominal coverage and yields significantly more efficient and informative prediction regions than the SotA.
arXiv Detail & Related papers (2024-05-29T09:50:43Z) - Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks.
We instruct an LLM to self-evaluate its answers.
We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z) - Equal Opportunity of Coverage in Fair Regression [50.76908018786335]
We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making.
We propose Equal Opportunity of Coverage (EOC) that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level.
arXiv Detail & Related papers (2023-11-03T21:19:59Z) - RR-CP: Reliable-Region-Based Conformal Prediction for Trustworthy
Medical Image Classification [24.52922162675259]
Conformal prediction (CP) generates a set of predictions for a given test sample.
The size of the set indicates how certain the predictions are.
We propose a new method called Reliable-Region-Based Conformal Prediction (RR-CP)
arXiv Detail & Related papers (2023-09-09T11:14:04Z) - Few-Shot Calibration of Set Predictors via Meta-Learned
Cross-Validation-Based Conformal Prediction [33.33774397643919]
This paper introduces a novel meta-learning solution that aims at reducing the set prediction size.
It builds on cross-validation-based CP, rather than the less efficient validation-based CP.
It preserves formal per-task calibration guarantees, rather than less stringent task-marginal guarantees.
arXiv Detail & Related papers (2022-10-06T17:21:03Z) - Efficient Conformal Prediction via Cascaded Inference with Expanded
Admission [43.596058175459746]
We present a novel approach for conformal prediction (CP)
We aim to identify a set of promising prediction candidates -- in place of a single prediction.
This set is guaranteed to contain a correct answer with high probability.
arXiv Detail & Related papers (2020-07-06T23:13:07Z) - AutoCP: Automated Pipelines for Accurate Prediction Intervals [84.16181066107984]
This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP)
Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate.
We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
arXiv Detail & Related papers (2020-06-24T23:13:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.