Related papers: Direct Prediction Set Minimization via Bilevel Conformal Classifier Training

URL: http://arxiv.org/abs/2506.06599v1
Date: Sat, 07 Jun 2025 00:19:00 GMT
Title: Direct Prediction Set Minimization via Bilevel Conformal Classifier Training
Authors: Yuanjie Shi, Hooman Shahrokhi, Xuesong Jia, Xiongzhi Chen, Janardhan Rao Doppa, Yan Yan
Abstract summary: Conformal prediction (CP) is a promising uncertainty quantification framework which works as a wrapper around a black-box classifier. Standard calibration methods for CP tend to produce large prediction sets, which makes them less useful in practice. This paper considers the problem of integrating conformal principles into the training process of deep classifiers to directly minimize the size of prediction sets.
Score: 22.513575498491544
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Conformal prediction (CP) is a promising uncertainty quantification framework that works as a wrapper around a black-box classifier to construct prediction sets (i.e., subsets of candidate classes) with provable guarantees. However, standard calibration methods for CP tend to produce large prediction sets, which makes them less useful in practice. This paper considers the problem of integrating conformal principles into the training process of deep classifiers to directly minimize the size of prediction sets. We formulate conformal training as a bilevel optimization problem and propose the Direct Prediction Set Minimization (DPSM) algorithm to solve it. The key insight behind DPSM is to minimize a measure of the prediction set size (upper level) that is conditioned on the learned quantile of conformity scores (lower level). We show that DPSM has a learning bound of $O(1/\sqrt{n})$ (with $n$ training samples), while prior conformal training methods based on stochastic approximation of the quantile have a bound of $\Omega(1/s)$ (with batch size $s$ and typically $s \ll \sqrt{n}$). Experiments on various benchmark datasets and deep models show that DPSM significantly outperforms the best prior conformal training baseline, with a 20.46% reduction in prediction set size, and validate our theory.
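
For readers unfamiliar with the "wrapper" the abstract refers to, the following is a minimal sketch of standard split conformal prediction for classification. It is not the DPSM algorithm. The conformity score (one minus the predicted probability of the class), the function names `conformal_threshold` and `prediction_sets`, and the toy probabilities are all assumptions chosen for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold on held-out data (split conformal).

    Assumed score: 1 - predicted probability of the true class.
    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, threshold):
    """Boolean mask of shape (num_test, num_classes): True = class is in the set."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= threshold

# Toy calibration data (hypothetical classifier outputs over 3 classes).
cal_probs = [[0.7, 0.2, 0.1],
             [0.1, 0.8, 0.1],
             [0.3, 0.3, 0.4],
             [0.6, 0.3, 0.1]]
cal_labels = [0, 1, 2, 1]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
sets = prediction_sets([[0.5, 0.45, 0.05]], threshold)
print(threshold, sets, sets.sum(axis=1))  # threshold, membership mask, set sizes
```

The set size computed on the last line is the quantity that conformal training methods aim to shrink. In this sketch the classifier is fixed and only the threshold is calibrated, which is the post-hoc setting the abstract contrasts with training-time approaches.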

Related papers

One Sample is Enough to Make Conformal Prediction Robust [53.78604391939934]
We show that conformal prediction attains some robustness even with a forward pass on a single randomly perturbed input. Our approach returns robust sets with smaller average set size compared to SOTA methods which use many (e.g. around 100) passes per input.
arXiv Detail & Related papers (2025-06-19T19:14:25Z)
Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models [20.810300785340072]
Conformal Prediction with Query Oracle (CPQ) is a framework characterizing the optimal interplay between these objectives. Our algorithm is built on two core principles: one governs the optimal query policy, and the other defines the optimal mapping from queried samples to prediction sets.
arXiv Detail & Related papers (2025-06-05T18:26:14Z)
Backward Conformal Prediction [49.1574468325115]
We introduce Backward Conformal Prediction, a method that guarantees conformal coverage while providing flexible control over the size of prediction sets. Our approach defines a rule that constrains how prediction set sizes behave based on the observed data, and adapts the coverage level accordingly. This approach is particularly useful in applications where large prediction sets are impractical, such as medical diagnosis.
arXiv Detail & Related papers (2025-05-19T21:08:14Z)
Robust Conformal Prediction with a Single Binary Certificate [58.450154976190795]
Conformal prediction (CP) converts any model's output to prediction sets with a guarantee to cover the true label with (adjustable) high probability. We propose a robust conformal prediction that produces smaller sets even with significantly lower MC samples.
arXiv Detail & Related papers (2025-03-07T08:41:53Z)
Volume Optimality in Conformal Prediction with Structured Prediction Sets [22.923139209762788]
Conformal Prediction is a widely studied technique to construct prediction sets of future observations. We first prove an impossibility of volume optimality where any distribution-free method can only find a trivial solution. We then introduce a new notion of volume optimality by restricting the prediction sets to belong to a set family.
arXiv Detail & Related papers (2025-02-23T17:31:33Z)
Smoothed Normalization for Efficient Distributed Private Optimization [54.197255548244705]
Federated learning enables machine learning models with privacy of participants. There is no differentially private distributed method for training, non-feedback problems. We introduce a new distributed algorithm $\alpha$-NormEC with provable convergence guarantees.
arXiv Detail & Related papers (2025-02-19T07:10:32Z)
Provably Robust Conformal Prediction with Improved Efficiency [29.70455766394585]
Conformal prediction is a powerful tool to generate uncertainty sets with guaranteed coverage. Adversarial examples are able to manipulate conformal methods to construct prediction sets with invalid coverage rates. We propose two novel methods, Post-Training Transformation (PTT) and Robust Conformal Training (RCT), to effectively reduce prediction set size with little overhead.
arXiv Detail & Related papers (2024-04-30T15:49:01Z)
Minimum-Risk Recalibration of Classifiers [9.31067660373791]
We introduce the concept of minimum-risk recalibration within the framework of mean-squared-error decomposition. We show that transferring a calibrated classifier requires significantly fewer target samples compared to recalibrating from scratch.
arXiv Detail & Related papers (2023-05-18T11:27:02Z)
Conformal Nucleus Sampling [67.5232384936661]
We assess whether a top-$p$ set is indeed aligned with its probabilistic meaning in various linguistic contexts. We find that OPT models are overconfident, and that calibration shows a moderate inverse scaling with model size.
arXiv Detail & Related papers (2023-05-04T08:11:57Z)
Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters. We show that it achieves approximate valid population coverage and near-optimal efficiency within class. Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters. Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks. We introduce a new scoring method that casts a plausibility ranking task in a full-text format. We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.