Conformal Inference for Open-Set and Imbalanced Classification
- URL: http://arxiv.org/abs/2510.13037v1
- Date: Tue, 14 Oct 2025 23:19:06 GMT
- Title: Conformal Inference for Open-Set and Imbalanced Classification
- Authors: Tianmin Xie, Yanfei Zhou, Ziyi Liang, Stefano Favaro, Matteo Sesia
- Abstract summary: This paper presents a conformal prediction method for classification in highly imbalanced and open-set settings. Existing approaches require a finite, known label space and typically involve random sample splitting. We compute and integrate into our predictions a new family of conformal p-values that can test whether a new data point belongs to a previously unseen class.
- Score: 17.863428471982967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a conformal prediction method for classification in highly imbalanced and open-set settings, where there are many possible classes and not all may be represented in the data. Existing approaches require a finite, known label space and typically involve random sample splitting, which works well when there is a sufficient number of observations from each class. Consequently, they have two limitations: (i) they fail to provide adequate coverage when encountering new labels at test time, and (ii) they may become overly conservative when predicting previously seen labels. To obtain valid prediction sets in the presence of unseen labels, we compute and integrate into our predictions a new family of conformal p-values that can test whether a new data point belongs to a previously unseen class. We study these p-values theoretically, establishing their optimality, and uncover an intriguing connection with the classical Good–Turing estimator for the probability of observing a new species. To make more efficient use of imbalanced data, we also develop a selective sample splitting algorithm that partitions training and calibration data based on label frequency, leading to more informative predictions. Although this splitting breaks exchangeability, finite-sample guarantees are maintained through suitable re-weighting. With both simulated and real data, we demonstrate that our method leads to prediction sets with valid coverage even in challenging open-set scenarios with an infinite number of possible labels, and produces more informative predictions under extreme class imbalance.
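The two ingredients named in the abstract are standard enough to sketch. Below is a minimal, illustrative Python sketch (the function names and the convention "higher score = more typical" are our assumptions, not the paper's exact construction) of a split-conformal p-value for flagging a possibly unseen class, alongside the classical Good–Turing estimate of the probability of a new species:

```python
import numpy as np

def conformal_pvalue(calib_scores, test_score):
    """Split-conformal p-value for the null hypothesis that the test
    point is exchangeable with the calibration points. Scores follow
    the convention 'higher = more typical', so an atypical test point
    (e.g., one from a previously unseen class) gets a small p-value."""
    calib_scores = np.asarray(calib_scores, dtype=float)
    return (1 + np.sum(calib_scores <= test_score)) / (len(calib_scores) + 1)

def good_turing_missing_mass(labels):
    """Classical Good-Turing estimate of the probability that the next
    observation belongs to a never-before-seen class: the number of
    classes observed exactly once, divided by the sample size."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    return np.sum(counts == 1) / len(labels)

# Toy usage: an outlier with very low score gets a tiny p-value, and
# two singleton classes out of nine observations give missing mass 2/9.
rng = np.random.default_rng(0)
print(conformal_pvalue(rng.uniform(0.5, 1.0, size=200), test_score=0.05))
print(good_turing_missing_mass(["a", "a", "b", "c", "c", "d", "e", "e", "e"]))
```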
Related papers
- Sparse Activations as Conformal Predictors [19.298282860984116]
We find a novel connection between conformal prediction and sparse softmax-like transformations. We introduce new non-conformity scores for classification that make the calibration process correspond to the widely used temperature scaling method. We show that the proposed method achieves competitive results in terms of coverage, efficiency, and adaptiveness.
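One way to read this connection, as a rough sketch under our own assumptions rather than the authors' exact construction: a sparse transformation such as sparsemax assigns exactly zero probability to some labels, so its support can serve directly as a prediction set, with a temperature parameter playing the role of a calibrated threshold:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of a
    logit vector onto the probability simplex; unlike softmax, many
    coordinates of the output are exactly zero."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cssv = np.cumsum(z_sorted)
    k_max = k[1 + k * z_sorted > cssv].max()   # size of the support
    tau = (cssv[k_max - 1] - 1.0) / k_max      # simplex-projection threshold
    return np.maximum(z - tau, 0.0)

def sparse_prediction_set(logits, temperature=1.0):
    """Prediction set = support of sparsemax(logits / T). A larger
    temperature T spreads mass over more labels, so T can be tuned on
    calibration data to reach a target coverage level."""
    return np.flatnonzero(sparsemax(np.asarray(logits) / temperature) > 0)

print(sparse_prediction_set([2.0, 1.8, 0.1, -1.0]))  # -> [0 1]
```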
arXiv Detail & Related papers (2025-02-20T17:53:41Z) - Conformal Prediction Sets with Improved Conditional Coverage using Trust Scores [52.92618442300405]
It is impossible to achieve exact, distribution-free conditional coverage in finite samples. We propose an alternative conformal prediction algorithm that targets coverage where it matters most.
arXiv Detail & Related papers (2025-01-17T12:01:56Z) - Probably Approximately Precision and Recall Learning [60.00180898830079]
A key challenge in machine learning is the prevalence of one-sided feedback. We introduce a Probably Approximately Correct (PAC) framework in which hypotheses are set functions that map each input to a set of labels. We develop new algorithms that learn from positive data alone, achieving optimal sample complexity in the realizable case.
arXiv Detail & Related papers (2024-11-20T04:21:07Z) - Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning [53.42244686183879]
Conformal prediction provides model-agnostic and distribution-free uncertainty quantification. Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data. We propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning.
arXiv Detail & Related papers (2024-10-13T15:37:11Z) - Augmented prediction of a true class for Positive Unlabeled data under selection bias [0.8594140167290099]
We introduce a new observational setting for Positive Unlabeled (PU) data where the observations at prediction time are also labeled.
We argue that this additional information is important for prediction, and call this task "augmented PU prediction".
We introduce several variants of the empirical Bayes rule in such scenario and investigate their performance.
arXiv Detail & Related papers (2024-07-14T19:58:01Z) - Stochastic Online Conformal Prediction with Semi-Bandit Feedback [29.334511328067777]
We consider the online learning setting, where examples arrive over time, and the goal is to construct prediction sets dynamically. We propose a novel conformal prediction algorithm targeted at this setting, and prove that it obtains sublinear regret compared to the optimal conformal predictor.
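The paper's semi-bandit algorithm is more involved, but the online feedback loop it builds on can be illustrated with an adaptive-conformal-inference-style update in the spirit of Gibbs and Candès (2021); this is a generic sketch, not the authors' method:

```python
def aci_step(alpha_t, covered, target_alpha=0.1, lr=0.02):
    """One adaptive conformal inference update: after a miss, lower the
    working level alpha_t (so future sets grow); after a hit, raise it.
    Long-run empirical coverage then tracks 1 - target_alpha."""
    err = 0.0 if covered else 1.0
    return alpha_t + lr * (target_alpha - err)

# Toy loop with full (non-bandit) coverage feedback, for illustration.
import random
random.seed(0)
alpha = 0.1
for _ in range(1000):
    covered = random.random() < 0.9   # stand-in for "did the set cover y_t?"
    alpha = aci_step(alpha, covered)
```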
arXiv Detail & Related papers (2024-05-22T00:42:49Z) - PAC Prediction Sets Under Label Shift [52.30074177997787]
Prediction sets capture uncertainty by predicting sets of labels rather than individual labels.
We propose a novel algorithm for constructing prediction sets with PAC guarantees in the label shift setting.
We evaluate our approach on five datasets.
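As a hedged illustration of the generic label-shift recipe this line of work builds on (importance weights equal to the ratio of target to source class priors; the PAC correction itself is more delicate and not reproduced here):

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Weighted empirical q-quantile."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order]) / np.sum(weights)
    return values[order][np.searchsorted(cum, q)]

def label_shift_threshold(calib_scores, calib_labels, pi_src, pi_tgt, alpha=0.1):
    """Reweight each calibration point by w(y) = pi_tgt[y] / pi_src[y],
    so the weighted calibration set mimics the target label distribution,
    then take the weighted (1 - alpha) quantile as the set threshold."""
    w = np.array([pi_tgt[y] / pi_src[y] for y in calib_labels])
    return weighted_quantile(np.asarray(calib_scores, dtype=float), w, 1 - alpha)
```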
arXiv Detail & Related papers (2023-10-19T17:57:57Z) - Class-Conditional Conformal Prediction with Many Classes [60.8189977620604]
We propose a method called clustered conformal prediction that clusters together classes having "similar" conformal scores.
We find that clustered conformal prediction typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
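The mechanics are easy to sketch; the grouping rule below (binning classes by the median of their calibration scores) is a crude stand-in for the paper's actual clustering:

```python
import numpy as np

def clustered_thresholds(scores, labels, n_clusters=5, alpha=0.1):
    """One conformal threshold per cluster of classes, so that rare
    classes borrow calibration data from classes with similar scores."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    medians = np.array([np.median(scores[labels == c]) for c in classes])
    # Crude clustering: equal-width bins over the per-class median score.
    edges = np.linspace(medians.min(), medians.max(), n_clusters + 1)
    assign = np.clip(np.digitize(medians, edges) - 1, 0, n_clusters - 1)
    return {c: np.quantile(scores[np.isin(labels, classes[assign == g])], 1 - alpha)
            for c, g in zip(classes, assign)}
```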
arXiv Detail & Related papers (2023-06-15T17:59:02Z) - Practical Adversarial Multivalid Conformal Prediction [27.179891682629183]
We give a generic conformal prediction method for sequential prediction.
It achieves target empirical coverage guarantees against adversarially chosen data.
It is computationally lightweight, comparable to split conformal prediction.
arXiv Detail & Related papers (2022-06-02T14:33:00Z) - Approximate Conditional Coverage via Neural Model Approximations [0.030458514384586396]
We analyze a data-driven procedure for obtaining empirically reliable approximate conditional coverage.
We demonstrate the potential for substantial (and otherwise unknowable) under-coverage of split-conformal alternatives that offer only marginal coverage guarantees.
arXiv Detail & Related papers (2022-05-28T02:59:05Z) - Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ, by showing degradation in coverage and calibration.
We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z)