Robust Conformal Outlier Detection under Contaminated Reference Data
- URL: http://arxiv.org/abs/2502.04807v1
- Date: Fri, 07 Feb 2025 10:23:25 GMT
- Title: Robust Conformal Outlier Detection under Contaminated Reference Data
- Authors: Meshi Bashari, Matteo Sesia, Yaniv Romano
- Abstract summary: Conformal prediction is a flexible framework for calibrating machine learning predictions.
In outlier detection, this calibration relies on a reference set of labeled inlier data to control the type-I error rate.
This paper analyzes the impact of contamination on the validity of conformal methods.
- Score: 20.864605211132663
- License:
- Abstract: Conformal prediction is a flexible framework for calibrating machine learning predictions, providing distribution-free statistical guarantees. In outlier detection, this calibration relies on a reference set of labeled inlier data to control the type-I error rate. However, obtaining a perfectly labeled inlier reference set is often unrealistic, and a more practical scenario involves access to a contaminated reference set containing a small fraction of outliers. This paper analyzes the impact of such contamination on the validity of conformal methods. We prove that under realistic, non-adversarial settings, calibration on contaminated data yields conservative type-I error control, shedding light on the inherent robustness of conformal methods. This conservativeness, however, typically results in a loss of power. To alleviate this limitation, we propose a novel, active data-cleaning framework that leverages a limited labeling budget and an outlier detection model to selectively annotate data points in the contaminated reference set that are suspected as outliers. By removing only the annotated outliers in this "suspicious" subset, we can effectively enhance power while mitigating the risk of inflating the type-I error rate, as supported by our theoretical analysis. Experiments on real datasets validate the conservative behavior of conformal methods under contamination and show that the proposed data-cleaning strategy improves power without sacrificing validity.
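The abstract describes the standard split-conformal recipe for outlier detection together with a budget-limited cleaning of the reference set. The Python sketch below illustrates that recipe under simple assumptions: the function names, the use of the raw non-conformity score to rank "suspicious" reference points, and the `is_outlier_oracle` annotation callback are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def conformal_pvalues(ref_scores, test_scores):
    """Split-conformal p-values for outlier detection.

    ref_scores : non-conformity scores of the (possibly contaminated)
                 reference set; higher means more outlier-like.
    test_scores: scores of the test points to be flagged.
    """
    ref = np.asarray(ref_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    n = ref.size
    # p_j = (1 + #{i : ref_i >= test_j}) / (n + 1)
    counts = (ref[None, :] >= test[:, None]).sum(axis=1)
    return (1 + counts) / (n + 1)

def clean_reference(ref_scores, budget, is_outlier_oracle):
    """Illustrative active-cleaning step: annotate only the `budget`
    most suspicious (highest-scoring) reference points and drop those
    confirmed as outliers before recalibrating."""
    ref = np.asarray(ref_scores, dtype=float)
    suspicious = np.argsort(ref)[::-1][:budget]   # most suspicious first
    confirmed = [i for i in suspicious if is_outlier_oracle(i)]
    keep = np.setdiff1d(np.arange(ref.size), confirmed)
    return ref[keep]

# Usage sketch: a test point is flagged as an outlier when its p-value
# is at most the target type-I error level alpha (e.g., alpha = 0.05):
#   pvals = conformal_pvalues(clean_reference(ref_scores, budget, oracle), test_scores)
#   flags = pvals <= alpha
```

With a clean reference set, this test controls the type-I error rate at level alpha; the paper's point is that calibrating on a mildly contaminated reference set typically makes the test conservative, and that removing annotated outliers from the suspicious subset recovers power without inflating the error rate.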
Related papers
- Noise-Adaptive Conformal Classification with Marginal Coverage [53.74125453366155]
We introduce an adaptive conformal inference method capable of efficiently handling deviations from exchangeability caused by random label noise.
We validate our method through extensive numerical experiments demonstrating its effectiveness on synthetic and real data sets.
arXiv Detail & Related papers (2025-01-29T23:55:23Z)
- Semi-Supervised Risk Control via Prediction-Powered Inference [14.890609936348277]
Risk-controlling prediction sets (RCPS) are a tool for transforming the output of any machine learning model into a predictive rule with rigorous error-rate control.
We introduce a semi-supervised calibration procedure that leverages unlabeled data to rigorously tune the hyperparameter.
Our procedure builds upon the prediction-powered inference framework, carefully tailoring it to risk-controlling tasks.
arXiv Detail & Related papers (2024-12-15T13:00:23Z)
- Adaptive Conformal Inference by Particle Filtering under Hidden Markov Models [8.505262415500168]
This paper proposes an adaptive conformal inference framework that leverages particle filtering to perform calibration under hidden Markov models.
Rather than directly targeting the unobservable hidden state, we use weighted particles to approximate its posterior distribution.
arXiv Detail & Related papers (2024-11-03T13:15:32Z)
- Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning [53.42244686183879]
Conformal prediction provides model-agnostic and distribution-free uncertainty quantification.
Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data.
We propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning.
arXiv Detail & Related papers (2024-10-13T15:37:11Z)
- Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations.
We propose novel calibration attacks and demonstrate how they can be used to improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z)
- Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors [0.0]
In this work, we formally define and evaluate leave-one-out-, bootstrap-, and cross-conformal methods for anomaly detection.
We demonstrate that the derived methods for calculating resampling-conformal p-values strike a practical compromise between statistical efficiency (full-conformal) and computational efficiency (split-conformal), as they make more efficient use of the available data (a minimal cross-conformal sketch is given after this list).
arXiv Detail & Related papers (2024-02-26T08:22:40Z) - Adaptive conformal classification with noisy labels [22.33857704379073]
The paper develops novel conformal prediction methods for classification tasks that can automatically adapt to random label contamination in the calibration sample.
This is made possible by a precise characterization of the effective coverage inflation suffered by standard conformal inferences in the presence of label contamination.
The advantages of the proposed methods are demonstrated through extensive simulations and an application to object classification with the CIFAR-10H image data set.
arXiv Detail & Related papers (2023-09-10T17:35:43Z) - Approximate Conditional Coverage via Neural Model Approximations [0.030458514384586396]
We analyze a data-driven procedure for obtaining empirically reliable approximate conditional coverage.
We demonstrate the potential for substantial (and otherwise unknowable) under-coverage of split-conformal alternatives that guarantee only marginal coverage.
arXiv Detail & Related papers (2022-05-28T02:59:05Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds.
We prove that our bound is valid for 0-1 empirical risk minimization.
This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
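The Leave-One-Out-, Bootstrap- and Cross-Conformal entry above refers to a sketch after this list; here is a minimal cross-conformal illustration. It assumes a scikit-learn-style detector (IsolationForest, whose `score_samples` returns higher values for more normal points) and the standard pooled cross-conformal p-value; it is not that paper's exact estimator.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import KFold

def cross_conformal_pvalues(X_ref, X_test, n_folds=5, seed=0):
    """Cross-conformal p-values for anomaly detection: each fold's model is
    fit on the remaining folds, and its held-out points supply calibration
    scores that are pooled across folds."""
    X_ref, X_test = np.asarray(X_ref), np.asarray(X_test)
    n = len(X_ref)
    counts = np.zeros(len(X_test))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for fit_idx, cal_idx in kf.split(X_ref):
        model = IsolationForest(random_state=seed).fit(X_ref[fit_idx])
        cal_scores = -model.score_samples(X_ref[cal_idx])   # higher = more anomalous
        test_scores = -model.score_samples(X_test)
        # count calibration scores at least as extreme as each test score
        counts += (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)
    return (1 + counts) / (n + 1)
```

Compared with split-conformal calibration, every reference point contributes a calibration score here, which is the sense in which resampling-based variants make more efficient use of the available data.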