Testing for Outliers with Conformal p-values
- URL: http://arxiv.org/abs/2104.08279v2
- Date: Mon, 19 Apr 2021 16:31:16 GMT
- Title: Testing for Outliers with Conformal p-values
- Authors: Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, Matteo Sesia
- Abstract summary: The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers.
We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points.
We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense.
- Score: 14.158078752410182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the construction of p-values for nonparametric outlier
detection, taking a multiple-testing perspective. The goal is to test whether
new independent samples belong to the same distribution as a reference data set
or are outliers. We propose a solution based on conformal inference, a broadly
applicable framework which yields p-values that are marginally valid but
mutually dependent for different test points. We prove these p-values are
positively dependent and enable exact false discovery rate control, although in
a relatively weak marginal sense. We then introduce a new method to compute
p-values that are both valid conditionally on the training data and independent
of each other for different test points; this paves the way to stronger type-I
error guarantees. Our results depart from classical conformal inference as we
leverage concentration inequalities rather than combinatorial arguments to
establish our finite-sample guarantees. Furthermore, our techniques also yield
a uniform confidence bound for the false positive rate of any outlier detection
algorithm, as a function of the threshold applied to its raw statistics.
Finally, the relevance of our results is demonstrated by numerical experiments
on real and simulated data.
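The marginal construction described in the abstract can be sketched in a few lines: rank a test point's outlier score against a held-out calibration set to get a conformal p-value, then apply a multiple-testing correction. This is an illustrative sketch only, not the paper's exact procedure; the distance-based score, the Gaussian data, and the helper names `conformal_pvalues` and `benjamini_hochberg` are all assumptions, and Benjamini-Hochberg is shown as the standard FDR step the p-values feed into.

```python
import numpy as np

rng = np.random.default_rng(0)

def conformal_pvalues(cal_scores, test_scores):
    """Marginal split-conformal p-value: (1 + #{cal >= test}) / (n + 1)."""
    cal = np.asarray(cal_scores)
    n = cal.size
    return np.array([(1 + np.sum(cal >= s)) / (n + 1) for s in test_scores])

def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean rejection mask for the Benjamini-Hochberg procedure."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest k with p_(k) <= alpha*k/m
        reject[order[: k + 1]] = True
    return reject

# Reference (inlier) data and a disjoint calibration split
inliers = rng.normal(0.0, 1.0, size=(500, 5))
cal = rng.normal(0.0, 1.0, size=(200, 5))
center = inliers.mean(axis=0)

def score(X):
    # Placeholder one-class score: distance to the inlier mean (higher = more outlying)
    return np.linalg.norm(X - center, axis=1)

# Test points: 45 inliers plus 5 clearly shifted outliers
test = np.vstack([rng.normal(0.0, 1.0, size=(45, 5)),
                  rng.normal(6.0, 1.0, size=(5, 5))])

pvals = conformal_pvalues(score(cal), score(test))
reject = benjamini_hochberg(pvals, alpha=0.1)
```

As the abstract notes, these marginal p-values are mutually dependent across test points, which is what makes the FDR-control claim nontrivial; the paper's conditionally valid construction replaces this simple calibration step.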
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used method for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Conditional Testing based on Localized Conformal p-values [5.6779147365057305]
We define the localized conformal p-values by inverting prediction intervals and prove their theoretical properties.
These defined p-values are then applied to several conditional testing problems to illustrate their practicality.
arXiv Detail & Related papers (2024-09-25T11:30:14Z) - Mitigating LLM Hallucinations via Conformal Abstention [70.83870602967625]
We develop a principled procedure for determining when a large language model should abstain from responding in a general domain.
We leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate)
Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets.
arXiv Detail & Related papers (2024-04-04T11:32:03Z) - Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Adaptive Conformal Prediction by Reweighting Nonconformity Score [0.0]
We use a Quantile Regression Forest (QRF) to learn the distribution of nonconformity scores and utilize the QRF's weights to assign more importance to samples with residuals similar to the test point.
Our approach enjoys an assumption-free finite sample marginal and training-conditional coverage, and under suitable assumptions, it also ensures conditional coverage.
arXiv Detail & Related papers (2023-03-22T16:42:19Z) - Derandomized Novelty Detection with FDR Control via Conformal E-values [20.864605211132663]
We propose to make conformal inferences more stable by leveraging suitable conformal e-values instead of p-values.
We show that the proposed method can reduce randomness without much loss of power compared to standard conformal inference.
arXiv Detail & Related papers (2023-02-14T19:21:44Z) - Integrative conformal p-values for powerful out-of-distribution testing with labeled outliers [1.6371837018687636]
This paper develops novel conformal methods to test whether a new observation was sampled from the same distribution as a reference set.
The described methods can re-weight standard conformal p-values based on dependent side information from known out-of-distribution data.
The solution can be implemented either through sample splitting or via a novel transductive cross-validation+ scheme.
arXiv Detail & Related papers (2022-08-23T17:52:20Z) - Robust Flow-based Conformal Inference (FCI) with Statistical Guarantee [4.821312633849745]
We develop a series of conformal inference methods, including building predictive sets and inferring outliers for complex and high-dimensional data.
We evaluate our method, robust flow-based conformal inference, on benchmark datasets.
arXiv Detail & Related papers (2022-05-22T04:17:30Z) - Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z) - Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for cross-validation and consistent estimators of its variance under weak stability conditions on the learning algorithm.
Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.