Max-Rank: Efficient Multiple Testing for Conformal Prediction
- URL: http://arxiv.org/abs/2311.10900v3
- Date: Thu, 31 Oct 2024 13:50:41 GMT
- Title: Max-Rank: Efficient Multiple Testing for Conformal Prediction
- Authors: Alexander Timans, Christoph-Nikolas Straehle, Kaspar Sakmann, Christian A. Naesseth, Eric Nalisnick,
- Abstract summary: Multiple hypothesis testing (MHT) commonly arises in various scientific fields, from genomics to psychology, where testing many hypotheses simultaneously increases the risk of Type-I errors.
We propose a novel correction named $textttmax-rank$ that leverages these dependencies, whilst ensuring that the joint Type-I error rate is efficiently controlled.
- Score: 43.56898111853698
- License:
- Abstract: Multiple hypothesis testing (MHT) commonly arises in various scientific fields, from genomics to psychology, where testing many hypotheses simultaneously increases the risk of Type-I errors. These errors can mislead scientific inquiry, rendering MHT corrections essential. In this paper, we address MHT within the context of conformal prediction, a flexible method for predictive uncertainty quantification. Some conformal prediction settings can require simultaneous testing, and positive dependencies among tests typically exist. We propose a novel correction named $\texttt{max-rank}$ that leverages these dependencies, whilst ensuring that the joint Type-I error rate is efficiently controlled. Inspired by permutation-based corrections from Westfall & Young (1993), $\texttt{max-rank}$ exploits rank order information to improve performance, and readily integrates with any conformal procedure. We demonstrate both its theoretical and empirical advantages over the common Bonferroni correction and its compatibility with conformal prediction, highlighting the potential to enhance predictive uncertainty quantification.
Related papers
- FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees [41.78390564658645]
Large Language Models (LLMs) to generate hallucinations and non-factual content undermines their reliability in high-stakes domains.
We introduce FactTest, a novel framework that statistically assesses whether a LLM can confidently provide correct answers to given questions.
We show that FactTest effectively detects hallucinations and improves the model's ability to abstain from answering unknown questions, leading to an over 40% accuracy improvement.
arXiv Detail & Related papers (2024-11-04T20:53:04Z) - Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation
of Prediction Rationale [53.152460508207184]
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.
This paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis.
To achieve the optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning.
arXiv Detail & Related papers (2024-02-02T05:53:22Z) - Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy
Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial
Networks [3.623570119514559]
We propose a semi-Bayesian nonparametric (semi-BNP) procedure for the goodness-of-fit (GOF) test.
Our method introduces a novel Bayesian estimator for the maximum mean discrepancy (MMD) measure.
We demonstrate that our proposed test outperforms frequentist MMD-based methods by achieving a lower false rejection and acceptance rate of the null hypothesis.
arXiv Detail & Related papers (2023-03-05T10:36:21Z) - Sequential Permutation Testing of Random Forest Variable Importance
Measures [68.8204255655161]
It is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests.
The results of simulation studies confirm that the theoretical properties of the sequential tests apply.
The numerical stability of the methods is investigated in two additional application studies.
arXiv Detail & Related papers (2022-06-02T20:16:50Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $ell$-Expected Error (ECE)
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Addressing Maximization Bias in Reinforcement Learning with Two-Sample Testing [0.0]
Overestimation bias is a known threat to value-based reinforcement-learning algorithms.
We propose a $T$-Estimator (TE) based on two-sample testing for the mean, that flexibly interpolates between over- and underestimation.
We also introduce a generalization, termed $K$-Estimator (KE), that obeys the same bias and variance bounds as the TE.
arXiv Detail & Related papers (2022-01-20T09:22:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.