Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
- URL: http://arxiv.org/abs/2110.01052v1
- Date: Sun, 3 Oct 2021 17:42:03 GMT
- Title: Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
- Authors: Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, and Lihua Lei
- Abstract summary: Learn then Test (LTT) is a framework for calibrating machine learning models.
Our main insight is to reframe the risk-control problem as multiple hypothesis testing.
We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision.
- Score: 67.52000805944924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Learn then Test (LTT), a framework for calibrating machine
learning models so that their predictions satisfy explicit, finite-sample
statistical guarantees regardless of the underlying model and (unknown)
data-generating distribution. The framework addresses, among other examples,
false discovery rate control in multi-label classification,
intersection-over-union control in instance segmentation, and the simultaneous
control of the type-1 error of outlier detection and confidence set coverage in
classification or regression. To accomplish this, we solve a key technical
challenge: the control of arbitrary risks that are not necessarily monotonic.
Our main insight is to reframe the risk-control problem as multiple hypothesis
testing, enabling techniques and mathematical arguments different from those in
the previous literature. We use our framework to provide new calibration
methods for several core machine learning tasks with detailed worked examples
in computer vision.
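To make the hypothesis-testing reframing concrete, here is a minimal, hypothetical sketch of the calibration loop (not the paper's reference implementation): each candidate parameter lambda receives a p-value for the null hypothesis that its risk exceeds the target level alpha, and a family-wise-error-rate procedure, Bonferroni here for simplicity, returns the lambdas that can be certified. The names loss_fn, lambdas, and calib_data are placeholders; the exact binomial p-value below is valid for 0/1 losses, whereas the paper develops Hoeffding-Bentkus-style p-values for general bounded losses and more powerful selection schemes such as fixed sequence testing.

```python
import numpy as np
from scipy.stats import binom

def p_value_risk_gt_alpha(losses, alpha):
    """Valid p-value for H0: E[loss] > alpha when each loss is in {0, 1}.
    (The paper uses Hoeffding-Bentkus-style p-values for general bounded losses.)"""
    n = len(losses)
    k = int(np.sum(losses))            # number of errors on the calibration set
    return binom.cdf(k, n, alpha)      # small empirical risk => small p-value

def ltt_calibrate(loss_fn, lambdas, calib_data, alpha=0.1, delta=0.1):
    """Return every candidate lambda certified to satisfy risk(lambda) <= alpha."""
    certified = []
    for lam in lambdas:
        losses = np.array([loss_fn(x, y, lam) for x, y in calib_data])
        p = p_value_risk_gt_alpha(losses, alpha)
        if p <= delta / len(lambdas):  # Bonferroni: controls the family-wise error rate
            certified.append(lam)
    return certified
```

Under this construction, with probability at least 1 - delta over the calibration data, every returned lambda satisfies risk(lambda) <= alpha, which is the finite-sample, distribution-free guarantee described in the abstract.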
Related papers
- Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach to leveraging the representation capability of machine learning models within the testing-by-betting framework (a minimal sketch of the betting idea follows this entry).
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z)
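For intuition about the testing-by-betting idea mentioned above, here is a minimal, hypothetical sketch of an anytime-valid paired two-sample test; the fixed bet fraction, the running-mean witness function, and all names are illustrative assumptions, not the construction of that paper (which learns the betting strategy with a neural network).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def betting_two_sample_test(xs, ys, alpha=0.05, bet=0.2):
    """Anytime-valid paired two-sample test by betting.
    xs, ys: arrays of shape (n_rounds, dim).
    Null: each pair (xs[t], ys[t]) is exchangeable given the past.
    Wealth is then a nonnegative martingale, so by Ville's inequality
    the probability it ever reaches 1/alpha under the null is at most alpha."""
    wealth = 1.0
    for t in range(1, len(xs)):
        # Predictable witness direction, fit only on data seen so far.
        w = np.mean(xs[:t], axis=0) - np.mean(ys[:t], axis=0)
        score = sigmoid(xs[t] @ w) - sigmoid(ys[t] @ w)  # in (-1, 1)
        wealth *= 1.0 + bet * score   # payoff > 0, conditional mean 1 under the null
        if wealth >= 1.0 / alpha:
            return True, t            # reject the null at time t
    return False, len(xs)
```

Because the wealth process is a nonnegative martingale under the null, the test can be monitored continuously and stopped at any time without inflating the false alarm rate.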
- Probabilistic Safety Regions Via Finite Families of Scalable Classifiers [2.431537995108158]
Supervised classification recognizes patterns in the data to separate classes of behaviours.
Canonical solutions contain misclassification errors that are intrinsic to the numerical approximating nature of machine learning.
We introduce the concept of probabilistic safety region to describe a subset of the input space in which the number of misclassified instances is probabilistically controlled.
arXiv Detail & Related papers (2023-09-08T22:40:19Z)
- Prototypical Calibration for Few-shot Learning of Language Models [84.5759596754605]
GPT-like models have been recognized as fragile across different hand-crafted templates and demonstration permutations.
We propose prototypical calibration to adaptively learn a more robust decision boundary for zero- and few-shot classification.
Our method calibrates the decision boundary as expected, greatly improving the robustness of GPT to templates, permutations, and class imbalance.
arXiv Detail & Related papers (2022-05-20T13:50:07Z)
- Score-Based Change Detection for Gradient-Based Learning Machines [9.670556223243182]
We present a generic score-based change detection method that can detect a change in any number of components of a machine learning model trained via empirical risk minimization.
We establish the consistency of the hypothesis test and show how to calibrate it to achieve a prescribed false alarm rate.
arXiv Detail & Related papers (2021-06-27T01:38:11Z)
- Adversarial Examples for Unsupervised Machine Learning Models [71.81480647638529]
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models.
We propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
arXiv Detail & Related papers (2021-03-02T17:47:58Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
- Implicit supervision for fault detection and segmentation of emerging fault types with Deep Variational Autoencoders [1.160208922584163]
We propose a variational autoencoder (VAE) trained with labeled and unlabeled samples while inducing implicit supervision on the latent representation of the healthy conditions.
This creates a compact and informative latent representation that allows good detection and segmentation of unseen fault types.
In an extensive comparison, we demonstrate that the proposed method outperforms other learning strategies.
arXiv Detail & Related papers (2019-12-28T18:40:33Z)