Tracking the risk of a deployed model and detecting harmful distribution shifts
- URL: http://arxiv.org/abs/2110.06177v1
- Date: Tue, 12 Oct 2021 17:21:41 GMT
- Title: Tracking the risk of a deployed model and detecting harmful distribution shifts
- Authors: Aleksandr Podkopaev, Aaditya Ramdas
- Abstract summary: In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When deployed in the real world, machine learning models inevitably encounter
changes in the data distribution, and certain -- but not all -- distribution
shifts could result in significant performance degradation. In practice, it may
make sense to ignore benign shifts, under which the performance of a deployed
model does not degrade substantially, making interventions by a human expert
(or model retraining) unnecessary. While several works have developed tests for
distribution shifts, these typically either use non-sequential methods, or
detect arbitrary shifts (benign or harmful), or both. We argue that a sensible
method for firing off a warning has to both (a) detect harmful shifts while
ignoring benign ones, and (b) allow continuous monitoring of model performance
without increasing the false alarm rate. In this work, we design simple
sequential tools for testing if the difference between source (training) and
target (test) distributions leads to a significant drop in a risk function of
interest, like accuracy or calibration. Recent advances in constructing
time-uniform confidence sequences allow efficient aggregation of statistical
evidence accumulated during the tracking process. The designed framework is
applicable in settings where (some) true labels are revealed after the
prediction is performed, or when batches of labels become available in a
delayed fashion. We demonstrate the efficacy of the proposed framework through
an extensive empirical study on a collection of simulated and real datasets.
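The core test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' construction. Losses are assumed to lie in [0, 1] (e.g. the 0-1 misclassification loss). The source risk is bounded above once, from a fixed labeled holdout, with a one-sided Hoeffding bound. The target risk is bounded below by a crude time-uniform confidence sequence: Hoeffding's inequality at level α_t = α / (t(t+1)) combined with a union bound over time (the paper relies on tighter confidence sequences). The names `HarmfulShiftMonitor`, `cs_radius`, `first_alarm` and the tolerance `eps_tol` are hypothetical. An alarm fires once the target lower bound exceeds the source upper bound plus `eps_tol`, so a benign shift that keeps the target risk within `eps_tol` of the source risk should not trigger it. Because the source bound fails with probability at most α/2 and the confidence sequence fails at any time with probability at most α/2, the chance of ever raising a false alarm is at most α, however often the monitor is checked.

```python
import math


def cs_radius(t: int, alpha: float) -> float:
    """Half-width of a crude one-sided time-uniform confidence sequence for the
    mean of [0, 1]-valued losses after t observations.

    Uses Hoeffding's inequality at level alpha_t = alpha / (t * (t + 1)); these
    levels sum to alpha over t = 1, 2, ..., so a union bound makes the bound
    hold simultaneously for all t with probability at least 1 - alpha.
    """
    alpha_t = alpha / (t * (t + 1))
    return math.sqrt(math.log(1.0 / alpha_t) / (2.0 * t))


class HarmfulShiftMonitor:
    """Hypothetical sequential monitor: alarm once the target risk provably
    exceeds the source risk by more than eps_tol."""

    def __init__(self, source_losses, eps_tol: float, alpha: float = 0.05):
        n = len(source_losses)
        if n == 0:
            raise ValueError("need at least one labeled source loss")
        source_mean = sum(source_losses) / n
        # One-sided Hoeffding upper bound on the source risk at level alpha / 2.
        self.source_ucb = source_mean + math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
        self.eps_tol = eps_tol
        self.alpha_target = alpha / 2.0  # remaining error budget for the target CS
        self.t = 0
        self.loss_sum = 0.0

    def update(self, loss: float) -> bool:
        """Record the loss of one labeled target prediction; return True on alarm."""
        if not 0.0 <= loss <= 1.0:
            raise ValueError("losses must lie in [0, 1]")
        self.t += 1
        self.loss_sum += loss
        target_lcb = self.loss_sum / self.t - cs_radius(self.t, self.alpha_target)
        return target_lcb > self.source_ucb + self.eps_tol


def first_alarm(monitor: HarmfulShiftMonitor, losses):
    """Feed losses one at a time; return the 1-based time of the first alarm, or None."""
    for loss in losses:
        if monitor.update(loss):
            return monitor.t
    return None


if __name__ == "__main__":
    # Deterministic toy losses: source error rate 0.1, target error rates 0.4 and 0.1.
    source = [1.0 if i % 10 == 0 else 0.0 for i in range(1000)]
    harmful = [1.0 if i % 5 < 2 else 0.0 for i in range(2000)]
    benign = [1.0 if i % 10 == 0 else 0.0 for i in range(2000)]
    print(first_alarm(HarmfulShiftMonitor(source, eps_tol=0.05), harmful))
    print(first_alarm(HarmfulShiftMonitor(source, eps_tol=0.05), benign))  # None
```

When labels arrive in delayed batches, `update` can simply be called once per revealed label; checking only at batch boundaries keeps the guarantee, since the confidence sequence holds uniformly over all times.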

Related papers
- Reliably detecting model failures in deployment without labels
 This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models. Empirical results on both a standard benchmark and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework.
 arXiv (2025-06-05T13:56:18Z)
- Sequential Harmful Shift Detection Without Labels
 We introduce a novel approach for detecting distribution shifts that negatively impact the performance of machine learning models in continuous production environments.
It builds upon the work of Podkopaev and Ramdas [2022], who address scenarios where labels are available for tracking model errors over time.
Our solution extends this framework to work in the absence of labels, by employing a proxy for the true error.
 arXiv (2024-12-17T13:37:48Z)
- A Learning Based Hypothesis Test for Harmful Covariate Shift
 Machine learning systems in high-risk domains need to identify when predictions should not be made on out-of-distribution test examples.
In this work, we use the discordance between an ensemble of classifiers trained to agree on training data and disagree on test data to determine when a model should be removed from the deployment setting.
 arXiv (2022-12-06T04:15:24Z)
- Online Distribution Shift Detection via Recency Prediction
 We present an online method for detecting distribution shift with guarantees on the false positive rate.
Our system is very unlikely (with probability $\epsilon$) to falsely issue an alert when there is no distribution shift.
It empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work.
 arXiv (2022-11-17T22:29:58Z)
- Uncertainty Modeling for Out-of-Distribution Generalization
 We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
 arXiv (2022-02-08T16:09:12Z)
- Certifying Model Accuracy under Distribution Shifts
 We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
 arXiv (2022-01-28T22:03:50Z)
- Monitoring Model Deterioration with Explainable Uncertainty Estimation via Non-parametric Bootstrap
 Monitoring machine learning models once they are deployed is challenging.
It is even more challenging to decide when to retrain models in real-case scenarios when labeled data is beyond reach.
In this work, we use non-parametric bootstrapped uncertainty estimates and SHAP values to provide explainable uncertainty estimation.
 arXiv (2022-01-27T17:23:04Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
 Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
 arXiv (2022-01-11T23:01:12Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift
 Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
 arXiv (2021-09-27T01:09:08Z)
- Predicting with Confidence on Unseen Distributions
 We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
 arXiv (2021-07-07T15:50:18Z)
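The ATC idea from the related-paper summary reduces to two steps: fit a threshold on labeled source data, then count unlabeled target examples above it. The sketch below is a minimal illustration, not the reference implementation. It assumes per-example confidence scores (e.g. the maximum softmax probability) are already computed, ignores ties between confidence values, and uses hypothetical function names `fit_atc_threshold` and `predict_accuracy`.

```python
import math


def fit_atc_threshold(source_conf, source_correct):
    """Pick a confidence threshold so that the fraction of labeled source
    examples with confidence strictly above it matches the source accuracy."""
    n = len(source_conf)
    if n == 0 or n != len(source_correct):
        raise ValueError("need equally many, non-zero confidences and labels")
    accuracy = sum(source_correct) / n
    k = round(accuracy * n)  # source examples that should lie above the threshold
    if k >= n:
        return -math.inf  # every example counts as "confident"
    ordered = sorted(source_conf)
    # With distinct confidences, exactly k values are strictly greater than this one.
    return ordered[n - k - 1]


def predict_accuracy(target_conf, threshold):
    """ATC-style estimate: fraction of unlabeled target examples above the threshold."""
    if not target_conf:
        raise ValueError("need at least one target confidence")
    return sum(c > threshold for c in target_conf) / len(target_conf)


if __name__ == "__main__":
    source_conf = [0.1 * i for i in range(1, 11)]
    source_correct = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]  # source accuracy 0.7
    thr = fit_atc_threshold(source_conf, source_correct)
    print(predict_accuracy([0.2, 0.25, 0.5, 0.9], thr))  # 0.5
```

With repeated confidence values the fitted threshold can no longer reproduce the source accuracy exactly, so a fuller version would need an explicit tie-breaking rule.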