Context-Aware Drift Detection
- URL: http://arxiv.org/abs/2203.08644v1
- Date: Wed, 16 Mar 2022 14:23:02 GMT
- Title: Context-Aware Drift Detection
- Authors: Oliver Cobb and Arnaud Van Looveren
- Abstract summary: Two-sample tests of homogeneity form the foundation upon which existing approaches to drift detection build.
We develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When monitoring machine learning systems, two-sample tests of homogeneity
form the foundation upon which existing approaches to drift detection build.
They are used to test for evidence that the distribution underlying recent
deployment data differs from that underlying the historical reference data.
Often, however, various factors such as time-induced correlation mean that
batches of recent deployment data are not expected to form an i.i.d. sample
from the historical data distribution. Instead we may wish to test for
differences in the distributions conditional on \textit{context} that is
permitted to change. To facilitate this we borrow machinery from the causal
inference domain to develop a more general drift detection framework built upon
a foundation of two-sample tests for conditional distributional treatment
effects. We recommend a particular instantiation of the framework based on
maximum conditional mean discrepancies. We then provide an empirical study
demonstrating its effectiveness for various drift detection problems of
practical interest, such as detecting drift in the distributions underlying
subpopulations of data in a manner that is insensitive to their respective
prevalences. The study additionally demonstrates applicability to
ImageNet-scale vision problems.
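The abstract builds on two-sample tests via the maximum mean discrepancy (MMD). A minimal unconditional MMD sketch is shown below as background; note the paper's actual contribution is a *conditional* (context-aware) variant, and the RBF kernel, bandwidth, and toy data here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel matrix between rows of x and rows of y
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of the squared maximum mean discrepancy
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(200, 2))      # historical reference data
same = rng.normal(0.0, 1.0, size=(200, 2))     # deployment batch, no drift
drifted = rng.normal(1.5, 1.0, size=(200, 2))  # deployment batch, mean shift

print(mmd2(ref, same))     # small: distributions match
print(mmd2(ref, drifted))  # larger: drift pushes the statistic up
```

In practice the statistic is compared against a permutation-based threshold to obtain a p-value; the paper's framework extends this by conditioning the comparison on a context variable that is permitted to change between reference and deployment.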
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Fault Detection and Monitoring using an Information-Driven Strategy: Method, Theory, and Application [5.056456697289351]
We propose an information-driven fault detection method based on a novel concept drift detector.
The method is tailored to identifying drifts in input-output relationships of additive noise models.
We prove several theoretical properties of the proposed MI-based fault detection scheme.
arXiv Detail & Related papers (2024-05-06T17:43:39Z)
- On the Detection of Anomalous or Out-Of-Distribution Data in Vision Models Using Statistical Techniques [0.6554326244334868]
We assess a tool, Benford's law, as a method used to quantify the difference between real and corrupted inputs.
In many settings, it could function as a filter for anomalous data points and for signalling out-of-distribution data.
arXiv Detail & Related papers (2024-03-21T18:31:47Z)
- DAGnosis: Localized Identification of Data Inconsistencies using Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's features probability distribution and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z)
- DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization [58.704753031608625]
Time series is one of the most challenging modalities in machine learning research.
OOD detection and generalization on time series tend to suffer due to their non-stationary properties.
We propose DIVERSIFY, a framework for OOD detection and generalization on dynamic distributions of time series.
arXiv Detail & Related papers (2023-08-04T12:27:11Z)
- Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z)
- Anomaly Detection under Distribution Shift [24.094884041252044]
Anomaly detection (AD) is a crucial machine learning task that aims to learn patterns from a set of normal training samples to identify abnormal samples in test data.
Most existing AD studies assume that the training and test data are drawn from the same data distribution, but in practice the test data can exhibit large distribution shifts.
We introduce a novel robust AD approach to diverse distribution shifts by minimizing the distribution gap between in-distribution and OOD normal samples in both the training and inference stages.
arXiv Detail & Related papers (2023-03-24T07:39:08Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Discovering Distribution Shifts using Latent Space Representations [4.014524824655106]
It is non-trivial to assess a model's generalizability to new, candidate datasets.
We use embedding space geometry to propose a non-parametric framework for detecting distribution shifts.
arXiv Detail & Related papers (2022-02-04T19:00:16Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- Partial Wasserstein and Maximum Mean Discrepancy distances for bridging the gap between outlier detection and drift detection [0.0]
An important aspect of monitoring is to check whether the inputs have strayed from the distribution they were validated for.
We bridge the gap between outlier detection and drift detection by comparing a given number of inputs to an automatically chosen subset of the reference distribution.
arXiv Detail & Related papers (2021-06-09T18:49:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.