Suitability of Different Metric Choices for Concept Drift Detection
- URL: http://arxiv.org/abs/2202.09486v1
- Date: Sat, 19 Feb 2022 01:11:32 GMT
- Title: Suitability of Different Metric Choices for Concept Drift Detection
- Authors: Fabian Hinder, Valerie Vaquet, Barbara Hammer
- Abstract summary: Many unsupervised approaches for drift detection rely on measuring the discrepancy between the sample distributions of two time windows.
Most drift detection methods can be distinguished by the metric they use, how this metric is estimated, and how the decision threshold is found.
We compare different types of estimators and metrics theoretically and empirically and investigate the relevance of the individual metric components.
- Score: 9.76294323004155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The notion of concept drift refers to the phenomenon that the
distribution underlying the observed data changes over time; as a consequence,
machine learning models may become inaccurate and need adjustment. Many
unsupervised approaches for drift detection rely on measuring the discrepancy
between the sample distributions of two time windows. This may be done
directly, after some preprocessing (feature extraction, embedding into a latent
space, etc.), or with respect to inferred features (mean, variance, conditional
probabilities, etc.). Most drift detection methods can be distinguished by the
metric they use, how this metric is estimated, and how the decision threshold
is found. In this paper, we analyze structural properties of the drift-induced
signals in the context of different metrics. We compare different types of
estimators and metrics theoretically and empirically and investigate the
relevance of the individual metric components. In addition, we propose new choices
and demonstrate their suitability in several experiments.
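The decomposition into metric, estimator, and decision threshold can be made concrete with a minimal sliding-window detector. The sketch below is an illustration of this general scheme, not the method proposed in the paper: the metric is the two-sample Kolmogorov-Smirnov statistic, the estimator is its empirical value over two adjacent windows, and the decision rule is a fixed threshold. The window size and threshold here are arbitrary example choices.

```python
import random

def window_discrepancy(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two windows."""
    values = sorted(set(ref) | set(cur))
    max_diff = 0.0
    for v in values:
        f_ref = sum(x <= v for x in ref) / len(ref)
        f_cur = sum(x <= v for x in cur) / len(cur)
        max_diff = max(max_diff, abs(f_ref - f_cur))
    return max_diff

def detect_drift(stream, window=100, threshold=0.3):
    """Slide two adjacent windows over the stream and flag every time
    point at which the discrepancy metric exceeds a fixed threshold."""
    alarms = []
    for t in range(2 * window, len(stream) + 1):
        ref = stream[t - 2 * window : t - window]  # reference window
        cur = stream[t - window : t]               # current window
        if window_discrepancy(ref, cur) > threshold:
            alarms.append(t)
    return alarms

# Simulated stream with an abrupt mean shift at t = 300.
random.seed(0)
stream = [random.gauss(0.0, 1.0) for _ in range(300)]
stream += [random.gauss(2.0, 1.0) for _ in range(300)]
alarms = detect_drift(stream)
print("first alarm at t =", alarms[0] if alarms else None)
```

Swapping `window_discrepancy` for another metric (e.g., a mean difference or a kernel-based discrepancy), or replacing the fixed threshold with one calibrated by permutation, changes only one component of the detector at a time, which is exactly the kind of modular comparison the paper studies.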
Related papers
- CADM: Confusion Model-based Detection Method for Real-drift in Chunk Data Stream [3.0885191226198785]
Concept drift detection has attracted considerable attention due to its importance in many real-world applications such as health monitoring and fault diagnosis.
We propose a new approach to detect real-drift in the chunk data stream with limited annotations based on concept confusion.
arXiv Detail & Related papers (2023-03-25T08:59:27Z)
- Detecting Concept Drift in the Presence of Sparsity -- A Case Study of Automated Change Risk Assessment System [0.8021979227281782]
Missing values, widely referred to as sparsity in the literature, are a common characteristic of many real-world datasets.
We study different patterns of missing values and various statistical and ML-based data imputation methods for different kinds of sparsity.
We then select the best concept drift detector given a dataset with missing values based on the different metrics.
arXiv Detail & Related papers (2022-07-27T04:27:49Z)
- Precise Change Point Detection using Spectral Drift Detection [8.686667049158476]
Concept drift refers to the phenomenon that the data-generating distribution changes over time; as a consequence, machine learning models may become inaccurate and need adjustment.
In this paper we consider the problem of detecting those change points in unsupervised learning.
We derive a new unsupervised drift detection algorithm, investigate its mathematical properties, and demonstrate its usefulness in several experiments.
arXiv Detail & Related papers (2022-05-13T08:31:47Z)
- Context-Aware Drift Detection [0.0]
Two-sample tests of homogeneity form the foundation upon which existing approaches to drift detection build.
We develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects.
arXiv Detail & Related papers (2022-03-16T14:23:02Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty [58.144520501201995]
Bi-Lipschitz regularization of neural network layers preserves relative distances between data instances in the feature spaces of each layer.
With the use of an attentive set encoder, we propose to meta-learn either diagonal or diagonal-plus-low-rank factors to efficiently construct task-specific covariance matrices.
We also propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution.
arXiv Detail & Related papers (2021-10-12T22:04:19Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Toward Scalable and Unified Example-based Explanation and Outlier Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that our prototype-based networks, which go beyond similarity kernels, deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z)
- Concept Drift Detection: Dealing with Missing Values via Fuzzy Distance Estimations [40.77597229122878]
In data streams, the data distribution of arriving observations at different time points may change - a phenomenon called concept drift.
We show that missing values exert a profound impact on concept drift detection, but using fuzzy set theory to model observations can produce more reliable results than imputation.
arXiv Detail & Related papers (2020-08-09T05:25:46Z)
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.