Detecting Concept Drift in the Presence of Sparsity -- A Case Study of Automated Change Risk Assessment System
- URL: http://arxiv.org/abs/2207.13287v1
- Date: Wed, 27 Jul 2022 04:27:49 GMT
- Title: Detecting Concept Drift in the Presence of Sparsity -- A Case Study of Automated Change Risk Assessment System
- Authors: Vishwas Choudhary, Binay Gupta, Anirban Chatterjee, Subhadip Paul, Kunal Banerjee, Vijay Agneeswaran
- Abstract summary: Missing values, widely referred to as sparsity in the literature, are a common characteristic of many real-world datasets.
We study different patterns of missing values and various statistical and ML-based data imputation methods for different kinds of sparsity.
We then select the best concept drift detector given a dataset with missing values based on the different metrics.
- Score: 0.8021979227281782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Missing values, widely referred to as \textit{sparsity} in the literature, are a
common characteristic of many real-world datasets. Many imputation methods have
been proposed to address this problem of data incompleteness or sparsity. However,
the accuracy of a data imputation method for a given feature or a set of
features in a dataset is highly dependent on the distribution of the feature
values and its correlation with other features. Another problem that plagues
industry deployments of machine learning (ML) solutions is concept drift
detection, which becomes more challenging in the presence of missing values.
Although data imputation and concept drift detection have been studied
extensively, little work has attempted a combined study of the two phenomena,
i.e., concept drift detection in the presence of sparsity. In this work, we
carry out a systematic study of the following: (i) different patterns of
missing values, (ii) various statistical and ML-based data imputation methods
for different kinds of sparsity, (iii) several concept drift detection methods,
(iv) practical analysis of the various drift detection metrics, and (v) selection of
the best concept drift detector for a given dataset with missing values based on
the different metrics. We first carry out this analysis on synthetic data and
publicly available datasets, and then extend the findings to our deployed
automated change risk assessment system. One of the major findings of our
empirical study is that no single concept drift detection method is superior
across all the relevant metrics. Therefore, we adopt a majority-voting ensemble
of concept drift detectors for abrupt and gradual concept drifts. Our
experiments show that this ensemble achieves optimal or near-optimal performance
across all the metrics.
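The paper does not include code here, but as a rough, hedged illustration of points (i)-(ii) above, the sketch below injects missing-completely-at-random (MCAR) gaps into a toy dataset and compares a statistical imputer (column mean) against two ML-based imputers (k-NN and iterative imputation) from scikit-learn. The dataset, missingness rate, and choice of imputers are assumptions for illustration, not the authors' experimental setup.

```python
# Hedged sketch: comparing statistical vs. ML-based imputation under MCAR sparsity.
# The dataset, missingness rate, and imputer choices are illustrative assumptions,
# not the configuration used in the paper.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
X = load_diabetes().data.copy()

# Inject roughly 20% missing-completely-at-random (MCAR) values.
mask = rng.random(X.shape) < 0.2
X_sparse = X.copy()
X_sparse[mask] = np.nan

imputers = {
    "mean (statistical)": SimpleImputer(strategy="mean"),
    "k-NN (ML-based)": KNNImputer(n_neighbors=5),
    "iterative (ML-based)": IterativeImputer(max_iter=10, random_state=0),
}

for name, imputer in imputers.items():
    X_hat = imputer.fit_transform(X_sparse)
    rmse = np.sqrt(np.mean((X_hat[mask] - X[mask]) ** 2))  # error only on the masked cells
    print(f"{name:22s} RMSE on imputed cells: {rmse:.4f}")
```

Similarly, the majority-voting ensemble mentioned in the abstract can be pictured as a wrapper that flags drift only when more than half of its base detectors do. The minimal sketch below uses a simple Page-Hinkley test as the base detector; the detector choice, thresholds, and synthetic stream are assumptions, not the authors' configuration.

```python
# Hedged sketch of a majority-voting ensemble of concept drift detectors.
# The base detectors and thresholds below are illustrative stand-ins,
# not the exact detectors evaluated in the paper.
import numpy as np


class PageHinkley:
    """Minimal Page-Hinkley test for upward mean shifts (illustrative)."""

    def __init__(self, delta=0.005, threshold=10.0):
        self.delta, self.threshold = delta, threshold
        self.mean, self.n, self.cum, self.min_cum = 0.0, 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold  # True => drift flagged


class MajorityVoteDetector:
    """Flags drift when more than half of the base detectors flag it."""

    def __init__(self, detectors):
        self.detectors = detectors

    def update(self, x):
        votes = sum(int(d.update(x)) for d in self.detectors)
        return votes > len(self.detectors) / 2


# Usage: three detectors with different sensitivities vote on a synthetic stream
# with an abrupt mean shift at t = 500 (purely for illustration).
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])
ensemble = MajorityVoteDetector([PageHinkley(threshold=t) for t in (5.0, 10.0, 20.0)])

for t, x in enumerate(stream):
    if ensemble.update(x):
        print(f"majority vote flags drift at t={t}")
        break
```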
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Online Drift Detection with Maximum Concept Discrepancy [13.48123472458282]
We propose MCD-DD, a novel concept drift detection method based on maximum concept discrepancy.
Our method can adaptively identify varying forms of concept drift by contrastive learning of concept embeddings.
arXiv Detail & Related papers (2024-07-07T13:57:50Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- A Neighbor-Searching Discrepancy-based Drift Detection Scheme for Learning Evolving Data [40.00357483768265]
This work presents a novel real concept drift detection method based on Neighbor-Searching Discrepancy.
The proposed method is able to detect real concept drift with high accuracy while ignoring virtual drift.
It can also indicate the direction of the classification boundary change by identifying the invasion or retreat of a certain class.
arXiv Detail & Related papers (2024-05-23T04:03:36Z)
- Fault Detection and Monitoring using an Information-Driven Strategy: Method, Theory, and Application [5.056456697289351]
We propose an information-driven fault detection method based on a novel concept drift detector.
The method is tailored to identifying drifts in input-output relationships of additive noise models.
We prove several theoretical properties of the proposed MI-based fault detection scheme.
arXiv Detail & Related papers (2024-05-06T17:43:39Z)
- Towards stable real-world equation discovery with assessing differentiating quality influence [52.2980614912553]
We propose alternatives to the commonly used finite differences-based method.
We evaluate these methods in terms of their applicability to problems similar to real ones and their ability to ensure the convergence of equation discovery algorithms.
arXiv Detail & Related papers (2023-11-09T23:32:06Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Detecting Concept Drift With Neural Network Model Uncertainty [0.0]
Uncertainty Drift Detection (UDD) is able to detect drifts without access to true labels.
In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model.
We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
arXiv Detail & Related papers (2021-07-05T08:56:36Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
- Concept Drift Detection: Dealing with Missing Values via Fuzzy Distance Estimations [40.77597229122878]
In data streams, the data distribution of arriving observations at different time points may change - a phenomenon called concept drift.
We show that missing values exert a profound impact on concept drift detection, but using fuzzy set theory to model observations can produce more reliable results than imputation.
arXiv Detail & Related papers (2020-08-09T05:25:46Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training of such models with a novel loss function and centroid updating scheme, matching the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
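For the Average Thresholded Confidence (ATC) entry above, a minimal sketch of the idea: choose a confidence threshold on labeled source data so that the fraction of scores above it matches the source accuracy, then predict target accuracy as the fraction of unlabeled target scores above that threshold. The score function and synthetic data below are assumptions for illustration, not the original paper's setup.

```python
# Hedged sketch of Average Thresholded Confidence (ATC).
# Scores, model correctness, and data are synthetic stand-ins.
import numpy as np


def atc_threshold(source_scores, source_correct):
    """Choose t so that mean(source_scores > t) roughly equals source accuracy."""
    acc = np.mean(source_correct)
    # The (1 - acc)-quantile of the scores leaves a fraction `acc` above it.
    return np.quantile(source_scores, 1.0 - acc)


def atc_predict_accuracy(target_scores, threshold):
    """Predicted target accuracy = fraction of target scores above the threshold."""
    return float(np.mean(target_scores > threshold))


# Toy usage with synthetic confidence scores (not real model outputs).
rng = np.random.default_rng(0)
source_scores = rng.beta(5, 2, size=1000)            # stand-in for max-softmax confidences
source_correct = rng.random(1000) < source_scores     # correctness loosely tied to confidence
target_scores = rng.beta(4, 3, size=1000)             # shifted target distribution

t = atc_threshold(source_scores, source_correct)
print(f"threshold={t:.3f}, predicted target accuracy={atc_predict_accuracy(target_scores, t):.3f}")
```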