Detecting Concept Drift in the Presence of Sparsity -- A Case Study of Automated Change Risk Assessment System
- URL: http://arxiv.org/abs/2207.13287v1
- Date: Wed, 27 Jul 2022 04:27:49 GMT
- Title: Detecting Concept Drift in the Presence of Sparsity -- A Case Study of Automated Change Risk Assessment System
- Authors: Vishwas Choudhary, Binay Gupta, Anirban Chatterjee, Subhadip Paul, Kunal Banerjee, Vijay Agneeswaran
- Abstract summary: Missing values, widely referred to as sparsity in the literature, are a common characteristic of many real-world datasets.
We study different patterns of missing values and various statistical and ML-based data imputation methods for different kinds of sparsity.
We then select the best concept drift detector given a dataset with missing values based on the different metrics.
- Score: 0.8021979227281782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Missing values, widely referred to as \textit{sparsity} in the literature, are a
common characteristic of many real-world datasets. Many imputation methods have
been proposed to address this problem of data incompleteness or sparsity. However,
the accuracy of a data imputation method for a given feature or a set of
features in a dataset is highly dependent on the distribution of the feature
values and its correlation with other features. Another problem that plagues
industry deployments of machine learning (ML) solutions is concept drift
detection, which becomes more challenging in the presence of missing values.
Although data imputation and concept drift detection have been studied
extensively, little work has attempted a combined study of the two phenomena,
i.e., concept drift detection in the presence of sparsity. In this work, we
carry out a systematic study of the following: (i) different patterns of
missing values, (ii) various statistical and ML-based data imputation methods
for different kinds of sparsity, (iii) several concept drift detection methods,
(iv) practical analysis of the various drift detection metrics, and (v) selection of
the best concept drift detector for a given dataset with missing values based on
the different metrics. We first carry out this analysis on synthetic data and
publicly available datasets, and then extend the findings to our deployed
automated change risk assessment system. One of the major findings of our
empirical study is that no single concept drift detection method is superior
across all the relevant metrics. Therefore, we adopt a majority-voting ensemble
of concept drift detectors for abrupt and gradual concept drifts. Our
experiments show that this ensemble achieves optimal or near-optimal performance
across all the metrics.
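The paper does not include code here, but as a rough, hedged illustration of points (i)-(ii) above, the sketch below injects missing-completely-at-random (MCAR) gaps into a toy dataset and compares a statistical imputer (column mean) against two ML-based imputers (k-NN and iterative imputation) from scikit-learn. The dataset, missingness rate, and choice of imputers are assumptions for illustration, not the authors' experimental setup.

```python
# Hedged sketch: comparing statistical vs. ML-based imputation under MCAR sparsity.
# The dataset, missingness rate, and imputer choices are illustrative assumptions,
# not the configuration used in the paper.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
X = load_diabetes().data.copy()

# Inject roughly 20% missing-completely-at-random (MCAR) values.
mask = rng.random(X.shape) < 0.2
X_sparse = X.copy()
X_sparse[mask] = np.nan

imputers = {
    "mean (statistical)": SimpleImputer(strategy="mean"),
    "k-NN (ML-based)": KNNImputer(n_neighbors=5),
    "iterative (ML-based)": IterativeImputer(max_iter=10, random_state=0),
}

for name, imputer in imputers.items():
    X_hat = imputer.fit_transform(X_sparse)
    rmse = np.sqrt(np.mean((X_hat[mask] - X[mask]) ** 2))  # error only on the masked cells
    print(f"{name:22s} RMSE on imputed cells: {rmse:.4f}")
```

Similarly, the majority-voting ensemble mentioned in the abstract can be pictured as a wrapper that flags drift only when more than half of its base detectors do. The minimal sketch below uses a simple Page-Hinkley test as the base detector; the detector choice, thresholds, and synthetic stream are assumptions, not the authors' configuration.

```python
# Hedged sketch of a majority-voting ensemble of concept drift detectors.
# The base detectors and thresholds below are illustrative stand-ins,
# not the exact detectors evaluated in the paper.
import numpy as np


class PageHinkley:
    """Minimal Page-Hinkley test for upward mean shifts (illustrative)."""

    def __init__(self, delta=0.005, threshold=10.0):
        self.delta, self.threshold = delta, threshold
        self.mean, self.n, self.cum, self.min_cum = 0.0, 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.threshold  # True => drift flagged


class MajorityVoteDetector:
    """Flags drift when more than half of the base detectors flag it."""

    def __init__(self, detectors):
        self.detectors = detectors

    def update(self, x):
        votes = sum(int(d.update(x)) for d in self.detectors)
        return votes > len(self.detectors) / 2


# Usage: three detectors with different sensitivities vote on a synthetic stream
# with an abrupt mean shift at t = 500 (purely for illustration).
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])
ensemble = MajorityVoteDetector([PageHinkley(threshold=t) for t in (5.0, 10.0, 20.0)])

for t, x in enumerate(stream):
    if ensemble.update(x):
        print(f"majority vote flags drift at t={t}")
        break
```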
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Online Drift Detection with Maximum Concept Discrepancy [13.48123472458282]
We propose MCD-DD, a novel concept drift detection method based on maximum concept discrepancy.
Our method can adaptively identify varying forms of concept drift by contrastive learning of concept embeddings.
arXiv Detail & Related papers (2024-07-07T13:57:50Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- A Neighbor-Searching Discrepancy-based Drift Detection Scheme for Learning Evolving Data [40.00357483768265]
This work presents a novel real concept drift detection method based on Neighbor-Searching Discrepancy.
The proposed method is able to detect real concept drift with high accuracy while ignoring virtual drift.
It can also indicate the direction of the classification boundary change by identifying the invasion or retreat of a certain class.
arXiv Detail & Related papers (2024-05-23T04:03:36Z)
- Fault Detection and Monitoring using an Information-Driven Strategy: Method, Theory, and Application [5.056456697289351]
We propose an information-driven fault detection method based on a novel concept drift detector.
The method is tailored to identifying drifts in input-output relationships of additive noise models.
We prove several theoretical properties of the proposed MI-based fault detection scheme.
arXiv Detail & Related papers (2024-05-06T17:43:39Z)
- Towards stable real-world equation discovery with assessing differentiating quality influence [52.2980614912553]
We propose alternatives to the commonly used finite differences-based method.
We evaluate these methods in terms of their applicability to problems similar to real ones and their ability to ensure the convergence of equation discovery algorithms.
arXiv Detail & Related papers (2023-11-09T23:32:06Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Detecting Concept Drift With Neural Network Model Uncertainty [0.0]
Uncertainty Drift Detection (UDD) is able to detect drifts without access to true labels.
In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model.
We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
arXiv Detail & Related papers (2021-07-05T08:56:36Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
- Concept Drift Detection: Dealing with Missing Values via Fuzzy Distance Estimations [40.77597229122878]
In data streams, the data distribution of arriving observations at different time points may change - a phenomenon called concept drift.
We show that missing values exert a profound impact on concept drift detection, but using fuzzy set theory to model observations can produce more reliable results than imputation.
arXiv Detail & Related papers (2020-08-09T05:25:46Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training of such models with a novel loss function and centroid updating scheme, matching the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
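For the Average Thresholded Confidence (ATC) entry above, a minimal sketch of the idea: choose a confidence threshold on labeled source data so that the fraction of scores above it matches the source accuracy, then predict target accuracy as the fraction of unlabeled target scores above that threshold. The score function and synthetic data below are assumptions for illustration, not the original paper's setup.

```python
# Hedged sketch of Average Thresholded Confidence (ATC).
# Scores, model correctness, and data are synthetic stand-ins.
import numpy as np


def atc_threshold(source_scores, source_correct):
    """Choose t so that mean(source_scores > t) roughly equals source accuracy."""
    acc = np.mean(source_correct)
    # The (1 - acc)-quantile of the scores leaves a fraction `acc` above it.
    return np.quantile(source_scores, 1.0 - acc)


def atc_predict_accuracy(target_scores, threshold):
    """Predicted target accuracy = fraction of target scores above the threshold."""
    return float(np.mean(target_scores > threshold))


# Toy usage with synthetic confidence scores (not real model outputs).
rng = np.random.default_rng(0)
source_scores = rng.beta(5, 2, size=1000)            # stand-in for max-softmax confidences
source_correct = rng.random(1000) < source_scores     # correctness loosely tied to confidence
target_scores = rng.beta(4, 3, size=1000)             # shifted target distribution

t = atc_threshold(source_scores, source_correct)
print(f"threshold={t:.3f}, predicted target accuracy={atc_predict_accuracy(target_scores, t):.3f}")
```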