Concept Drift Detection via Equal Intensity k-means Space Partitioning
- URL: http://arxiv.org/abs/2004.11587v1
- Date: Fri, 24 Apr 2020 08:00:16 GMT
- Title: Concept Drift Detection via Equal Intensity k-means Space Partitioning
- Authors: Anjin Liu, Jie Lu, Guangquan Zhang
- Abstract summary: A cluster-based histogram called equal intensity k-means space partitioning (EI-kMeans) is proposed.
Three algorithms are developed to implement concept drift detection: a greedy centroids initialization algorithm, a cluster amplify-shrink algorithm, and a drift detection algorithm.
Experiments on synthetic and real-world datasets demonstrate the advantages of EI-kMeans and show its efficacy in detecting concept drift.
- Score: 40.77597229122878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data streams pose additional challenges to statistical classification tasks
because distributions of the training and target samples may differ as time
passes. Such distribution change in streaming data is called concept drift.
Numerous histogram-based distribution change detection methods have been
proposed to detect drift. Most histograms are developed on grid-based or
tree-based space partitioning algorithms, which make the space partitions
arbitrary and unexplainable and may cause drift blind spots. There is a need to
improve the drift detection accuracy of histogram-based methods in the
unsupervised setting. To address this problem, we propose a cluster-based
histogram, called equal intensity k-means space partitioning (EI-kMeans). In
addition, a heuristic method to improve the sensitivity of drift detection is
introduced. The fundamental idea of improving the sensitivity is to minimize
the risk of creating partitions in distribution offset regions. Pearson's
chi-square test is used as the statistical hypothesis test so that the test
statistics remain independent of the sample distribution. The number of bins
and their shapes, which strongly influence the ability to detect drift, are
determined dynamically from the sample based on an asymptotic constraint in the
chi-square test. Accordingly, three algorithms are developed to implement
concept drift detection, including a greedy centroids initialization algorithm,
a cluster amplify-shrink algorithm, and a drift detection algorithm. For drift
adaptation, we recommend retraining the learner if a drift is detected. The
results of experiments on synthetic and real-world datasets demonstrate the
advantages of EI-kMeans and show its efficacy in detecting concept drift.
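The abstract describes a detection pipeline: partition the reference window into k-means cells, count how many points from each window fall in each cell, and apply Pearson's chi-square test to those counts. That pipeline can be sketched in standard-library Python. This is a minimal sketch under stated assumptions and is not the authors' EI-kMeans. It replaces the greedy centroids initialization and cluster amplify-shrink algorithms with plain Lloyd's k-means, so its bins are not equal intensity. It assumes the test is a 2 x k chi-square test of homogeneity. As a stand-in for the paper's asymptotic constraint on the number of bins, it uses the common rule of thumb that expected cell counts be at least 5, with a hypothetical 2x safety margin. All function names (`kmeans`, `chi2_sf`, `chi_square_homogeneity`, `detect_drift`) and the `alpha` and `min_expected` defaults are illustrative.

```python
import math
import random


def nearest(point, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (no greedy init, no amplify-shrink step)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Empty clusters keep their previous centroid.
        updated = [tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
                   for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def chi2_sf(stat, df):
    """P(X >= stat) for X ~ chi-square(df), i.e. the regularized
    upper incomplete gamma function Q(df / 2, stat / 2)."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); return 1 - P.
        term = total = 1.0 / a
        for n in range(1, 1000):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi_square_homogeneity(ref_counts, test_counts):
    """Pearson's chi-square test of homogeneity on a 2 x k table of bin counts."""
    n_ref, n_test = sum(ref_counts), sum(test_counts)
    total = n_ref + n_test
    stat, used_bins = 0.0, 0
    for r, t in zip(ref_counts, test_counts):
        col = r + t
        if col == 0:
            continue  # a bin empty in both windows carries no information
        used_bins += 1
        for observed, row_total in ((r, n_ref), (t, n_test)):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, max(used_bins - 1, 1))


def detect_drift(reference, window, alpha=0.01, min_expected=5, seed=0):
    """Return (drift_detected, p_value) for two non-empty windows of d-dim points."""
    # Hypothetical bin-count rule: on average 2 * min_expected reference points
    # per bin, leaving slack because plain k-means bins are not equal-sized.
    k = max(2, len(reference) // (2 * min_expected))
    centroids = kmeans(reference, k, seed=seed)
    ref_counts, test_counts = [0] * k, [0] * k
    for p in reference:
        ref_counts[nearest(p, centroids)] += 1
    for p in window:
        test_counts[nearest(p, centroids)] += 1
    _, p_value = chi_square_homogeneity(ref_counts, test_counts)
    return p_value < alpha, p_value
```

Suppose the reference window holds 200 points from a 2-D Gaussian and the test window's mean is shifted by a few standard deviations. Then `detect_drift(reference, window)` should return `(True, p)` with a very small `p`. The test window is binned using the centroids fitted on the reference window, so the partition is a fixed set of Voronoi cells determined by the reference data. Following the abstract, a detected drift would then trigger retraining of the learner.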
Related papers
- A Neighbor-Searching Discrepancy-based Drift Detection Scheme for Learning Evolving Data [40.00357483768265]
This work presents a novel real concept drift detection method based on Neighbor-Searching Discrepancy.
The proposed method is able to detect real concept drift with high accuracy while ignoring virtual drift.
It can also indicate the direction of the classification boundary change by identifying the invasion or retreat of a certain class.
Date: 2024-05-23T04:03:36Z
- CADM: Confusion Model-based Detection Method for Real-drift in Chunk Data Stream [3.0885191226198785]
Concept drift detection has attracted considerable attention due to its importance in many real-world applications such as health monitoring and fault diagnosis.
We propose a new approach to detect real-drift in the chunk data stream with limited annotations based on concept confusion.
Date: 2023-03-25T08:59:27Z
- Task-Sensitive Concept Drift Detector with Metric Learning [7.706795195017394]
We propose a novel task-sensitive drift detection framework, which is able to detect drifts without access to true labels during inference.
It is able to detect real drift, where the drift affects the classification performance, while it properly ignores virtual drift.
We evaluate the performance of the proposed framework with a novel metric, which accumulates the standard metrics of detection accuracy, false positive rate and detection delay into one value.
Date: 2021-08-16T09:10:52Z
- Detecting Concept Drift With Neural Network Model Uncertainty [0.0]
Uncertainty Drift Detection (UDD) is able to detect drifts without access to true labels.
In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model.
We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
Date: 2021-07-05T08:56:36Z
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU, making it compute-efficient and deployable without requiring specialized accelerators.
Date: 2021-05-30T22:07:13Z
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
Date: 2020-11-19T05:42:02Z
- Concept Drift Detection: Dealing with Missing Values via Fuzzy Distance Estimations [40.77597229122878]
In data streams, the data distribution of arriving observations may change over time, a phenomenon called concept drift.
We show that missing values exert a profound impact on concept drift detection, but using fuzzy set theory to model observations can produce more reliable results than imputation.
Date: 2020-08-09T05:25:46Z
- UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders [81.5490760424213]
We propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Inspired by the saliency data labeling process, we propose probabilistic RGB-D saliency detection network.
Date: 2020-04-13T04:12:59Z
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
Date: 2020-03-19T15:36:31Z
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training of these centroid-based models with a novel loss function and a centroid updating scheme, and match the accuracy of softmax models.
Date: 2020-03-04T12:27:36Z
This list is automatically generated from the titles and abstracts of the papers in this site.