Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time
- URL: http://arxiv.org/abs/2406.17813v2
- Date: Thu, 24 Jul 2025 15:10:45 GMT
- Title: Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time
- Authors: Salvatore Greco, Bartolomeo Vacchetti, Daniele Apiletti, Tania Cerquitelli
- Abstract summary: Concept drift is the phenomenon in which the underlying data distributions and statistical properties of a target domain change over time. DriftLens is an unsupervised framework for real-time concept drift detection and characterization.
- Score: 5.999777817331315
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Concept drift is the phenomenon in which the underlying data distributions and statistical properties of a target domain change over time, leading to a degradation in model performance. Consequently, production models require continuous drift detection monitoring. Most drift detection methods to date are supervised, relying on ground-truth labels. However, they are inapplicable in many real-world scenarios, as true labels are often unavailable. Although recent efforts have proposed unsupervised drift detectors, many lack the accuracy required for reliable detection or are too computationally intensive for real-time use in high-dimensional, large-scale production environments. Moreover, they often fail to characterize or explain drift effectively. To address these limitations, we propose DriftLens, an unsupervised framework for real-time concept drift detection and characterization. Designed for deep learning classifiers handling unstructured data, DriftLens leverages distribution distances in deep learning representations to enable efficient and accurate detection. Additionally, it characterizes drift by analyzing and explaining its impact on each label. Our evaluation across classifiers and data-types demonstrates that DriftLens (i) outperforms previous methods in detecting drift in 15/17 use cases; (ii) runs at least 5 times faster; (iii) produces drift curves that align closely with actual drift (correlation ≥ 0.85); (iv) effectively identifies representative drift samples as explanations.
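The core mechanism the abstract describes, monitoring distribution distances between deep representations, can be illustrated with a minimal sketch. This is not the DriftLens algorithm itself; it assumes embeddings are compared via a Fréchet distance between diagonal-Gaussian fits of a reference window and a new window, with all data synthetic.

```python
# Minimal sketch (not the authors' exact method): flag drift when the
# embedding distribution of a new window moves away from a reference window.
import numpy as np

def diag_frechet_distance(a, b):
    """Frechet distance between diagonal Gaussians fit to two embedding batches."""
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    sd_a, sd_b = a.std(axis=0), b.std(axis=0)
    return float(((mu_a - mu_b) ** 2).sum() + ((sd_a - sd_b) ** 2).sum())

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 8))  # embeddings of known-good data
same_dist = rng.normal(0.0, 1.0, size=(200, 8))  # new window, no drift
drifted   = rng.normal(1.5, 1.0, size=(200, 8))  # new window, shifted mean

d_same = diag_frechet_distance(reference, same_dist)
d_drift = diag_frechet_distance(reference, drifted)
# A threshold on this distance (e.g. calibrated on drift-free reference
# windows) turns the score into a detector; no labels are needed.
```

Because the statistic depends only on model representations, it can run per-window in a stream, which is what makes this family of detectors cheap enough for real-time monitoring.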
Related papers
- Flexible and Efficient Drift Detection without Labels [4.517793609535323]
A lot of research on concept drift has focused on the supervised case, which assumes the true labels of supervised tasks are available immediately after making predictions.
We propose a flexible and efficient concept drift detection algorithm that uses classical statistical process control in a label-less setting.
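The statistical-process-control idea in a label-less setting can be sketched as a Shewhart-style control chart on a label-free statistic such as mean prediction confidence. This is an illustration of the general technique, not the paper's algorithm; the class name and thresholds are assumptions.

```python
# Sketch: a control chart on mean prediction confidence (no labels needed).
import numpy as np

class ConfidenceControlChart:
    """Alarm when a window's mean confidence leaves the reference control limits."""
    def __init__(self, reference_conf, k=3.0):
        self.mu = float(np.mean(reference_conf))     # in-control center line
        self.sigma = float(np.std(reference_conf))   # in-control spread
        self.k = k                                   # control-limit width (sigmas)

    def check(self, window_conf):
        # Standard error of the window mean under the reference distribution.
        se = self.sigma / np.sqrt(len(window_conf))
        return bool(abs(np.mean(window_conf) - self.mu) > self.k * se)

rng = np.random.default_rng(1)
chart = ConfidenceControlChart(rng.beta(8, 2, 5000))  # confident reference model
stable = chart.check(rng.beta(8, 2, 200))  # same regime: alarms only at the
                                           # chart's false-positive rate
drift = chart.check(rng.beta(3, 3, 200))   # confidence collapse under drift
```

The appeal of SPC here is exactly what the summary claims: the test is O(window size), has a controllable false-alarm rate, and never touches ground-truth labels.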
arXiv Detail & Related papers (2025-06-10T12:31:04Z)
- Adversarial Attacks for Drift Detection [6.234802839923542]
This work studies the shortcomings of commonly used drift detection schemes.
We show how to construct data streams that are drifting without being detected.
In particular, we compute all possible adversarials for common detection schemes.
arXiv Detail & Related papers (2024-11-25T17:25:00Z)
- A Synthetic Benchmark to Explore Limitations of Localized Drift Detections [13.916984628784766]
Concept drift is a common phenomenon in data streams where the statistical properties of the target variable change over time.
This paper explores the concept of localized drift and evaluates the performance of several drift detection techniques in identifying such localized changes.
arXiv Detail & Related papers (2024-08-26T23:24:31Z)
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- Methods for Generating Drift in Text Streams [49.3179290313959]
Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time.
This paper provides four textual drift generation methods to ease the production of datasets with labeled drifts.
Results show that every method's performance degrades right after the drifts, and that the incremental SVM is the fastest to run and to recover its previous performance levels.
arXiv Detail & Related papers (2024-03-18T23:48:33Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatially quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
- Unsupervised Adaptation from Repeated Traversals for Autonomous Driving [54.59577283226982]
Self-driving cars must generalize to the end-user's environment to operate reliably.
One potential solution is to leverage unlabeled data collected from the end-users' environments.
However, there is no reliable signal in the target domain to supervise the adaptation process.
We show that this simple additional assumption is sufficient to obtain a potent signal that allows us to perform iterative self-training of 3D object detectors on the target domain.
arXiv Detail & Related papers (2023-03-27T15:07:55Z)
- CADM: Confusion Model-based Detection Method for Real-drift in Chunk Data Stream [3.0885191226198785]
Concept drift detection has attracted considerable attention due to its importance in many real-world applications such as health monitoring and fault diagnosis.
We propose a new approach to detect real-drift in the chunk data stream with limited annotations based on concept confusion.
arXiv Detail & Related papers (2023-03-25T08:59:27Z)
- Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent [97.64313409741614]
We propose to enforce a consistency property, which states that the model's predictions on its own generated data are consistent across time.
We show that our novel training objective yields state-of-the-art results for conditional and unconditional generation on CIFAR-10 and baseline improvements on AFHQ and FFHQ.
arXiv Detail & Related papers (2023-02-17T18:45:04Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
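The ATC recipe described above is simple enough to sketch directly. This is an illustration under assumed names and synthetic data, not the authors' implementation: calibrate a confidence threshold on labeled source data so that the fraction above it matches source accuracy, then reuse that threshold on unlabeled target data.

```python
# Sketch of the ATC idea: threshold calibration on source, counting on target.
import numpy as np

def atc_threshold(source_conf, source_correct):
    """Choose t so that the source fraction above t equals source accuracy."""
    accuracy = float(np.mean(source_correct))
    # The (1 - accuracy) quantile leaves exactly `accuracy` mass above t.
    return float(np.quantile(source_conf, 1.0 - accuracy))

def atc_predicted_accuracy(target_conf, t):
    """Predicted accuracy = fraction of unlabeled predictions above t."""
    return float(np.mean(target_conf > t))

rng = np.random.default_rng(2)
source_conf = rng.uniform(0.0, 1.0, 10_000)
source_correct = source_conf > 0.4          # toy model: correct iff confident
t = atc_predicted = atc_threshold(source_conf, source_correct)
pred_acc = atc_predicted_accuracy(source_conf, t)  # sanity check on source
```

By construction the predicted accuracy matches the true accuracy on source data; the paper's contribution is showing the same threshold transfers usefully to shifted target distributions.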
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
- Task-Sensitive Concept Drift Detector with Metric Learning [7.706795195017394]
We propose a novel task-sensitive drift detection framework, which is able to detect drifts without access to true labels during inference.
It is able to detect real drift, where the drift affects the classification performance, while it properly ignores virtual drift.
We evaluate the performance of the proposed framework with a novel metric, which accumulates the standard metrics of detection accuracy, false positive rate and detection delay into one value.
arXiv Detail & Related papers (2021-08-16T09:10:52Z)
- Bayesian Autoencoders for Drift Detection in Industrial Environments [69.93875748095574]
Autoencoders are unsupervised models which have been used for detecting anomalies in multi-sensor environments.
Anomalies can come either from real changes in the environment (real drift) or from faulty sensory devices (virtual drift).
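The autoencoder mechanism summarized above, scoring new data by how badly the model reconstructs it, can be illustrated with PCA as a linear stand-in for the autoencoder. This is a sketch under that substitution, with synthetic data and assumed names, not the paper's Bayesian model.

```python
# Sketch: reconstruction error as a drift score (PCA standing in for an
# autoencoder; a rise in error signals data the model was not trained on).
import numpy as np

class ReconstructionDriftMonitor:
    def __init__(self, reference, n_components=2):
        self.mu = reference.mean(axis=0)
        # Top principal axes of the (centered) reference data.
        _, _, vt = np.linalg.svd(reference - self.mu, full_matrices=False)
        self.components = vt[:n_components]
        self.baseline = float(self.errors(reference).mean())

    def errors(self, x):
        centered = x - self.mu
        reconstruction = centered @ self.components.T @ self.components
        return np.linalg.norm(centered - reconstruction, axis=1)

    def drifted(self, x, factor=2.0):
        return float(self.errors(x).mean()) > factor * self.baseline

rng = np.random.default_rng(3)
latent = rng.normal(size=(400, 2))
mixing = rng.normal(size=(2, 6))
reference = latent @ mixing + 0.01 * rng.normal(size=(400, 6))  # ~2D structure
monitor = ReconstructionDriftMonitor(reference)
off_manifold = rng.normal(size=(200, 6))  # full-rank noise breaks the structure
```

Note that reconstruction error alone cannot tell real drift from virtual drift (a faulty sensor also reconstructs badly), which is the distinction the paper addresses.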
arXiv Detail & Related papers (2021-07-28T10:19:58Z)
- Detecting Concept Drift With Neural Network Model Uncertainty [0.0]
Uncertainty Drift Detection (UDD) is able to detect drifts without access to true labels.
In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model.
We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
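A minimal sketch of the uncertainty-monitoring idea in this summary: track predictive entropy of the model's outputs and raise an alarm on a sustained increase, here via a Page-Hinkley test. The detector choice and all values are assumptions for illustration, not the paper's implementation.

```python
# Sketch: label-free drift detection from prediction uncertainty.
import numpy as np

def predictive_entropy(probs):
    """Entropy of a single predictive distribution."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

class PageHinkley:
    """Page-Hinkley test: alarms on a sustained increase of a statistic."""
    def __init__(self, delta=0.05, threshold=5.0):
        self.delta, self.threshold = delta, threshold
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n           # running mean
        self.cum += x - self.mean - self.delta          # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold  # sustained rise?

detector = PageHinkley()
confident = predictive_entropy(np.array([0.9, 0.05, 0.05]))  # low entropy
uncertain = predictive_entropy(np.array([0.4, 0.3, 0.3]))    # high entropy
stream = [confident] * 300 + [uncertain] * 300               # drift at t=300
alarms = [i for i, h in enumerate(stream) if detector.update(h)]
```

As the summary notes, this monitors the model's reaction to the data rather than the raw inputs, so it is sensitive to exactly the shifts that affect predictions.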
arXiv Detail & Related papers (2021-07-05T08:56:36Z)
- DriftSurf: A Risk-competitive Learning Algorithm under Concept Drift [12.579800289829963]
When learning from streaming data, a change in the data distribution, also known as concept drift, can render a previously-learned model inaccurate.
We present an adaptive learning algorithm that extends previous drift-detection-based methods by incorporating drift detection into a broader stable-state/reactive-state process.
The algorithm is generic in its base learner and can be applied across a variety of supervised learning problems.
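The stable-state/reactive-state structure described above can be sketched as a small state machine wrapped around any base learner. The class, transition rule, and promotion policy here are simplifying assumptions for illustration, not the DriftSurf algorithm.

```python
# Sketch: a stable/reactive wrapper around a generic base learner.
class StableReactiveLearner:
    STABLE, REACTIVE = "stable", "reactive"

    def __init__(self, make_model, drift_signal, patience=3):
        self.make_model = make_model      # factory for a fresh base learner
        self.drift_signal = drift_signal  # callable: batch -> bool
        self.model = make_model()
        self.candidate = None
        self.state = self.STABLE
        self.patience = patience
        self.reactive_steps = 0

    def step(self, batch):
        if self.state == self.STABLE:
            if self.drift_signal(batch):
                # Suspected drift: train a fresh candidate alongside the
                # current model instead of discarding it immediately.
                self.state = self.REACTIVE
                self.candidate = self.make_model()
                self.reactive_steps = 0
        else:
            self.reactive_steps += 1
            if self.reactive_steps >= self.patience:
                # After the reactive window, keep the better model; this
                # sketch always promotes the candidate for simplicity.
                self.model = self.candidate
                self.candidate = None
                self.state = self.STABLE
        return self.state

learner = StableReactiveLearner(lambda: object(), lambda b: b == "drift", patience=2)
states = [learner.step(b) for b in ["ok", "drift", "ok", "ok"]]
```

Deferring the switch to a reactive window is what protects the wrapper against false alarms: a spurious drift signal costs some extra training, not the accumulated model.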
arXiv Detail & Related papers (2020-03-13T23:25:25Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training of such models with a novel loss function and centroid updating scheme, matching the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.