Related papers: Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

URL: http://arxiv.org/abs/2406.17813v1
Date: Mon, 24 Jun 2024 23:41:46 GMT
Title: Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time
Authors: Salvatore Greco, Bartolomeo Vacchetti, Daniele Apiletti, Tania Cerquitelli,
Abstract summary: Concept Drift is a phenomenon in which the underlying data distribution and statistical properties of a target domain change over time. We propose DriftLens, an unsupervised real-time concept drift detection framework. It works on unstructured data by exploiting the distribution distances of deep learning representations.
Score: 5.999777817331315
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Concept Drift is a phenomenon in which the underlying data distribution and statistical properties of a target domain change over time, leading to a degradation of the model's performance. Consequently, models deployed in production require continuous monitoring through drift detection techniques. Most drift detection methods to date are supervised, i.e., based on ground-truth labels. However, true labels are usually not available in many real-world scenarios. Although recent efforts have been made to develop unsupervised methods, they often lack the required accuracy, have a complexity that makes real-time implementation in production environments difficult, or are unable to effectively characterize drift. To address these challenges, we propose DriftLens, an unsupervised real-time concept drift detection framework. It works on unstructured data by exploiting the distribution distances of deep learning representations. DriftLens can also provide drift characterization by analyzing each label separately. A comprehensive experimental evaluation is presented with multiple deep learning classifiers for text, image, and speech. Results show that (i) DriftLens performs better than previous methods in detecting drift in $11/13$ use cases; (ii) it runs at least 5 times faster; (iii) its detected drift value is very coherent with the amount of drift (correlation $\geq 0.85$); (iv) it is robust to parameter changes.

Related papers

Flexible and Efficient Drift Detection without Labels [4.517793609535323]
A lot of research on concept drift has focused on the supervised case that assumes the true labels of supervised tasks are available immediately after making predictions.<n>We propose a flexible and efficient concept drift detection algorithm that uses classical statistical process control in a label-less setting.
arXiv Detail & Related papers (2025-06-10T12:31:04Z)
Adversarial Attacks for Drift Detection [6.234802839923542]
This work studies the shortcomings of commonly used drift detection schemes. We show how to construct data streams that are drifting without being detected. In particular, we compute all possible adversairals for common detection schemes.
arXiv Detail & Related papers (2024-11-25T17:25:00Z)
A Synthetic Benchmark to Explore Limitations of Localized Drift Detections [13.916984628784766]
Concept drift is a common phenomenon in data streams where the statistical properties of the target variable change over time. This paper explores the concept of localized drift and evaluates the performance of several drift detection techniques in identifying such localized changes.
arXiv Detail & Related papers (2024-08-26T23:24:31Z)
SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch token they extract. We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
Methods for Generating Drift in Text Streams [49.3179290313959]
Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time. This paper provides four textual drift generation methods to ease the production of datasets with labeled drifts. Results show that all methods have their performance degraded right after the drifts, and the incremental SVM is the fastest to run and recover the previous performance levels.
arXiv Detail & Related papers (2024-03-18T23:48:33Z)
Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments. Our approach enhances LiDAR-based detection models using spatial quantized historical features. Experiments on real-world datasets demonstrate significant improvements.
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
Unsupervised Adaptation from Repeated Traversals for Autonomous Driving [54.59577283226982]
Self-driving cars must generalize to the end-user's environment to operate reliably. One potential solution is to leverage unlabeled data collected from the end-users' environments. There is no reliable signal in the target domain to supervise the adaptation process. We show that this simple additional assumption is sufficient to obtain a potent signal that allows us to perform iterative self-training of 3D object detectors on the target domain.
arXiv Detail & Related papers (2023-03-27T15:07:55Z)
CADM: Confusion Model-based Detection Method for Real-drift in Chunk Data Stream [3.0885191226198785]
Concept drift detection has attracted considerable attention due to its importance in many real-world applications such as health monitoring and fault diagnosis. We propose a new approach to detect real-drift in the chunk data stream with limited annotations based on concept confusion.
arXiv Detail & Related papers (2023-03-25T08:59:27Z)
Consistent Diffusion Models: Mitigating Sampling Drift by Learning to be Consistent [97.64313409741614]
We propose to enforce a emphconsistency property which states that predictions of the model on its own generated data are consistent across time. We show that our novel training objective yields state-of-the-art results for conditional and unconditional generation in CIFAR-10 and baseline improvements in AFHQ and FFHQ.
arXiv Detail & Related papers (2023-02-17T18:45:04Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially. We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
Task-Sensitive Concept Drift Detector with Metric Learning [7.706795195017394]
We propose a novel task-sensitive drift detection framework, which is able to detect drifts without access to true labels during inference. It is able to detect real drift, where the drift affects the classification performance, while it properly ignores virtual drift. We evaluate the performance of the proposed framework with a novel metric, which accumulates the standard metrics of detection accuracy, false positive rate and detection delay into one value.
arXiv Detail & Related papers (2021-08-16T09:10:52Z)
Bayesian Autoencoders for Drift Detection in Industrial Environments [69.93875748095574]
Autoencoders are unsupervised models which have been used for detecting anomalies in multi-sensor environments. Anomalies can come either from real changes in the environment (real drift) or from faulty sensory devices (virtual drift)
arXiv Detail & Related papers (2021-07-28T10:19:58Z)
Detecting Concept Drift With Neural Network Model Uncertainty [0.0]
Uncertainty Drift Detection (UDD) is able to detect drifts without access to true labels. In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model. We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
arXiv Detail & Related papers (2021-07-05T08:56:36Z)
DriftSurf: A Risk-competitive Learning Algorithm under Concept Drift [12.579800289829963]
When learning from streaming data, a change in the data distribution, also known as concept drift, can render a previously-learned model inaccurate. We present an adaptive learning algorithm that extends previous drift-detection-based methods by incorporating drift detection into a broader stable-state/reactive-state process. The algorithm is generic in its base learner and can be applied across a variety of supervised learning problems.
arXiv Detail & Related papers (2020-03-13T23:25:25Z)
Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass. We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.