DQS: A Low-Budget Query Strategy for Enhancing Unsupervised Data-driven Anomaly Detection Approaches
- URL: http://arxiv.org/abs/2509.05663v3
- Date: Sun, 26 Oct 2025 17:21:20 GMT
- Title: DQS: A Low-Budget Query Strategy for Enhancing Unsupervised Data-driven Anomaly Detection Approaches
- Authors: Lucas Correia, Jan-Christoph Goos, Thomas Bäck, Anna V. Kononova
- Abstract summary: This work integrates active learning with an existing unsupervised anomaly detection method. We introduce a novel query strategy called the dissimilarity-based query strategy (DQS).
- Score: 3.3482093430607267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Truly unsupervised approaches for time series anomaly detection are rare in the literature. Those that exist suffer from a poorly set threshold, which hampers detection performance, while others, despite claiming to be unsupervised, need to be calibrated using a labelled data subset, which is often not available in the real world. This work integrates active learning with an existing unsupervised anomaly detection method by selectively querying the labels of multivariate time series, which are then used to refine the threshold selection process. To achieve this, we introduce a novel query strategy called the dissimilarity-based query strategy (DQS). DQS aims to maximise the diversity of queried samples by evaluating the similarity between anomaly scores using dynamic time warping. We assess the detection performance of DQS in comparison to other query strategies and explore the impact of mislabelling, a topic that is underexplored in the literature. Our findings indicate that DQS performs best in small-budget scenarios, though the others appear to be more robust when faced with mislabelling. Therefore, in the real world, the choice of query strategy depends on the expertise of the oracle and the number of samples they are willing to label. Regardless, all query strategies outperform the unsupervised threshold even in the presence of mislabelling. Thus, whenever it is feasible to query an oracle, employing an active learning-based threshold is recommended.
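The diversity idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so this sketch assumes a greedy farthest-point selection over pairwise dynamic time warping (DTW) distances between per-series anomaly scores. It also assumes a simple threshold rule that maximises accuracy on the queried labels. The function names `dtw_distance`, `select_diverse_queries`, and `threshold_from_labels` are hypothetical and are not taken from the paper.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def select_diverse_queries(score_series, budget):
    """Assumed rule: greedily pick series that are far (in DTW) from those already picked."""
    n = len(score_series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(score_series[i], score_series[j])
    # Seed with the series that is most dissimilar to all others overall.
    selected = [int(np.argmax(dist.sum(axis=1)))]
    while len(selected) < min(budget, n):
        # Distance of every series to its nearest already-selected series.
        min_dist = dist[:, selected].min(axis=1)
        min_dist[selected] = -np.inf  # never pick the same series twice
        selected.append(int(np.argmax(min_dist)))
    return selected


def threshold_from_labels(peak_scores, labels):
    """Assumed rule: choose the labelled peak score that best separates the queried labels."""
    peaks = np.asarray(peak_scores, dtype=float)
    is_anomalous = np.asarray(labels, dtype=bool)
    candidates = np.sort(peaks)
    # A series is flagged when its peak score exceeds the threshold.
    best = max(candidates, key=lambda t: np.mean((peaks > t) == is_anomalous))
    return float(best)
```

In use, the indices returned by `select_diverse_queries` would be shown to the oracle, and the resulting labels, together with each queried series' peak anomaly score, would be passed to `threshold_from_labels`. The pairwise DTW matrix costs quadratic time in the number of series, which is acceptable for a sketch but may need pruning or approximation on larger datasets.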
Related papers
- Noise-Resilient Point-wise Anomaly Detection in Time Series Using Weak Segment Labels [27.250664021725317]
NRdetector is a noise-resilient framework that incorporates confidence-based sample selection, robust segment-level learning, and data-centric point-level detection. It consistently achieves robust results across multiple real-world datasets.
arXiv Detail & Related papers (2025-01-21T08:10:02Z) - Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection [1.4530711901349282]
Test-Time Adaptation (TTA) has emerged as a promising strategy for tackling the problem of machine learning model robustness under distribution shifts.
We evaluate existing TTA methods using surrogate-based hp-selection strategies to obtain a more realistic evaluation of their performance.
arXiv Detail & Related papers (2024-07-19T11:58:30Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n^2) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z) - Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model [14.98695074168234]
We propose a new method to detect machine-generated text, especially from large language models (LLMs).
We use a Bayesian surrogate model, which allows us to select typical samples based on Bayesian uncertainty and interpolate scores from typical samples to other samples, to improve query efficiency.
Empirical results demonstrate that our method significantly outperforms existing approaches under a low query budget.
arXiv Detail & Related papers (2023-05-26T04:23:10Z) - PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows.
Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z) - How to Allocate your Label Budget? Choosing between Active Learning and Learning to Reject in Anomaly Detection [15.224212372777002]
Anomaly detection attempts to find examples that deviate from the expected behaviour.
The lack of labels makes the anomaly detector have high uncertainty in some regions.
We propose a mixed strategy that decides in multiple rounds whether to collect AL labels or Learning to Reject labels.
arXiv Detail & Related papers (2023-01-07T18:02:43Z) - Label-Efficient Interactive Time-Series Anomaly Detection [17.799924009674694]
We propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system.
To achieve this goal, the system integrates weak supervision and active learning collaboratively.
We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions.
arXiv Detail & Related papers (2022-12-30T10:16:15Z) - Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z) - Unsupervised Model Selection for Time-series Anomaly Detection [7.8027110514393785]
We identify three classes of surrogate (unsupervised) metrics, namely, prediction error, model centrality, and performance on injected synthetic anomalies.
We formulate metric combination with multiple imperfect surrogate metrics as a robust rank aggregation problem.
Large-scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised approach is as effective as selecting the most accurate model.
arXiv Detail & Related papers (2022-10-03T16:49:30Z) - TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z) - Meta-AAD: Active Anomaly Detection with Deep Reinforcement Learning [56.65934079419417]
High false-positive rate is a long-standing challenge for anomaly detection algorithms.
We propose Active Anomaly Detection with Meta-Policy (Meta-AAD), a novel framework that learns a meta-policy for query selection.
arXiv Detail & Related papers (2020-09-16T01:47:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.