AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly
Detection
- URL: http://arxiv.org/abs/2206.15476v4
- Date: Mon, 3 Apr 2023 16:00:22 GMT
- Title: AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly
Detection
- Authors: Marius Dragoi, Elena Burceanu, Emanuela Haller, Andrei Manolache and
Florin Brad
- Abstract summary: We introduce an unsupervised anomaly detection benchmark with data that shifts over time, built over Kyoto-2006+, a traffic dataset for network intrusion detection.
We first highlight the non-stationary nature of the data, using a basic per-feature analysis, t-SNE, and an Optimal Transport approach for measuring the overall distribution distances between years.
We validate the performance degradation over time with diverse models, ranging from classical approaches to deep learning.
- Score: 7.829710051617368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing the distribution shift of data is a growing research direction in
present-day Machine Learning (ML), leading to new benchmarks that provide
suitable scenarios for studying the generalization properties of ML models.
Existing benchmarks focus on supervised learning and, to the best of our
knowledge, none exists for unsupervised learning. Therefore,
we introduce an unsupervised anomaly detection benchmark with data that shifts
over time, built over Kyoto-2006+, a traffic dataset for network intrusion
detection. This type of data meets the premise of shifting the input
distribution: it covers a large time span ($10$ years), with naturally
occurring changes over time (e.g., users modifying their behavior patterns and
software updates). We first highlight the non-stationary nature of the data,
using a basic per-feature analysis, t-SNE, and an Optimal Transport approach
for measuring the overall distribution distances between years. Next, we
propose AnoShift, a protocol that splits the data into IID, NEAR, and FAR testing
splits. We validate the performance degradation over time with diverse models,
ranging from classical approaches to deep learning. Finally, we show that by
acknowledging the distribution shift problem and properly addressing it, the
performance can be improved over classical training, which assumes
independent and identically distributed data (on average, by up to $3\%$ for
our approach). Dataset and code are available at
https://github.com/bit-ml/AnoShift/.
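A rough, hypothetical sketch of the protocol in Python (not the authors' released code; the loader, the column names, and the exact year boundaries below are assumptions):

    import pandas as pd
    from scipy.stats import wasserstein_distance

    def split_anoshift(df: pd.DataFrame):
        # Assumed boundaries: train on the earliest years, then test on
        # progressively more distant ones (NEAR, then FAR).
        train = df[df["year"].between(2006, 2010)]  # IID training data
        near = df[df["year"].between(2011, 2013)]   # NEAR test split
        far = df[df["year"].between(2014, 2015)]    # FAR test split
        return train, near, far

    def feature_shift(df: pd.DataFrame, feature: str, year_a: int, year_b: int) -> float:
        # Per-feature 1-D Wasserstein distance: a lightweight stand-in for
        # the paper's Optimal Transport distances between whole years.
        a = df.loc[df["year"] == year_a, feature].to_numpy()
        b = df.loc[df["year"] == year_b, feature].to_numpy()
        return wasserstein_distance(a, b)

If the data shifts over time, a distance such as feature_shift(df, "duration", 2006, 2015) should generally exceed feature_shift(df, "duration", 2006, 2007).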
Related papers
- Understanding the Generalizability of Link Predictors Under Distribution Shifts on Graphs [34.58496513149175]
Many popular benchmark datasets assume that samples are drawn from the same distribution.
We introduce LP-specific data splits which utilize structural properties to induce a controlled distribution shift.
We verify the shift's effect empirically through evaluation of different SOTA LP methods and subsequently couple these methods with generalization techniques.
arXiv Detail & Related papers (2024-06-13T03:47:12Z) - PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
Motivated by increasing privacy concerns, we propose a Parameter-Efficient Federated Anomaly Detection framework named PeFAD.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z) - LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised
Time Series Anomaly Detection [49.52429991848581]
We propose a Light and Anti-overfitting Retraining Approach (LARA) for time series anomaly detection methods based on deep variational auto-encoders (VAEs).
This work makes three novel contributions: 1) the retraining process is formulated as a convex problem, so it converges quickly and prevents overfitting; 2) a ruminate block is designed to leverage historical data without the need to store it; and 3) it is proven mathematically that, when fine-tuning the latent vector and the reconstructed data, linear formulations achieve the smallest adjusting errors between the ground truths and the fine-tuned outputs (a least-squares sketch follows this list).
arXiv Detail & Related papers (2023-10-09T12:36:16Z) - Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time [69.77704012415845]
Temporal shifts can considerably degrade the performance of machine learning models deployed in the real world.
We benchmark 13 prior approaches, including methods in domain generalization, continual learning, self-supervised learning, and ensemble learning.
Under both evaluation strategies, we observe an average performance drop of 20% from in-distribution to out-of-distribution data.
arXiv Detail & Related papers (2022-11-25T17:07:53Z) - Anomaly Detection with Test Time Augmentation and Consistency Evaluation [13.709281244889691]
We propose a simple yet effective anomaly detection algorithm named Test Time Augmentation Anomaly Detection (TTA-AD).
We observe that, on a trained network, in-distribution data yield more consistent predictions across their original and augmented versions than out-of-distribution data (a consistency-scoring sketch follows this list).
Experiments on various high-resolution image benchmark datasets demonstrate that TTA-AD achieves comparable or better detection performance.
arXiv Detail & Related papers (2022-06-06T04:27:06Z) - CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) addresses distribution shift by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z) - Deep Generative model with Hierarchical Latent Factors for Time Series
Anomaly Detection [40.21502451136054]
This work presents DGHL, a new family of generative models for time series anomaly detection.
A top-down Convolution Network maps a novel hierarchical latent space to time series windows, exploiting temporal dynamics to encode information efficiently.
Our method outperformed current state-of-the-art models on four popular benchmark datasets.
arXiv Detail & Related papers (2022-02-15T17:19:44Z) - DATE: Detecting Anomalies in Text via Self-Supervision of Transformers [5.105840060102528]
Recent deep methods for anomaly detection in images learn better features of normality in an end-to-end self-supervised setting.
We apply this approach to anomaly detection in text by introducing a novel pretext task on text sequences.
We show strong quantitative and qualitative results on the 20Newsgroups and AG News datasets.
arXiv Detail & Related papers (2021-04-12T16:08:05Z) - BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We study a simple method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift (a minimal sketch follows this list).
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
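The third LARA contribution above, that a linear adjustment fitted in closed form minimizes the adjusting error, can be illustrated with ordinary least squares; the shapes and variable names below are hypothetical, not taken from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.normal(size=(256, 16))  # latent vectors from a pre-trained VAE (assumed shape)
    y = rng.normal(size=(256, 32))  # ground-truth targets to adjust toward (assumed shape)

    # Minimizing ||z @ W - y||^2 over W is convex, so the optimum is unique
    # and is obtained in one closed-form step -- the property that makes
    # this style of retraining fast and resistant to overfitting.
    W, *_ = np.linalg.lstsq(z, y, rcond=None)
    adjusted = z @ W  # least-squares adjusted outputs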
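The consistency signal behind TTA-AD can be sketched as follows in PyTorch; the KL-based score and the augment callable are illustrative assumptions, not the paper's exact recipe:

    import torch
    import torch.nn.functional as F

    def consistency_score(model, x, augment, n_aug=4):
        # Compare predictions on the original input against predictions on
        # several augmented versions; larger divergence = less consistent,
        # which TTA-AD associates with out-of-distribution inputs.
        model.eval()
        with torch.no_grad():
            p_orig = F.softmax(model(x), dim=-1)
            divs = []
            for _ in range(n_aug):
                p_aug = F.softmax(model(augment(x)), dim=-1)
                divs.append(F.kl_div(p_aug.log(), p_orig, reduction="batchmean"))
        return torch.stack(divs).mean()  # anomaly score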
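Prediction-time batch normalization amounts to normalizing with the statistics of the incoming test batch rather than the stored running averages; a minimal PyTorch sketch (a common formulation, not necessarily the paper's exact setup):

    import torch.nn as nn

    def enable_prediction_time_bn(model: nn.Module) -> nn.Module:
        # Keep the network in eval mode, but switch BatchNorm layers to
        # train mode so each test batch is normalized with its own
        # statistics. (Train mode also updates the running averages.)
        model.eval()
        for m in model.modules():
            if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                m.train()
        return model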
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.