A Generic Machine Learning Framework for Fully-Unsupervised Anomaly
Detection with Contaminated Data
- URL: http://arxiv.org/abs/2308.13352v3
- Date: Wed, 31 Jan 2024 14:53:18 GMT
- Title: A Generic Machine Learning Framework for Fully-Unsupervised Anomaly
Detection with Contaminated Data
- Authors: Markus Ulmer, Jannik Zgraggen, and Lilach Goren Huber
- Abstract summary: We introduce a framework for a fully unsupervised refinement of contaminated training data for AD tasks.
The framework is generic and can be applied to any residual-based machine learning model.
We show its clear superiority over the naive approach of training with contaminated data without refinement.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection (AD) tasks have been solved using machine learning
algorithms in various domains and applications. The great majority of these
algorithms use normal data to train a residual-based model and assign anomaly
scores to unseen samples based on their dissimilarity with the learned normal
regime. The underlying assumption of these approaches is that anomaly-free data
is available for training. This is, however, often not the case in real-world
operational settings, where the training data may be contaminated with an
unknown fraction of abnormal samples. Training with contaminated data, in turn,
inevitably leads to a deteriorated AD performance of the residual-based
algorithms.
In this paper we introduce a framework for a fully unsupervised refinement of
contaminated training data for AD tasks. The framework is generic and can be
applied to any residual-based machine learning model. We demonstrate the
application of the framework to two public datasets of multivariate time series
machine data from different application fields. We show its clear superiority
over the naive approach of training with contaminated data without refinement.
Moreover, we compare it to the ideal, unrealistic reference in which
anomaly-free data would be available for training. The method is based on
evaluating the contribution of individual samples to the generalization ability
of a given model, and contrasting the contribution of anomalies with that of
normal samples. As a result, the proposed approach is comparable to, and often
outperforms, training with normal samples only.
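The refinement loop described in the abstract can be sketched in a few lines. The mean-based residual model, the fixed contamination guess, and the quantile cutoff below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def residual_scores(model_mean, X):
    # Residual of each sample w.r.t. the learned "normal regime"
    # (here a trivial mean model; any residual-based model would fit).
    return np.linalg.norm(X - model_mean, axis=1)

def refine(X, contamination_guess=0.1, n_iter=5):
    """Iteratively refit on retained samples and drop the highest-residual ones."""
    keep = np.ones(len(X), dtype=bool)
    for _ in range(n_iter):
        model_mean = X[keep].mean(axis=0)               # fit on retained data only
        scores = residual_scores(model_mean, X)
        cutoff = np.quantile(scores, 1.0 - contamination_guess)
        keep = scores <= cutoff                          # discard suspected anomalies
    return keep

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(900, 4))
anomalies = rng.normal(6.0, 1.0, size=(100, 4))          # contaminating samples
X = np.vstack([normal, anomalies])

keep = refine(X)
```

With clearly separated contamination the retained mask converges to the normal samples; in practice the contamination fraction is unknown and must itself be estimated, which is part of what makes the fully unsupervised setting hard.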
Related papers
- Adaptive Deviation Learning for Visual Anomaly Detection with Data Contamination [20.4008901760593]
We introduce a systematic adaptive method that employs deviation learning to compute anomaly scores end-to-end.
Our proposed method surpasses competing techniques and exhibits both stability and robustness in the presence of data contamination.
arXiv Detail & Related papers (2024-11-14T16:10:15Z)
- Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection [88.34095233600719]
FAPrompt is a novel framework designed to learn Fine-grained Abnormality Prompts for more accurate ZSAD.
It substantially outperforms state-of-the-art methods by at least 3%-5% AUC/AP in both image- and pixel-level ZSAD tasks.
arXiv Detail & Related papers (2024-10-14T08:41:31Z)
- Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection [9.784793380119806]
We introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation.
Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model.
We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset.
arXiv Detail & Related papers (2024-07-04T14:28:52Z)
- Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con$_2$, which learns through context augmentations.
arXiv Detail & Related papers (2024-05-29T07:59:06Z)
- Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts [25.629973843455495]
Generalist Anomaly Detection (GAD) aims to train one single detection model that can generalize to detect anomalies in diverse datasets from different application domains without further training on the target data.
We introduce a novel approach that learns an in-context residual learning model for GAD, termed InCTRL.
InCTRL is the best performer and significantly outperforms state-of-the-art competing methods.
arXiv Detail & Related papers (2024-03-11T08:07:46Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Augment to Detect Anomalies with Continuous Labelling [10.646747658653785]
Anomaly detection aims to recognize samples that differ in some respect from the training observations.
Recent state-of-the-art deep learning-based anomaly detection methods suffer from high computational cost, complexity, unstable training procedures, and non-trivial implementation.
We leverage a simple learning procedure that trains a lightweight convolutional neural network, reaching state-of-the-art performance in anomaly detection.
arXiv Detail & Related papers (2022-07-03T20:11:51Z)
- Explainable Deep Few-shot Anomaly Detection with Deviation Networks [123.46611927225963]
We introduce a novel weakly-supervised anomaly detection framework to train detection models.
The proposed approach learns discriminative normality by leveraging the labeled anomalies and a prior probability.
Our model is substantially more sample-efficient and robust, and performs significantly better than state-of-the-art competing methods in both closed-set and open-set settings.
arXiv Detail & Related papers (2021-08-01T14:33:17Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Deep Visual Anomaly detection with Negative Learning [18.79849041106952]
In this paper, we propose anomaly detection with negative learning (ADNL), which employs the negative learning concept for the enhancement of anomaly detection.
The idea is to limit the reconstruction capability of a generative model using a given small amount of anomaly examples.
This way, the network not only learns to reconstruct normal data but also encloses the normal distribution far from the possible distribution of anomalies.
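The reconstruction-limiting idea can be sketched as a loss term. The hinge form and the margin below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def negative_learning_loss(x, x_hat, is_anomaly, margin=1.0):
    """Reconstruction loss with a negated (hinge) term for labeled anomalies.

    Normal samples: minimize reconstruction error as usual.
    Anomaly samples: push reconstruction error above `margin`,
    so the model cannot learn to reconstruct them well.
    """
    err = np.mean((x - x_hat) ** 2, axis=1)             # per-sample MSE
    normal_term = err[~is_anomaly].sum()
    anomaly_term = np.maximum(0.0, margin - err[is_anomaly]).sum()
    return (normal_term + anomaly_term) / len(x)

x = np.array([[0.0, 0.0], [1.0, 1.0]])
x_hat = x.copy()                                         # perfect reconstruction
flags = np.array([False, True])                          # second sample labeled anomalous
loss = negative_learning_loss(x, x_hat, flags)
```

A perfectly reconstructed anomaly contributes the full margin to the loss, steering the generative model away from the anomalous region while it still fits the normal data.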
arXiv Detail & Related papers (2021-05-24T01:48:44Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.