Unsupervised Anomaly Detection for Tabular Data Using Noise Evaluation
- URL: http://arxiv.org/abs/2412.11461v1
- Date: Mon, 16 Dec 2024 05:35:58 GMT
- Title: Unsupervised Anomaly Detection for Tabular Data Using Noise Evaluation
- Authors: Wei Dai, Kai Hwang, Jicong Fan
- Abstract summary: Unsupervised anomaly detection (UAD) plays an important role in modern data analytics.
We present a novel UAD method by evaluating how much noise is in the data.
We provide theoretical guarantees, proving that the proposed method can detect anomalous data successfully.
- Score: 26.312206159418903
- License:
- Abstract: Unsupervised anomaly detection (UAD) plays an important role in modern data analytics and it is crucial to provide simple yet effective and guaranteed UAD algorithms for real applications. In this paper, we present a novel UAD method for tabular data by evaluating how much noise is in the data. Specifically, we propose to learn a deep neural network from the clean (normal) training dataset and a noisy dataset, where the latter is generated by adding highly diverse noises to the clean data. The neural network can learn a reliable decision boundary between normal data and anomalous data when the diversity of the generated noisy data is sufficiently high, so that the hard abnormal samples lie in the noisy region. Importantly, we provide theoretical guarantees, proving that the proposed method can detect anomalous data successfully, although the method does not utilize any real anomalous data in the training stage. Extensive experiments on more than 60 benchmark datasets demonstrate the effectiveness of the proposed method in comparison to 12 UAD baselines. Our method obtains a 92.27% AUC score and a 1.68 ranking score on average. Moreover, compared to the state-of-the-art UAD methods, our method is easier to implement.
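The training scheme described in the abstract can be illustrated with a minimal sketch: generate a noisy counterpart of the clean data by adding noise at diverse scales, then train a binary discriminator whose noisy-class probability serves as the anomaly score. This is an assumption-laden simplification (a small scikit-learn MLP stands in for the paper's deep network, and Gaussian noise at uniformly drawn scales stands in for its noise-generation scheme):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Clean (normal) training data: a tight Gaussian blob.
X_clean = rng.normal(0.0, 1.0, size=(500, 2))

# Noisy counterpart: add noise drawn at diverse scales so the noisy
# region covers a wide band around the normal data (hypothetical
# stand-in for the paper's noise-generation procedure).
scales = rng.uniform(1.0, 5.0, size=(500, 1))
X_noisy = X_clean + rng.normal(0.0, 1.0, size=X_clean.shape) * scales

# Binary discriminator: clean (label 0) vs. noisy (label 1).
X = np.vstack([X_clean, X_noisy])
y = np.concatenate([np.zeros(500), np.ones(500)])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X, y)

def anomaly_score(points):
    # Probability of belonging to the noisy class serves as the score.
    return clf.predict_proba(np.asarray(points))[:, 1]

normal_score = anomaly_score(rng.normal(0.0, 1.0, size=(100, 2))).mean()
outlier_score = anomaly_score([[8.0, 8.0]])[0]
```

A point far from the clean blob falls inside the noisy region and receives a higher score than typical normal points, mirroring the abstract's claim that sufficiently diverse noise covers the hard abnormal samples.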
Related papers
- On the Influence of Data Resampling for Deep Learning-Based Log Anomaly Detection: Insights and Recommendations [10.931620604044486]
This study provides an in-depth analysis of the impact of diverse data resampling methods on existing AD approaches.
We assess the performance of these AD approaches across four datasets with different levels of class imbalance.
We evaluate the effectiveness of the data resampling methods when utilizing optimal resampling ratios of normal to abnormal data.
arXiv Detail & Related papers (2024-05-06T14:01:05Z) - SoftPatch: Unsupervised Anomaly Detection with Noisy Data [67.38948127630644]
This paper considers label-level noise in image sensory anomaly detection for the first time.
We propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level.
Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
arXiv Detail & Related papers (2024-03-21T08:49:34Z) - Fast kernel methods for Data Quality Monitoring as a goodness-of-fit test [10.882743697472755]
We propose a machine learning approach for monitoring particle detectors in real-time.
The goal is to assess the compatibility of incoming experimental data with a reference dataset, characterising the data behaviour under normal circumstances.
The model is based on a modern implementation of kernel methods, nonparametric algorithms that can learn any continuous function given enough data.
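The goodness-of-fit idea in this summary, comparing incoming data against a reference dataset, can be sketched with a simple kernel two-sample statistic. Note this is an assumption: the paper builds on a fast kernel-methods implementation, while the sketch below uses a basic maximum mean discrepancy (MMD) estimate purely to illustrate the compatibility test:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise RBF kernel values between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=0.5):
    # Biased estimate of the squared maximum mean discrepancy:
    # large values indicate X and Y come from different distributions.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(3)
ref = rng.normal(0.0, 1.0, size=(200, 2))      # reference (normal) data
same = rng.normal(0.0, 1.0, size=(200, 2))     # compatible batch
shifted = rng.normal(2.0, 1.0, size=(200, 2))  # anomalous batch
```

A batch drawn from the same distribution as the reference yields a near-zero statistic, while a shifted batch yields a clearly larger one.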
arXiv Detail & Related papers (2023-03-09T16:59:35Z) - On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data.
Our method can be applied to general augmentation techniques and consistently improve the performance on both text classification and question-answering tasks.
arXiv Detail & Related papers (2022-12-20T18:58:33Z) - Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z) - Robust Learning of Deep Time Series Anomaly Detection Models with Contaminated Training Data [29.808942473293108]
Time series anomaly detection (TSAD) is an important data mining task with numerous applications in the IoT era.
Deep TSAD methods typically rely on a clean training dataset that is not polluted by anomalies to learn the "normal profile" of the underlying dynamics.
We propose a model-agnostic method which can effectively improve the robustness of learning mainstream deep TSAD models with potentially contaminated data.
arXiv Detail & Related papers (2022-08-03T04:52:08Z) - An Efficient Anomaly Detection Approach using Cube Sampling with Streaming Data [2.0515785954568626]
Anomaly detection is critical in various fields, including intrusion detection, health monitoring, fault diagnosis, and sensor network event detection.
The isolation forest (or iForest) approach is a well-known technique for detecting anomalies.
We propose an efficient iForest based approach for anomaly detection using cube sampling that is effective on streaming data.
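The isolation forest baseline this summary builds on is available in scikit-learn. The sketch below shows a plain iForest with random subsampling (`max_samples`); the paper's contribution, cube sampling over streaming data, replaces that subsampling step and is not reproduced here:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(1000, 3))

# Standard iForest: each tree is built on a random subsample of 256
# points. Cube sampling would supply these subsamples instead.
forest = IsolationForest(max_samples=256, random_state=0).fit(X_train)

# decision_function: lower values indicate more anomalous points.
scores = forest.decision_function(np.array([[0.0, 0.0, 0.0],
                                            [10.0, 10.0, 10.0]]))
```

A point near the training distribution receives a higher score than a far-away point, which iForest isolates in few splits.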
arXiv Detail & Related papers (2021-10-05T04:23:00Z) - Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
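The refine-then-retrain idea in this summary can be sketched with scikit-learn's one-class SVM: fit on the contaminated data, drop the points the model itself scores as most anomalous, and refit on the refined set. This is a simplified stand-in for the paper's self-trained framework (the refinement rule and 10% drop fraction below are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
normal = rng.normal(0.0, 1.0, size=(480, 2))
anoms = rng.uniform(-6.0, 6.0, size=(20, 2))
X = np.vstack([normal, anoms])  # unlabeled, contaminated training set

# Round 1: fit on everything, then refine by dropping the fraction of
# points the model scores as most anomalous.
occ = OneClassSVM(nu=0.1).fit(X)
scores = occ.decision_function(X)
keep = scores > np.quantile(scores, 0.1)  # drop the bottom 10%

# Round 2: refit the one-class model on the refined (cleaner) data.
occ_refined = OneClassSVM(nu=0.05).fit(X[keep])
```

After refinement, the model's decision boundary hugs the normal cluster more tightly, so far-away points score clearly lower than points near the center.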
arXiv Detail & Related papers (2021-06-11T01:36:08Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z) - Radioactive data: tracing through training [130.2266320167683]
We propose a new technique, radioactive data, that makes imperceptible changes to this dataset such that any model trained on it will bear an identifiable mark.
Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value).
Our method is robust to data augmentation and to the stochasticity of deep network optimization.
arXiv Detail & Related papers (2020-02-03T18:41:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.