Unsupervised Learning of Distributional Properties can Supplement Human
Labeling and Increase Active Learning Efficiency in Anomaly Detection
- URL: http://arxiv.org/abs/2307.08782v1
- Date: Thu, 13 Jul 2023 22:14:30 GMT
- Title: Unsupervised Learning of Distributional Properties can Supplement Human
Labeling and Increase Active Learning Efficiency in Anomaly Detection
- Authors: Jaturong Kongmanee, Mark Chignell, Khilan Jerath, Abhay Raman
- Abstract summary: Exfiltration of data via email is a serious cybersecurity threat for many organizations.
Active Learning is a promising approach for labeling data efficiently.
We propose an adaptive AL sampling strategy to produce batches of cases to be labeled that contain instances of rare anomalies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exfiltration of data via email is a serious cybersecurity threat for many
organizations. Detecting data exfiltration (anomaly) patterns typically
requires labeling, most often done by a human annotator, to reduce the high
number of false alarms. Active Learning (AL) is a promising approach for
labeling data efficiently, but it needs to choose an efficient order in which
cases are to be labeled, and there are uncertainties as to what scoring
procedure should be used to prioritize cases for labeling, especially when
detecting rare cases of interest is crucial. We propose an adaptive AL sampling
strategy that leverages the underlying prior data distribution, as well as
model uncertainty, to produce batches of cases to be labeled that contain
instances of rare anomalies. We show that (1) the classifier benefits from a
batch of representative and informative instances of both normal and anomalous
examples, and (2) unsupervised anomaly detection plays a useful role in building
the classifier in the early stages of training when relatively little labeling
has been done thus far. Our approach to AL for anomaly detection outperformed
existing AL approaches on three highly unbalanced UCI benchmarks and on one
real-world redacted email data set.
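
As a rough illustration of the batch-construction idea described in the abstract, the sketch below mixes unsupervised anomaly scores with classifier uncertainty when choosing which pool instances to label next. It is a minimal sketch under stated assumptions, not the authors' procedure: the `select_batch` function, the IsolationForest scorer, the logistic-regression classifier, and the 50/50 split between anomalous-looking and uncertain candidates are all illustrative.

```python
# Minimal sketch (not the paper's exact algorithm): build a labeling batch that
# mixes (a) the points an unsupervised detector finds most anomalous with
# (b) the points the current classifier is most uncertain about.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression

def select_batch(X_pool, clf, detector, batch_size=20, anomaly_frac=0.5):
    """Return indices of pool points to send to the human labeler."""
    # Classifier uncertainty: probability closest to 0.5 is most uncertain.
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = -np.abs(proba - 0.5)

    # Unsupervised anomaly score: lower score_samples means more anomalous.
    anomaly_score = -detector.score_samples(X_pool)

    n_anom = int(batch_size * anomaly_frac)
    anom_idx = np.argsort(anomaly_score)[::-1][:n_anom]

    # Fill the rest of the batch with the most uncertain remaining points.
    by_uncertainty = np.argsort(uncertainty)[::-1]
    by_uncertainty = by_uncertainty[~np.isin(by_uncertainty, anom_idx)]
    return np.concatenate([anom_idx, by_uncertainty[:batch_size - n_anom]])

# Toy usage: a small labeled seed set with a rare positive (anomaly) class.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 5))
X_seed = rng.normal(size=(40, 5))
y_seed = np.zeros(40, dtype=int)
y_seed[:4] = 1                                    # rare anomalies in the seed set
clf = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)
detector = IsolationForest(random_state=0).fit(X_pool)
batch_indices = select_batch(X_pool, clf, detector, batch_size=20)
```

In this sketch the unsupervised detector supplies anomaly candidates before the classifier has seen many labels, mirroring the abstract's point that unsupervised scoring matters most in the early stages of training.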
Related papers
- MyriadAL: Active Few Shot Learning for Histopathology [10.652626309100889]
We introduce an active few-shot learning framework, Myriad Active Learning (MAL).
MAL includes a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop.
Experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works.
arXiv Detail & Related papers (2023-10-24T20:08:15Z)
- Active anomaly detection based on deep one-class classification [9.904380236739398]
We tackle two essential problems of active learning for Deep SVDD: query strategy and semi-supervised learning method.
First, rather than solely identifying anomalies, our query strategy selects uncertain samples according to an adaptive boundary.
Second, we apply noise contrastive estimation in training a one-class classification model to incorporate both labeled normal and abnormal data effectively.
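
The "adaptive boundary" query can be pictured with a simple hypersphere stand-in for the trained one-class model: rank unlabeled embeddings by how close their distance to the centre is to a data-dependent radius. This is only a hedged sketch under that assumption; `adaptive_boundary_query`, the mean-as-centre choice, and the quantile radius are illustrative, not the paper's Deep SVDD formulation.

```python
import numpy as np

def adaptive_boundary_query(embeddings, n_queries=10, boundary_quantile=0.9):
    """Select samples whose distance to a hypersphere centre lies closest to an
    adaptive boundary radius, i.e. the most ambiguous candidates to label."""
    center = embeddings.mean(axis=0)                  # stand-in for a learned centre
    dist = np.linalg.norm(embeddings - center, axis=1)
    radius = np.quantile(dist, boundary_quantile)     # boundary adapts to current distances
    ambiguity = np.abs(dist - radius)                 # small = near the boundary
    return np.argsort(ambiguity)[:n_queries]

# Toy usage on random "embeddings" (in practice these come from the trained network).
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 16))
query_idx = adaptive_boundary_query(Z, n_queries=10)
```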
arXiv Detail & Related papers (2023-09-18T03:56:45Z)
- RoSAS: Deep Semi-Supervised Anomaly Detection with Contamination-Resilient Continuous Supervision [21.393509817509464]
This paper proposes a novel semi-supervised anomaly detection method, which devises contamination-resilient continuous supervisory signals.
Our approach significantly outperforms state-of-the-art competitors by 20%-30% in AUC-PR.
arXiv Detail & Related papers (2023-07-25T04:04:49Z)
- Unsupervised Model Selection for Time-series Anomaly Detection [7.8027110514393785]
We identify three classes of surrogate (unsupervised) metrics, namely, prediction error, model centrality, and performance on injected synthetic anomalies.
We formulate metric combination with multiple imperfect surrogate metrics as a robust rank aggregation problem.
Large-scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised approach is as effective as selecting the most accurate model.
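
One simple way to picture treating imperfect surrogate metrics as a robust rank aggregation problem is to rank the candidate models under each metric and then aggregate with a robust statistic such as the median rank. The sketch below is illustrative only; `aggregate_model_ranks` and the median aggregator are assumptions, not the paper's formulation.

```python
import numpy as np

def aggregate_model_ranks(metric_scores):
    """metric_scores: (n_metrics, n_models), higher = better under each surrogate.
    Returns model indices ordered best-to-worst by median rank across metrics."""
    # Rank models under each metric (rank 0 = best under that metric).
    ranks = np.argsort(np.argsort(-metric_scores, axis=1), axis=1)
    median_rank = np.median(ranks, axis=0)          # robust to one noisy surrogate
    return np.argsort(median_rank)

# Toy usage: three surrogate metrics scoring five candidate detectors,
# where the third metric is noisy and misleading.
scores = np.array([[0.7, 0.2, 0.9, 0.4, 0.5],
                   [0.6, 0.3, 0.8, 0.5, 0.4],
                   [0.1, 0.9, 0.7, 0.3, 0.2]])
best_to_worst = aggregate_model_ranks(scores)       # model 2 comes out on top
```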
arXiv Detail & Related papers (2022-10-03T16:49:30Z)
- Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to learn sample representations that enable effective discrimination of anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z)
- Data-Efficient and Interpretable Tabular Anomaly Detection [54.15249463477813]
We propose a novel framework that adapts a white-box model class, Generalized Additive Models, to detect anomalies.
In addition, the proposed framework, DIAD, can incorporate a small amount of labeled data to further boost anomaly detection performances in semi-supervised settings.
arXiv Detail & Related papers (2022-03-03T22:02:56Z)
- SLA$^2$P: Self-supervised Anomaly Detection with Adversarial Perturbation [77.71161225100927]
Anomaly detection is a fundamental yet challenging problem in machine learning.
We propose a novel and powerful framework, dubbed SLA$^2$P, for unsupervised anomaly detection.
arXiv Detail & Related papers (2021-11-25T03:53:43Z)
- Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
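
A hedged sketch of the reserve-budget idea: spend a held-back fraction of annotations on the examples whose current labels a model trained on the noisy data most strongly disagrees with. `flag_probable_label_errors` and the disagreement proxy are illustrative assumptions, not the paper's method; in practice, cross-validated predictions would be a safer basis for the proxy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def flag_probable_label_errors(X, y_noisy, relabel_budget=0.1):
    """Flag the examples most likely to be mislabeled, up to a reserved budget.
    Proxy for a label error: the model assigns low probability to the given label."""
    clf = LogisticRegression(max_iter=1000).fit(X, y_noisy)
    proba_given_label = clf.predict_proba(X)[np.arange(len(y_noisy)), y_noisy]
    disagreement = 1.0 - proba_given_label            # high = probable labeling error
    n_relabel = int(relabel_budget * len(y_noisy))
    return np.argsort(disagreement)[::-1][:n_relabel]

# Toy usage: corrupt a few labels and see which indices get flagged for relabeling.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y_noisy = (X[:, 0] > 0).astype(int)
y_noisy[:15] = 1 - y_noisy[:15]                       # inject label noise
to_relabel = flag_probable_label_errors(X, y_noisy, relabel_budget=0.1)
```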
arXiv Detail & Related papers (2021-10-15T20:37:29Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision points.
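
The data-refinement loop can be pictured as: fit a one-class model on the unlabeled pool, drop the samples it scores as most anomalous, and refit on the refined subset. The sketch below uses a One-Class SVM purely as an illustrative stand-in for the paper's framework; `refine_and_fit`, the drop fraction, and the round count are assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def refine_and_fit(X, n_rounds=3, drop_frac=0.05):
    """Iteratively drop the most anomalous-looking samples and refit a one-class
    model on the refined (presumed mostly normal) subset."""
    X_ref, model = X, None
    for _ in range(n_rounds):
        model = OneClassSVM(nu=0.1, gamma="scale").fit(X_ref)
        scores = model.score_samples(X_ref)            # lower score = more anomalous
        keep = scores >= np.quantile(scores, drop_frac)
        X_ref = X_ref[keep]                            # refine the training pool
    return model, X_ref

# Toy usage: unlabeled data contaminated with ~5% shifted outliers.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(380, 3)), rng.normal(loc=4.0, size=(20, 3))])
model, X_refined = refine_and_fit(X)
```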
arXiv Detail & Related papers (2021-06-11T01:36:08Z)
- Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z)