Learning to Detect Interesting Anomalies
- URL: http://arxiv.org/abs/2210.16334v1
- Date: Fri, 28 Oct 2022 18:00:06 GMT
- Title: Learning to Detect Interesting Anomalies
- Authors: Alireza Vafaei Sadr, Bruce A. Bassett, Emmanuel Sekyi
- Abstract summary: AHUNT shows excellent performance on MNIST, CIFAR10, and Galaxy-DESI data.
AHUNT also allows the number of anomaly classes to grow organically in response to Oracle's evaluations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection algorithms are typically applied to static, unchanging,
data features hand-crafted by the user. But how does a user systematically
craft good features for anomalies that have never been seen? Here we couple
deep learning with active learning -- in which an Oracle iteratively labels
small amounts of data selected algorithmically over a series of rounds -- to
automatically and dynamically improve the data features for efficient outlier
detection. This approach, AHUNT, shows excellent performance on MNIST, CIFAR10,
and Galaxy-DESI data, significantly outperforming both standard anomaly
detection and active learning algorithms with static feature spaces. Beyond
improved performance, AHUNT also allows the number of anomaly classes to grow
organically in response to Oracle's evaluations. Extensive ablation studies
explore the impact of Oracle question selection strategy and loss function on
performance. We illustrate how the dynamic anomaly class taxonomy represents
another step towards fully personalized rankings of different anomaly classes
that reflect a user's interests, allowing the algorithm to learn to ignore
statistically significant but uninteresting outliers (e.g., noise). This should
prove useful in the era of massive astronomical datasets serving diverse sets
of users who can only review a tiny subset of the incoming data.
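The round-based loop described above (learn features, score the pool, query the Oracle, and fold its answers, including any newly discovered anomaly classes, back into training) can be sketched in a few lines. The sketch below is illustrative only and is not the authors' implementation: the MLP feature learner, the IsolationForest scorer, the per-round query budget, and the `oracle` callback are placeholder assumptions standing in for AHUNT's deep network, question-selection strategy, and loss function.

```python
# Illustrative sketch of an AHUNT-style active anomaly hunt (not the paper's code).
# Each round: (1) re-learn features from the labels the Oracle has provided so far,
# (2) rank the pool by anomaly score in that feature space, (3) send the top-ranked
# unlabeled points to the Oracle, whose answers may introduce new anomaly classes.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neural_network import MLPClassifier

def ahunt_style_loop(X_pool, oracle, n_rounds=5, queries_per_round=10, seed=0):
    labeled = {}                                   # pool index -> Oracle label

    for _ in range(n_rounds):
        # 1. Dynamic features: retrain a small classifier on the labels so far
        #    and use its class probabilities as the learned feature space.
        if len(set(labeled.values())) >= 2:
            clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                                random_state=seed)
            idx = list(labeled)
            clf.fit(X_pool[idx], [labeled[i] for i in idx])
            feats = clf.predict_proba(X_pool)
        else:
            feats = X_pool                         # early rounds: raw features

        # 2. Anomaly scores in the current feature space (higher = more unusual).
        scores = -IsolationForest(random_state=seed).fit(feats).score_samples(feats)

        # 3. The Oracle labels the most anomalous unlabeled points; any label it
        #    returns that has not been seen before simply becomes a new class.
        candidates = [i for i in np.argsort(scores)[::-1] if i not in labeled]
        for i in candidates[:queries_per_round]:
            labeled[i] = oracle(i)

    return labeled
```

In the paper the feature learner is a deep network and both the question-selection strategy and the loss function are studied in the ablations; here they are reduced to the simplest stand-ins.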
Related papers
- Cluster Metric Sensitivity to Irrelevant Features [0.0]
We show how different types of irrelevant variables can impact the outcome of a clustering result from $k$-means in different ways.
Our results show that the Silhouette Coefficient and the Davies-Bouldin score are the most sensitive to irrelevant added features.
arXiv Detail & Related papers (2024-02-19T10:02:00Z)
- Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study [0.6291443816903801]
This paper evaluates a diverse array of machine learning-based anomaly detection algorithms.
The paper's main contribution is an unbiased comparison of these algorithms.
arXiv Detail & Related papers (2024-02-11T19:12:51Z)
- Active anomaly detection based on deep one-class classification [9.904380236739398]
We tackle two essential problems of active learning for Deep SVDD: query strategy and semi-supervised learning method.
First, rather than solely identifying anomalies, our query strategy selects uncertain samples according to an adaptive boundary.
Second, we apply noise contrastive estimation in training a one-class classification model to incorporate both labeled normal and abnormal data effectively.
arXiv Detail & Related papers (2023-09-18T03:56:45Z)
- Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) proposes learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Neural Active Learning on Heteroskedastic Distributions [29.01776999862397]
We demonstrate the catastrophic failure of active learning algorithms on heteroskedastic datasets.
We propose a new algorithm that incorporates a model difference scoring function for each data point to filter out the noisy examples and sample clean examples.
arXiv Detail & Related papers (2022-11-02T07:30:19Z)
- Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z)
- SLA$^2$P: Self-supervised Anomaly Detection with Adversarial Perturbation [77.71161225100927]
Anomaly detection is a fundamental yet challenging problem in machine learning.
We propose a novel and powerful framework, dubbed SLA$^2$P, for unsupervised anomaly detection.
arXiv Detail & Related papers (2021-11-25T03:53:43Z)
- Active learning for reducing labeling effort in text classification tasks [3.8424737607413153]
Active learning (AL) is a paradigm that aims to reduce labeling effort by labeling only the data that the model deems most informative.
We present an empirical study that compares different uncertainty-based algorithms, with BERT$_{base}$ as the classifier (a minimal sketch of this kind of uncertainty-based selection follows this list).
Our results show that uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data.
arXiv Detail & Related papers (2021-09-10T13:00:36Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
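Several of the entries above rely on uncertainty-based query selection, including the text-classification study that compares it against random sampling. The following is a minimal, hedged sketch of least-confidence selection, under the assumption that `model` is any classifier exposing `predict_proba` (standing in for a fine-tuned BERT$_{base}$); the function names and query budget are illustrative and not taken from any of the papers.

```python
# Minimal sketch of uncertainty-based (least-confidence) query selection,
# with the random-sampling baseline it is typically compared against.
import numpy as np

def least_confidence_query(model, X_pool, n_queries=16):
    """Indices of the pool points whose top-class probability is lowest."""
    proba = model.predict_proba(X_pool)        # shape: (n_pool, n_classes)
    confidence = proba.max(axis=1)             # model's confidence per point
    return np.argsort(confidence)[:n_queries]  # least confident first

def random_query(X_pool, n_queries=16, seed=0):
    """Random baseline: uncertainty-based AL is reported to outperform this."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(X_pool), size=n_queries, replace=False)
```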
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.