Learning to Detect Interesting Anomalies
- URL: http://arxiv.org/abs/2210.16334v1
- Date: Fri, 28 Oct 2022 18:00:06 GMT
- Title: Learning to Detect Interesting Anomalies
- Authors: Alireza Vafaei Sadr, Bruce A. Bassett, Emmanuel Sekyi
- Abstract summary: AHUNT shows excellent performance on MNIST, CIFAR10, and Galaxy-DESI data.
AHUNT also allows the number of anomaly classes to grow organically in response to Oracle's evaluations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection algorithms are typically applied to static, unchanging,
data features hand-crafted by the user. But how does a user systematically
craft good features for anomalies that have never been seen? Here we couple
deep learning with active learning -- in which an Oracle iteratively labels
small amounts of data selected algorithmically over a series of rounds -- to
automatically and dynamically improve the data features for efficient outlier
detection. This approach, AHUNT, shows excellent performance on MNIST, CIFAR10,
and Galaxy-DESI data, significantly outperforming both standard anomaly
detection and active learning algorithms with static feature spaces. Beyond
improved performance, AHUNT also allows the number of anomaly classes to grow
organically in response to Oracle's evaluations. Extensive ablation studies
explore the impact of Oracle question selection strategy and loss function on
performance. We illustrate how the dynamic anomaly class taxonomy represents
another step towards fully personalized rankings of different anomaly classes
that reflect a user's interests, allowing the algorithm to learn to ignore
statistically significant but uninteresting outliers (e.g., noise). This should
prove useful in the era of massive astronomical datasets serving diverse sets
of users who can only review a tiny subset of the incoming data.
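The round-based loop described above (learn features, score the pool, query the Oracle, and fold its answers, including any newly discovered anomaly classes, back into training) can be sketched in a few lines. The sketch below is illustrative only and is not the authors' implementation: the MLP feature learner, the IsolationForest scorer, the per-round query budget, and the `oracle` callback are placeholder assumptions standing in for AHUNT's deep network, question-selection strategy, and loss function.

```python
# Illustrative sketch of an AHUNT-style active anomaly hunt (not the paper's code).
# Each round: (1) re-learn features from the labels the Oracle has provided so far,
# (2) rank the pool by anomaly score in that feature space, (3) send the top-ranked
# unlabeled points to the Oracle, whose answers may introduce new anomaly classes.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neural_network import MLPClassifier

def ahunt_style_loop(X_pool, oracle, n_rounds=5, queries_per_round=10, seed=0):
    labeled = {}                                   # pool index -> Oracle label

    for _ in range(n_rounds):
        # 1. Dynamic features: retrain a small classifier on the labels so far
        #    and use its class probabilities as the learned feature space.
        if len(set(labeled.values())) >= 2:
            clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                                random_state=seed)
            idx = list(labeled)
            clf.fit(X_pool[idx], [labeled[i] for i in idx])
            feats = clf.predict_proba(X_pool)
        else:
            feats = X_pool                         # early rounds: raw features

        # 2. Anomaly scores in the current feature space (higher = more unusual).
        scores = -IsolationForest(random_state=seed).fit(feats).score_samples(feats)

        # 3. The Oracle labels the most anomalous unlabeled points; any label it
        #    returns that has not been seen before simply becomes a new class.
        candidates = [i for i in np.argsort(scores)[::-1] if i not in labeled]
        for i in candidates[:queries_per_round]:
            labeled[i] = oracle(i)

    return labeled
```

In the paper the feature learner is a deep network and both the question-selection strategy and the loss function are studied in the ablations; here they are reduced to the simplest stand-ins.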
Related papers
- Cluster Metric Sensitivity to Irrelevant Features [0.0]
We show how different types of irrelevant variables can impact the outcome of a clustering result from $k$-means in different ways.
Our results show that the Silhouette Coefficient and the Davies-Bouldin score are the most sensitive to irrelevant added features.
arXiv Detail & Related papers (2024-02-19T10:02:00Z)
- Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study [0.6291443816903801]
This paper evaluates a diverse array of machine learning-based anomaly detection algorithms.
The paper's main contribution is an unbiased comparison of these algorithms.
arXiv Detail & Related papers (2024-02-11T19:12:51Z)
- Active anomaly detection based on deep one-class classification [9.904380236739398]
We tackle two essential problems of active learning for Deep SVDD: query strategy and semi-supervised learning method.
First, rather than solely identifying anomalies, our query strategy selects uncertain samples according to an adaptive boundary.
Second, we apply noise contrastive estimation in training a one-class classification model to incorporate both labeled normal and abnormal data effectively.
arXiv Detail & Related papers (2023-09-18T03:56:45Z)
- Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) proposes learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Neural Active Learning on Heteroskedastic Distributions [29.01776999862397]
We demonstrate the catastrophic failure of active learning algorithms on heteroskedastic datasets.
We propose a new algorithm that incorporates a model difference scoring function for each data point to filter out the noisy examples and sample clean examples.
arXiv Detail & Related papers (2022-11-02T07:30:19Z)
- Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z)
- SLA$^2$P: Self-supervised Anomaly Detection with Adversarial Perturbation [77.71161225100927]
Anomaly detection is a fundamental yet challenging problem in machine learning.
We propose a novel and powerful framework, dubbed SLA$^2$P, for unsupervised anomaly detection.
arXiv Detail & Related papers (2021-11-25T03:53:43Z)
- Active learning for reducing labeling effort in text classification tasks [3.8424737607413153]
Active learning (AL) is a paradigm that aims to reduce labeling effort by labeling only the data that the model deems most informative.
We present an empirical study that compares different uncertainty-based algorithms, with BERT$_{base}$ as the classifier (a minimal sketch of this kind of uncertainty-based selection follows this list).
Our results show that uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data.
arXiv Detail & Related papers (2021-09-10T13:00:36Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
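Several of the entries above rely on uncertainty-based query selection, including the text-classification study that compares it against random sampling. The following is a minimal, hedged sketch of least-confidence selection, under the assumption that `model` is any classifier exposing `predict_proba` (standing in for a fine-tuned BERT$_{base}$); the function names and query budget are illustrative and not taken from any of the papers.

```python
# Minimal sketch of uncertainty-based (least-confidence) query selection,
# with the random-sampling baseline it is typically compared against.
import numpy as np

def least_confidence_query(model, X_pool, n_queries=16):
    """Indices of the pool points whose top-class probability is lowest."""
    proba = model.predict_proba(X_pool)        # shape: (n_pool, n_classes)
    confidence = proba.max(axis=1)             # model's confidence per point
    return np.argsort(confidence)[:n_queries]  # least confident first

def random_query(X_pool, n_queries=16, seed=0):
    """Random baseline: uncertainty-based AL is reported to outperform this."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(X_pool), size=n_queries, replace=False)
```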
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.