The Impact of Discretization Method on the Detection of Six Types of
Anomalies in Datasets
- URL: http://arxiv.org/abs/2008.12330v1
- Date: Thu, 27 Aug 2020 18:43:55 GMT
- Title: The Impact of Discretization Method on the Detection of Six Types of
Anomalies in Datasets
- Authors: Ralph Foorthuis
- Abstract summary: Anomaly detection is the process of identifying cases, or groups of cases, that are in some way unusual and do not fit the general patterns present in the dataset.
Numerous algorithms use discretization of numerical data in their detection processes.
This study investigates the effect of the discretization method on the unsupervised detection of each of the six anomaly types acknowledged in a recent typology of data anomalies.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection is the process of identifying cases, or groups of cases,
that are in some way unusual and do not fit the general patterns present in the
dataset. Numerous algorithms use discretization of numerical data in their
detection processes. This study investigates the effect of the discretization
method on the unsupervised detection of each of the six anomaly types
acknowledged in a recent typology of data anomalies. To this end, experiments
are conducted with various datasets and SECODA, a general-purpose algorithm for
unsupervised non-parametric anomaly detection in datasets with numerical and
categorical attributes. This algorithm employs discretization of continuous
attributes, exponentially increasing weights and discretization cut points, and
a pruning heuristic to detect anomalies with an optimal number of iterations.
The results demonstrate that standard SECODA can detect all six types, but that
different discretization methods favor the discovery of certain anomaly types.
The main findings also hold for other detection techniques using
discretization.
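As a rough illustration of why the choice of discretization method matters here, the sketch below bins a continuous attribute with either equal-width or equal-frequency (quantile) cut points, combines the bins with a categorical attribute into value combinations, and scores each case by how common its combination is, so that rare combinations receive low scores. This is a hypothetical simplification, not the published SECODA implementation: the function names are made up, and it omits SECODA's iterative refinement with exponentially increasing numbers of cut points, exponentially increasing weights, and the pruning heuristic mentioned in the abstract.

```python
# Illustrative sketch only; not the published SECODA algorithm.
import numpy as np
import pandas as pd

def discretize(values: np.ndarray, n_bins: int, method: str = "equiwidth") -> np.ndarray:
    """Bin a continuous attribute.

    'equiwidth' : cut points evenly spaced over the value range
    'equidepth' : cut points at quantiles, so bins hold roughly equal numbers of cases
    """
    if method == "equiwidth":
        edges = np.linspace(values.min(), values.max(), n_bins + 1)
    elif method == "equidepth":
        edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    else:
        raise ValueError(f"unknown method: {method}")
    # digitize against the interior edges gives bin indices 0 .. n_bins-1
    return np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)

def frequency_scores(df: pd.DataFrame, continuous: list, categorical: list,
                     n_bins: int = 8, method: str = "equiwidth") -> pd.Series:
    """Score each case by how common its combination of discretized continuous
    bins and categorical values is; a lower frequency means a more anomalous case."""
    parts = [df[c].astype(str) for c in categorical]
    parts += [pd.Series(discretize(df[c].to_numpy(), n_bins, method),
                        index=df.index).astype(str) for c in continuous]
    combination = parts[0].str.cat(parts[1:], sep="|")
    counts = combination.map(combination.value_counts())
    return counts / len(df)  # smaller = rarer = more anomalous

# Toy usage: one numeric and one categorical attribute with a single injected extreme value.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "x": np.append(rng.normal(0, 1, 499), 8.0),          # extreme value anomaly
    "colour": ["red"] * 250 + ["blue"] * 249 + ["red"],
})
for m in ("equiwidth", "equidepth"):
    scores = frequency_scores(data, ["x"], ["colour"], method=m)
    print(m, "-> most anomalous case index:", scores.idxmin())
```

On data like this, an extreme value tends to stand out more under equal-width binning (it lands alone in an outer bin), whereas equal-frequency binning spreads cases evenly across bins; this is one simple way in which the discretization choice can favour the discovery of some anomaly types over others.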
Related papers
- Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con$_2$, which learns through context augmentations.
arXiv Detail & Related papers (2024-05-29T07:59:06Z) - Binary Quantification and Dataset Shift: An Experimental Investigation [54.14283123210872]
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data.
The relationship between quantification and other types of dataset shift remains, by and large, unexplored.
We propose a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift.
arXiv Detail & Related papers (2023-10-06T20:11:27Z) - WePaMaDM-Outlier Detection: Weighted Outlier Detection using Pattern
Approaches for Mass Data Mining [0.6754597324022876]
Outlier detection can reveal vital information about system faults, fraudulent activities, and patterns in the data.
This article proposes WePaMaDM-Outlier Detection, a weighted outlier-detection approach for distinct mass data mining domains.
It also investigates the significance of data modeling in outlier detection techniques in surveillance, fault detection, and trend analysis.
arXiv Detail & Related papers (2023-06-09T07:00:00Z) - AGAD: Adversarial Generative Anomaly Detection [12.68966318231776]
Anomaly detection suffers from a scarcity of anomaly examples, owing to the diversity of abnormalities and the difficulty of obtaining large-scale anomaly data.
We propose Adversarial Generative Anomaly Detection (AGAD), a self-contrast-based anomaly detection paradigm.
Our method generates pseudo-anomaly data for both supervised and semi-supervised anomaly detection scenarios.
arXiv Detail & Related papers (2023-04-09T10:40:02Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the unidentifiability region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Early Abnormal Detection of Sewage Pipe Network: Bagging of Various
Abnormal Detection Algorithms [3.1720050808705804]
Abnormalities in the sewage pipe network affect the normal operation of the whole city.
This paper proposes an early abnormality-detection method that bags several detection algorithms.
arXiv Detail & Related papers (2022-06-06T03:46:47Z) - MissDAG: Causal Discovery in the Presence of Missing Data with
Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z) - Explainable Deep Few-shot Anomaly Detection with Deviation Networks [123.46611927225963]
We introduce a novel weakly-supervised anomaly detection framework to train detection models.
The proposed approach learns discriminative normality by leveraging the labeled anomalies and a prior probability.
Our model is substantially more sample-efficient and robust, and performs significantly better than state-of-the-art competing methods in both closed-set and open-set settings.
arXiv Detail & Related papers (2021-08-01T14:33:17Z) - Algorithmic Frameworks for the Detection of High Density Anomalies [0.0]
High-density anomalies are deviant cases positioned in the most normal regions of the data space.
This study introduces several non-parametric algorithmic frameworks for unsupervised detection.
arXiv Detail & Related papers (2020-10-09T17:48:02Z) - Toward Deep Supervised Anomaly Detection: Reinforcement Learning from
Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z) - SECODA: Segmentation- and Combination-Based Detection of Anomalies [0.0]
SECODA is an unsupervised non-parametric anomaly detection algorithm for datasets containing continuous and categorical attributes.
The algorithm has a low memory footprint and its runtime performance scales linearly with the size of the dataset.
An evaluation with simulated and real-life datasets shows that this algorithm is able to identify many different types of anomalies.
arXiv Detail & Related papers (2020-08-16T10:03:14Z)