Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams
- URL: http://arxiv.org/abs/2008.01505v1
- Date: Tue, 4 Aug 2020 13:19:07 GMT
- Title: Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams
- Authors: Charlie Dickens, Eric Meissner, Pablo G. Moreno, Tom Diethe
- Abstract summary: Anomaly detection at scale is an extremely challenging problem of great practicality.
Recent work has coalesced around variations of (random) $k$d-trees to summarise data for anomaly detection.
These methods rely on ad-hoc score functions that are not easy to interpret.
We contextualise these methods in a probabilistic framework which we call the Mondrian Pólya Forest.
- Score: 6.177270420667713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection at scale is an extremely challenging problem of great
practicality. When data is large and high-dimensional, it can be difficult to
detect which observations do not fit the expected behaviour. Recent work has
coalesced around variations of (random) $k$d-trees to summarise data for
anomaly detection. However, these methods rely on ad-hoc score functions that
are not easy to interpret, making it difficult to assess the severity of the
detected anomalies or to select a reasonable threshold in the absence of
labelled anomalies. To address these issues, we contextualise these methods in
a probabilistic framework, which we call the Mondrian Pólya Forest, for
estimating the underlying probability density function generating the data,
enabling greater interpretability than prior work. In addition, we develop a
memory-efficient variant able to operate in modern streaming environments.
Our experiments show that these methods achieve state-of-the-art performance
while providing statistically interpretable anomaly scores.
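The abstract describes the approach only at a high level. As a rough illustration of the underlying idea, consider scoring anomalies by a density estimate built from an ensemble of random axis-aligned partition trees, so that the score is a negative log-density rather than an ad-hoc path-length statistic. The Python sketch below is not the paper's Mondrian Pólya Forest or its streaming variant; the class names, hyper-parameters, and simple histogram-style leaf estimates are illustrative assumptions only.

```python
# Illustrative sketch only: NOT the Mondrian Polya Forest from the paper.
import numpy as np


class RandomPartitionTree:
    """A single axis-aligned random partition tree used as a crude histogram."""

    def __init__(self, max_depth=8, rng=None):
        self.max_depth = max_depth
        self.rng = rng if rng is not None else np.random.default_rng()

    def fit(self, X):
        self.n_ = len(X)
        self.root_ = self._build(X, X.min(axis=0), X.max(axis=0), depth=0)
        return self

    def _build(self, X, lo, hi, depth):
        # A leaf stores its empirical probability mass and cell volume, so the
        # tree behaves like a randomly partitioned histogram density estimate.
        if depth >= self.max_depth or len(X) <= 1:
            volume = float(np.prod(np.maximum(hi - lo, 1e-12)))
            return {"leaf": True, "mass": len(X) / self.n_, "volume": volume}
        dim = int(self.rng.integers(X.shape[1]))          # random split dimension
        cut = float(self.rng.uniform(lo[dim], hi[dim]))   # random split location
        go_left = X[:, dim] <= cut
        hi_left, lo_right = hi.copy(), lo.copy()
        hi_left[dim], lo_right[dim] = cut, cut
        return {"leaf": False, "dim": dim, "cut": cut,
                "left": self._build(X[go_left], lo, hi_left, depth + 1),
                "right": self._build(X[~go_left], lo_right, hi, depth + 1)}

    def log_density(self, x):
        node = self.root_
        while not node["leaf"]:
            node = node["left"] if x[node["dim"]] <= node["cut"] else node["right"]
        return np.log(node["mass"] + 1e-12) - np.log(node["volume"])


class RandomPartitionForest:
    """Averages per-tree log-density estimates; lower density => more anomalous."""

    def __init__(self, n_trees=25, max_depth=8, seed=0):
        rng = np.random.default_rng(seed)
        self.trees = [RandomPartitionTree(max_depth, rng) for _ in range(n_trees)]

    def fit(self, X):
        for tree in self.trees:
            tree.fit(X)
        return self

    def anomaly_score(self, x):
        # The score is a negative average log-density, so it can be read on a
        # probabilistic scale rather than as an ad-hoc path-length statistic.
        return -float(np.mean([tree.log_density(x) for tree in self.trees]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 2))                        # "normal" training data
    forest = RandomPartitionForest().fit(X)
    print(forest.anomaly_score(np.array([0.0, 0.0])))     # typical point: lower score
    print(forest.anomaly_score(np.array([6.0, 6.0])))     # far-away point: higher score
```

Presumably the actual Mondrian Pólya Forest replaces these crude leaf estimates with a proper posterior over Mondrian partitions, and the memory-efficient variant adapts this to data streams; those details are in the paper, not in this sketch.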
Related papers
- TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models [5.314466196448187]
We present a diffusion-based probabilistic model effective for unsupervised anomaly detection.
Our model is trained to learn the density of normal samples by utilizing a unique rejection scheme.
At inference, we identify anomalies as samples in low-density regions.
arXiv Detail & Related papers (2023-07-23T14:02:33Z)
- Augment to Detect Anomalies with Continuous Labelling [10.646747658653785]
Anomaly detection is the task of recognizing samples that differ in some respect from the training observations.
Recent state-of-the-art deep learning-based anomaly detection methods suffer from high computational cost, complexity, unstable training procedures, and non-trivial implementation.
We leverage a simple learning procedure that trains a lightweight convolutional neural network, reaching state-of-the-art performance in anomaly detection.
arXiv Detail & Related papers (2022-07-03T20:11:51Z)
- MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z)
- SLA$^2$P: Self-supervised Anomaly Detection with Adversarial Perturbation [77.71161225100927]
Anomaly detection is a fundamental yet challenging problem in machine learning.
We propose a novel and powerful framework, dubbed SLA$^2$P, for unsupervised anomaly detection.
arXiv Detail & Related papers (2021-11-25T03:53:43Z)
- Explainable Deep Few-shot Anomaly Detection with Deviation Networks [123.46611927225963]
We introduce a novel weakly-supervised anomaly detection framework to train detection models.
The proposed approach learns discriminative normality by leveraging the labeled anomalies and a prior probability.
Our model is substantially more sample-efficient and robust, and performs significantly better than state-of-the-art competing methods in both closed-set and open-set settings.
arXiv Detail & Related papers (2021-08-01T14:33:17Z)
- An algorithm-based multiple detection influence measure for high dimensional regression using expectile [0.4999814847776096]
We propose an algorithm-based, multi-step, multiple detection procedure to identify influential observations.
Our three-step algorithm, $asymMIP$, identifies and captures undesirable variability in the data and is based on two complementary statistics.
The application of our method to the Autism Brain Imaging Data Exchange dataset resulted in a more balanced and accurate prediction of brain maturity.
arXiv Detail & Related papers (2021-05-26T01:16:24Z)
- Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding [101.35070661471124]
We show that unobserved confounding leaves a characteristic footprint in the observed data distribution that allows for disentangling spurious and causal effects.
We propose an adjusted score-based causal discovery algorithm that may be implemented with general-purpose solvers and scales to high-dimensional problems.
arXiv Detail & Related papers (2021-03-28T11:07:59Z)
- Low-rank on Graphs plus Temporally Smooth Sparse Decomposition for Anomaly Detection in Spatiotemporal Data [37.65687661747699]
We introduce an unsupervised tensor-based anomaly detection method that takes the sparse and temporally continuous nature of anomalies into account.
The resulting optimization problem is convex, scalable, and is shown to be robust against missing data and noise.
arXiv Detail & Related papers (2020-10-23T19:34:40Z)
- Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z)
- Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning [80.20302993614594]
We provide a statistical analysis to overcome the drawbacks of Laplacian regularization.
We unveil a large body of spectral filtering methods that exhibit desirable behaviors.
We provide realistic computational guidelines in order to make our method usable with large amounts of data.
arXiv Detail & Related papers (2020-09-09T14:28:54Z)
- Anomaly Detection in Trajectory Data with Normalizing Flows [0.0]
We propose an approach based on normalizing flows that enables complex density estimation from data with neural networks.
Our proposal computes exact model likelihood values, an important feature of normalizing flows, for each segment of the trajectory.
We evaluate our methodology, named aggregated anomaly detection with normalizing flows (GRADINGS), using real-world trajectory data and compare it with more traditional anomaly detection techniques; a generic sketch of this low-density scoring principle follows the list below.
arXiv Detail & Related papers (2020-04-13T14:16:40Z)
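Several of the entries above (e.g. TabADM and GRADINGS) share the same scoring principle as the main paper: fit a density model to normal data and flag test points whose likelihood is low. The sketch below illustrates that principle only; a Gaussian kernel density estimate stands in for the diffusion and normalizing-flow models those papers actually use, and the 1% threshold quantile is an arbitrary choice for the example.

```python
# Illustrative sketch only: generic low-density anomaly flagging, not any
# specific method from the papers listed above.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
normal_data = rng.normal(size=(2, 500))      # gaussian_kde expects shape (d, n)

kde = gaussian_kde(normal_data)              # density model of "normal" behaviour
train_log_density = kde.logpdf(normal_data)

# Calibrate the threshold on training data: roughly 1% of normal points fall
# below the 1st percentile of their own log-density.
threshold = np.quantile(train_log_density, 0.01)

test_points = np.array([[0.1, 4.0],          # first column: typical point (0.1, -0.2)
                        [-0.2, 4.0]])        # second column: outlier (4.0, 4.0)
is_anomaly = kde.logpdf(test_points) < threshold
print(dict(zip(["typical", "outlier"], is_anomaly)))   # expect {'typical': False, 'outlier': True}
```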