Related papers: Random Subspace Mixture Models for Interpretable Anomaly Detection

Random Subspace Mixture Models for Interpretable Anomaly Detection

URL: http://arxiv.org/abs/2108.06283v1
Date: Fri, 13 Aug 2021 15:12:53 GMT
Title: Random Subspace Mixture Models for Interpretable Anomaly Detection
Authors: Cetin Savkli, Catherine Schwartz
Abstract summary: We present a new subspace-based method to construct probabilistic models for high-dimensional data. The proposed algorithm attains competitive AUC scores compared with prominent algorithms against benchmark anomaly detection datasets.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a new subspace-based method to construct probabilistic models for high-dimensional data and highlight its use in anomaly detection. The approach is based on a statistical estimation of probability density using densities of random subspaces combined with geometric averaging. In selecting random subspaces, equal representation of each attribute is used to ensure correct statistical limits. Gaussian mixture models (GMMs) are used to create the probability densities for each subspace with techniques included to mitigate singularities allowing for the ability to handle both numerical and categorial attributes. The number of components for each GMM is determined automatically through Bayesian information criterion to prevent overfitting. The proposed algorithm attains competitive AUC scores compared with prominent algorithms against benchmark anomaly detection datasets with the added benefits of being simple, scalable, and interpretable.

Related papers

Mixture models for data with unknown distributions [0.6345523830122168]
We describe and analyze a broad class of mixture models for real-valued multivariate data. We return both a division of the data and an estimate of the distributions, effectively performing clustering and density estimation within each cluster at the same time. We demonstrate our methods with a selection of illustrative applications and give code implementing both algorithms.
arXiv Detail & Related papers (2025-02-26T22:42:40Z)
Anomaly Detection Under Uncertainty Using Distributionally Robust Optimization Approach [0.9217021281095907]
Anomaly detection is defined as the problem of finding data points that do not follow the patterns of the majority. The one-class Support Vector Machines (SVM) method aims to find a decision boundary to distinguish between normal data points and anomalies. A distributionally robust chance-constrained model is proposed in which the probability of misclassification is low.
arXiv Detail & Related papers (2023-12-03T06:13:22Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture. It can model the feature space more comprehensively and reduce the dominance of head classes. The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
Subspace clustering in high-dimensions: Phase transitions \& Statistical-to-Computational gap [24.073221004661427]
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model. We provide an exact characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity.
arXiv Detail & Related papers (2022-05-26T17:47:35Z)
A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data. A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
Probabilistic spatial clustering based on the Self Discipline Learning (SDL) model of autonomous learning [1.9322517897534983]
Unsupervised clustering algorithm can effectively reduce the dimension of high-dimensional unlabeled data. Traditional clustering algorithm needs to set the upper bound of the number of categories in advance. Probability spatial clustering algorithm based on the Self Discipline Learning(SDL) model is proposed.
arXiv Detail & Related papers (2022-01-07T17:18:57Z)
Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantees universal approximation, local linearity and equivalence to other classes of hybrid system. In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX) The architecture is conceived following the Mixture of Expert concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z)
Categorical anomaly detection in heterogeneous data using minimum description length clustering [3.871148938060281]
We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data. Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms.
arXiv Detail & Related papers (2020-06-14T14:48:37Z)
Robust M-Estimation Based Bayesian Cluster Enumeration for Real Elliptically Symmetric Distributions [5.137336092866906]
Robustly determining optimal number of clusters in a data set is an essential factor in a wide range of applications. This article generalizes so that it can be used with any arbitrary Really Symmetric (RES) distributed mixture model. We derive a robust criterion for data sets with finite sample size, and also provide an approximation to reduce the computational cost at large sample sizes.
arXiv Detail & Related papers (2020-05-04T11:44:49Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, which commonly used models like the Dirichlet process and the beta-Bernoulli process can be expressed as, are decomposable into independent sub-measures. We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components. The resulting hybrid algorithm can be applied to allow scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.