Interpretable Outlier Summarization
- URL: http://arxiv.org/abs/2303.06261v3
- Date: Fri, 1 Sep 2023 07:49:21 GMT
- Title: Interpretable Outlier Summarization
- Authors: Yu Wang, Lei Cao, Yizhou Yan, Samuel Madden
- Abstract summary: Outlier detection is critical in real applications to prevent financial fraud, defend against network intrusions, or detect imminent device failures.
We propose STAIR, which learns a compact set of human-understandable rules to summarize and explain the anomaly detection results.
Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results.
- Score: 10.41121739124057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Outlier detection is critical in real applications to prevent financial
fraud, defend against network intrusions, or detect imminent device failures. To
reduce the human effort in evaluating outlier detection results and effectively
turn the outliers into actionable insights, the users often expect a system to
automatically produce interpretable summarizations of subgroups of outlier
detection results. Unfortunately, to date no such systems exist. To fill this
gap, we propose STAIR, which learns a compact set of human-understandable rules
to summarize and explain the anomaly detection results. Rather than use the
classical decision tree algorithms to produce these rules, STAIR proposes a new
optimization objective to produce a small number of rules with minimal
complexity, and hence strong interpretability, to accurately summarize the
detection results. The learning algorithm of STAIR produces a rule set by
iteratively splitting the large rules and is optimal in maximizing this
objective in each iteration. Moreover, to effectively handle high dimensional,
highly complex data sets which are hard to summarize with simple rules, we
propose a localized STAIR approach, called L-STAIR. Taking data locality into
consideration, it simultaneously partitions data and learns a set of localized
rules for each partition. Our experimental study on many outlier benchmark
datasets shows that STAIR significantly reduces the complexity of the rules
required to summarize the outlier detection results, making them more amenable
to human understanding and evaluation than rules produced by decision tree methods.
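As a concrete point of reference, the classical decision-tree baseline that STAIR is compared against can be sketched as follows. This is not STAIR itself; the detector, dataset, feature names, and tree depth are all illustrative assumptions:

```python
# Hedged sketch: the classical decision-tree baseline for summarizing
# outlier-detection results, which STAIR improves upon. The detector,
# dataset, feature names, and tree depth are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
X[:20] += 6.0  # plant a clearly outlying subgroup

# Step 1: run any detector to obtain inlier/outlier labels (-1 = outlier).
labels = IsolationForest(random_state=0).fit_predict(X)

# Step 2: fit a shallow surrogate tree on the labels; each root-to-leaf
# path then reads as one human-understandable rule.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, labels == -1)
rules = export_text(tree, feature_names=["f0", "f1"])
print(rules)
```

STAIR's contribution is to replace the tree's purity-driven splitting with an objective that directly trades accuracy against the number and complexity of the resulting rules.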
Related papers
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD)
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z) - DTOR: Decision Tree Outlier Regressor to explain anomalies [37.00322799216377]
Decision Tree Outlier Regressor (DTOR) is a technique for producing rule-based explanations for individual data points.
Our results demonstrate the robustness of DTOR even in datasets with a large number of features.
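A hedged, DTOR-style sketch of this idea (not the paper's exact algorithm): fit a decision-tree regressor to a detector's anomaly scores, then read off the decision path of a single point as a conjunctive rule explaining it. The detector, data, and tree depth are assumptions:

```python
# Hedged, DTOR-style sketch (not the paper's exact algorithm): fit a
# decision-tree regressor to a detector's anomaly scores, then read off
# the decision path of one point as a conjunctive rule explaining it.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (300, 3)), rng.normal(8.0, 1.0, (5, 3))])

det = IsolationForest(random_state=1).fit(X)
scores = -det.score_samples(X)  # higher score = more anomalous

reg = DecisionTreeRegressor(max_depth=3, random_state=1).fit(X, scores)

# The root-to-leaf path of the last (planted) outlier, as a rule.
node_ids = reg.decision_path(X[-1:]).indices
feature, threshold = reg.tree_.feature, reg.tree_.threshold
rule = [
    f"x[{feature[n]}] {'<=' if X[-1, feature[n]] <= threshold[n] else '>'} {threshold[n]:.2f}"
    for n in node_ids
    if feature[n] >= 0  # leaf nodes carry feature == -2; skip them
]
print(" AND ".join(rule))
```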
arXiv Detail & Related papers (2024-03-16T11:38:31Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n²) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Implicit models, latent compression, intrinsic biases, and cheap lunches
in community detection [0.0]
Community detection aims to partition a network into clusters of nodes to summarize its large-scale structure.
Some community detection methods are inferential, explicitly deriving the clustering objective through a probabilistic generative model.
Other methods are descriptive, dividing a network according to an objective motivated by a particular application.
We present a solution that associates any community detection objective, inferential or descriptive, with its corresponding implicit network generative model.
arXiv Detail & Related papers (2022-10-17T15:38:41Z) - Diminishing Empirical Risk Minimization for Unsupervised Anomaly
Detection [0.0]
Empirical Risk Minimization (ERM) assumes that the performance of an algorithm on an unknown distribution can be approximated by averaging losses on the known training set.
We propose a novel Diminishing Empirical Risk Minimization (DERM) framework to break through the limitations of ERM.
DERM adaptively adjusts the impact of individual losses through a well-devised aggregation strategy.
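The paper's exact aggregation is not reproduced here, but one plausible diminishing scheme, shown purely for illustration, sorts the per-sample losses and geometrically down-weights the largest ones, so that suspected anomalies with extreme losses dominate the objective less than under plain ERM averaging:

```python
import numpy as np

def diminishing_risk(losses, alpha=0.5):
    """Illustrative diminishing aggregation (not the paper's exact formula):
    sort losses ascending and geometrically down-weight the larger ones,
    so extreme, likely-anomalous losses contribute less to the objective."""
    sorted_losses = np.sort(losses)                    # ascending
    weights = alpha ** np.arange(len(sorted_losses))   # 1, a, a^2, ...
    return np.sum(weights * sorted_losses) / np.sum(weights)

losses = np.array([0.1, 0.15, 0.2, 9.0])  # one extreme, likely-anomalous loss
print(losses.mean())             # plain ERM average, dominated by 9.0
print(diminishing_risk(losses))  # far smaller: the extreme loss is damped
```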
arXiv Detail & Related papers (2022-05-29T14:18:26Z) - Little Help Makes a Big Difference: Leveraging Active Learning to
Improve Unsupervised Time Series Anomaly Detection [2.1684857243537334]
A large set of anomaly detection algorithms have been deployed for detecting unexpected network incidents.
Unsupervised anomaly detection algorithms often suffer from excessive false alarms.
We propose to use active learning to introduce and benefit from the feedback of operators.
arXiv Detail & Related papers (2022-01-25T13:54:19Z) - Unsupervised Learning of Debiased Representations with Pseudo-Attributes [85.5691102676175]
We propose a simple but effective debiasing technique in an unsupervised manner.
We perform clustering on the feature embedding space and identify pseudo-attributes by taking advantage of the clustering results.
We then employ a novel cluster-based reweighting scheme for learning debiased representation.
arXiv Detail & Related papers (2021-08-06T05:20:46Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
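Following that definition, SoRR can be computed directly; the AoRR-style usage below (drop the single largest loss, average the next three) is an illustrative choice of ranks:

```python
import numpy as np

def sum_of_ranked_range(values, k1, k2):
    """Sum of the values ranked k1..k2 (1-indexed, in descending order),
    per the SoRR definition: the sum over a consecutive range of sorted
    values. Excluding the top ranks makes the objective robust to outliers."""
    v = np.sort(values)[::-1]      # descending
    return v[k1 - 1:k2].sum()

losses = np.array([9.0, 0.4, 0.3, 0.2, 0.1])
# AoRR-style aggregate: drop the top-1 loss, average ranks 2..4.
aorr = sum_of_ranked_range(losses, 2, 4) / 3
print(aorr)
```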
arXiv Detail & Related papers (2020-10-05T01:58:32Z) - Interpolation-based semi-supervised learning for object detection [44.37685664440632]
We propose an Interpolation-based Semi-supervised learning method for object detection.
The proposed losses dramatically improve the performance of semi-supervised learning as well as supervised learning.
arXiv Detail & Related papers (2020-06-03T10:53:44Z) - An Information Bottleneck Approach for Controlling Conciseness in
Rationale Extraction [84.49035467829819]
We show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.