Data-Efficient and Interpretable Tabular Anomaly Detection
- URL: http://arxiv.org/abs/2203.02034v2
- Date: Sun, 4 Jun 2023 22:42:00 GMT
- Title: Data-Efficient and Interpretable Tabular Anomaly Detection
- Authors: Chun-Hao Chang, Jinsung Yoon, Sercan Arik, Madeleine Udell, Tomas Pfister
- Abstract summary: We propose a novel framework that adapts a white-box model class, Generalized Additive Models, to detect anomalies.
In addition, the proposed framework, DIAD, can incorporate a small amount of labeled data to further boost anomaly detection performances in semi-supervised settings.
- Score: 54.15249463477813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anomaly detection (AD) plays an important role in numerous applications. We
focus on two understudied aspects of AD that are critical for integration into
real-world applications. First, most AD methods cannot incorporate labeled data
that are often available in practice in small quantities and can be crucial to
achieve high AD accuracy. Second, most AD methods are not interpretable, a
bottleneck that prevents stakeholders from understanding the reason behind the
anomalies. In this paper, we propose a novel AD framework that adapts a
white-box model class, Generalized Additive Models, to detect anomalies using a
partial identification objective which naturally handles noisy or heterogeneous
features. In addition, the proposed framework, DIAD, can incorporate a small
amount of labeled data to further boost anomaly detection performances in
semi-supervised settings. We demonstrate the superiority of our framework
compared to previous work in both unsupervised and semi-supervised settings
using diverse tabular datasets. For example, under 5 labeled anomalies DIAD
improves from 86.2% to 89.4% AUC by learning AD from unlabeled data. We also
present insightful interpretations that explain why DIAD deems certain samples
as anomalies.
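To make the two ideas in the abstract concrete (an additive, per-feature score that can be read off as an explanation, and AUC as the evaluation metric), here is a minimal sketch. It is not DIAD and does not use the paper's partial identification objective: it swaps in a simple histogram-based additive scorer, where each feature contributes the negative log of its estimated bin density. All function names, the bin count, and the synthetic data are assumptions made for illustration.

```python
import math
import random


def fit_histograms(rows, n_bins=10):
    """Fit one equal-width histogram per feature on unlabeled training rows."""
    models = []
    for j in range(len(rows[0])):
        values = [row[j] for row in rows]
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # guard against constant features
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        models.append((lo, hi, width, counts))
    return models


def feature_contributions(models, row):
    """One additive term per feature: -log of the smoothed bin probability."""
    terms = []
    for (lo, hi, width, counts), v in zip(models, row):
        if v < lo or v > hi:
            count = 0  # value outside the range seen in training
        else:
            count = counts[min(int((v - lo) / width), len(counts) - 1)]
        total = sum(counts)
        # Laplace smoothing over the bins plus one out-of-range bucket.
        prob = (count + 1) / (total + len(counts) + 1)
        terms.append(-math.log(prob))
    return terms


def anomaly_score(models, row):
    """Additive score: the sum of the per-feature terms (no interactions)."""
    return sum(feature_contributions(models, row))


def auc(scores, labels):
    """ROC AUC as the fraction of (anomaly, normal) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data (an assumption): three Gaussian features; anomalies are
# shifted far from the training range on feature index 1 only.
random.seed(0)
train = [[random.gauss(0, 1) for _ in range(3)] for _ in range(500)]
normals = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
anomalies = [
    [random.gauss(0, 1), random.gauss(8, 0.5), random.gauss(0, 1)]
    for _ in range(20)
]

models = fit_histograms(train)
test_rows = normals + anomalies
labels = [0] * len(normals) + [1] * len(anomalies)
scores = [anomaly_score(models, row) for row in test_rows]
auc_value = auc(scores, labels)

# Explanation for a single suspicious row: which feature contributes most?
suspicious = [0.1, 8.0, -0.2]
terms = feature_contributions(models, suspicious)
top_feature = max(range(len(terms)), key=lambda j: terms[j])
print("AUC:", round(auc_value, 3), "| most anomalous feature:", top_feature)
```

Because the score is a plain sum, the per-feature terms double as the explanation: the feature with the largest term is the one that makes the row look unusual. A GAM-based detector such as the one the abstract describes would learn richer per-feature shape functions, but the additive readout works the same way.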
Related papers
- Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech [60.08015780474457]
Alzheimer's Disease (AD) detection has emerged as a promising research area that employs machine learning classification models.
We identify within-class variation as a critical challenge in AD detection: individuals with AD exhibit a spectrum of cognitive impairments.
We propose two novel methods: Soft Target Distillation (SoTD) and Instance-level Re-balancing (InRe), targeting two problems respectively.
arXiv Detail & Related papers (2024-09-22T02:06:05Z)
- Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination.
This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field.
Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z)
- Weakly Supervised Anomaly Detection via Knowledge-Data Alignment [24.125871437370357]
Anomaly detection plays a pivotal role in numerous web-based applications, including malware detection, anti-money laundering, device failure detection, and network fault analysis.
Weakly Supervised Anomaly Detection (WSAD) has been introduced with a limited number of labeled anomaly samples to enhance model performance.
We introduce a novel framework Knowledge-Data Alignment (KDAlign) to integrate rule knowledge, typically summarized by human experts, to supplement the limited labeled data.
arXiv Detail & Related papers (2024-02-06T07:57:13Z)
- UADB: Unsupervised Anomaly Detection Booster [29.831918685340433]
Unsupervised Anomaly Detection (UAD) is a key data mining problem owing to its wide real-world applications.
No single assumption can describe such complexity and be valid in all scenarios.
We propose a general UAD Booster (UADB) that empowers any UAD models with adaptability to different data.
arXiv Detail & Related papers (2023-06-03T04:16:31Z)
- Zero-Shot Anomaly Detection via Batch Normalization [58.291409630995744]
Anomaly detection plays a crucial role in many safety-critical application domains.
The challenge of adapting an anomaly detector to drift in the normal data distribution has led to the development of zero-shot AD techniques.
We propose a simple yet effective method called Adaptive Centered Representations (ACR) for zero-shot batch-level AD.
arXiv Detail & Related papers (2023-02-15T18:34:15Z)
- Weakly Supervised Anomaly Detection: A Survey [75.26180038443462]
Anomaly detection (AD) is a crucial task in machine learning with various applications.
We present the first comprehensive survey of weakly supervised anomaly detection (WSAD) methods.
For each setting, we provide formal definitions, key algorithms, and potential future directions.
arXiv Detail & Related papers (2023-02-09T10:27:21Z)
- Deep Anomaly Detection and Search via Reinforcement Learning [22.005663849044772]
We propose Deep Anomaly Detection and Search (DADS) to balance exploitation and exploration.
During the training process, DADS searches for possible anomalies with hierarchically-structured datasets.
Results show that DADS can efficiently and precisely search anomalies from unlabeled data and learn from them.
arXiv Detail & Related papers (2022-08-31T13:03:33Z)
- Self-Trained One-class Classification for Unsupervised Anomaly Detection [56.35424872736276]
Anomaly detection (AD) has various applications across domains, from manufacturing to healthcare.
In this work, we focus on unsupervised AD problems whose entire training data are unlabeled and may contain both normal and anomalous samples.
To tackle this problem, we build a robust one-class classification framework via data refinement.
We show that our method outperforms the state-of-the-art one-class classification method by 6.3 AUC and 12.5 average precision.
arXiv Detail & Related papers (2021-06-11T01:36:08Z)