Retrieval Augmented Deep Anomaly Detection for Tabular Data
- URL: http://arxiv.org/abs/2401.17052v2
- Date: Mon, 22 Jul 2024 06:23:02 GMT
- Title: Retrieval Augmented Deep Anomaly Detection for Tabular Data
- Authors: Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
- Abstract summary: Recent research has introduced retrieval-augmented models to narrow the gap between deep models' efficacy on unstructured and structured data.
We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of normal samples.
Experiments on a benchmark of 31 datasets reveal that augmenting this reconstruction-based anomaly detection method with sample-sample dependencies via retrieval modules significantly boosts performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning for tabular data has garnered increasing attention in recent years, yet employing deep models for structured data remains challenging. While these models excel with unstructured data, their efficacy with structured data has been limited. Recent research has introduced retrieval-augmented models to address this gap, demonstrating promising results in supervised tasks such as classification and regression. In this work, we investigate using retrieval-augmented models for anomaly detection on tabular data. We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of normal samples. We test the effectiveness of KNN-based and attention-based modules to select relevant samples to help in the reconstruction process of the target sample. Our experiments on a benchmark of 31 tabular datasets reveal that augmenting this reconstruction-based anomaly detection (AD) method with sample-sample dependencies via retrieval modules significantly boosts performance. The present work supports the idea that retrieval modules are useful to augment any deep AD method to enhance anomaly detection on tabular data.
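The idea in the abstract (mask a feature, retrieve similar normal samples, reconstruct the masked value, and use the reconstruction error as an anomaly score) can be illustrated with a small sketch. This is not the authors' model: the transformer is replaced here by a plain average over the k nearest normal rows, and the function name `knn_reconstruction_score`, the value of `k`, and the synthetic two-feature data are all assumptions made for illustration only.

```python
import random


def knn_reconstruction_score(sample, normal_rows, k=5):
    """Mean squared error when each feature of `sample` is masked in turn
    and rebuilt from the k nearest normal rows (hypothetical helper)."""
    n_features = len(sample)
    total = 0.0
    for j in range(n_features):
        # Feature j is masked; retrieval may only use the other features.
        visible = [c for c in range(n_features) if c != j]

        def dist(row):
            return sum((row[c] - sample[c]) ** 2 for c in visible)

        # Retrieval step: the k normal rows closest on the visible features.
        neighbours = sorted(normal_rows, key=dist)[:k]
        # Stand-in for the learned reconstruction: the neighbours' mean.
        recon = sum(row[j] for row in neighbours) / len(neighbours)
        total += (sample[j] - recon) ** 2
    return total / n_features


# Synthetic "normal" data (assumption): two strongly correlated features.
random.seed(0)
normal_rows = []
for _ in range(200):
    x = random.gauss(0.0, 1.0)
    normal_rows.append([x, x + random.gauss(0.0, 0.05)])

score_normal = knn_reconstruction_score([0.5, 0.5], normal_rows)
score_anomaly = knn_reconstruction_score([0.5, -0.5], normal_rows)
print(score_normal, score_anomaly)
```

In this toy setting the second point breaks the correlation seen in the normal rows, so its masked features are reconstructed poorly and it receives the higher score. The paper's KNN-based and attention-based retrieval modules feed a trained transformer rather than a simple mean.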
Related papers
- Distributionally robust self-supervised learning for tabular data [2.942619386779508]
Learning robust representations in the presence of error slices is challenging, due to high-cardinality features and the complexity of constructing error sets.
Traditional robust representation learning methods largely focus on improving worst-group performance in supervised settings in computer vision.
Our approach utilizes an encoder-decoder model trained with Masked Language Modeling (MLM) loss to learn robust latent representations.
arXiv Detail & Related papers (2024-10-11T04:23:56Z) - TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z) - Active anomaly detection based on deep one-class classification [9.904380236739398]
We tackle two essential problems of active learning for Deep SVDD: query strategy and semi-supervised learning method.
First, rather than solely identifying anomalies, our query strategy selects uncertain samples according to an adaptive boundary.
Second, we apply noise contrastive estimation in training a one-class classification model to incorporate both labeled normal and abnormal data effectively.
arXiv Detail & Related papers (2023-09-18T03:56:45Z) - Fascinating Supervisory Signals and Where to Find Them: Deep Anomaly Detection with Scale Learning [11.245813423781415]
We devise novel data-driven supervision for data by introducing a characteristic -- scale -- as data labels.
Scales serve as labels attached to transformed representations, thus offering ample labeled data for neural network training.
This paper further proposes a scale learning-based anomaly detection method.
arXiv Detail & Related papers (2023-05-25T14:48:00Z) - Beyond Individual Input for Deep Anomaly Detection on Tabular Data [0.0]
Anomaly detection is vital in many domains, such as finance, healthcare, and cybersecurity.
To the best of our knowledge, this is the first work to successfully combine feature-feature and sample-sample dependencies.
Our method achieves state-of-the-art performance, outperforming existing methods by 2.4% and 1.2% in terms of F1-score and AUROC, respectively.
arXiv Detail & Related papers (2023-05-24T13:13:26Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z) - Watermarking for Out-of-distribution Detection [76.20630986010114]
Out-of-distribution (OOD) detection aims to identify OOD data based on representations extracted from well-trained deep models.
We propose a general methodology named watermarking in this paper.
We learn a unified pattern that is superimposed onto features of original data, and the model's detection capability is largely boosted after watermarking.
arXiv Detail & Related papers (2022-10-27T06:12:32Z) - Discovery of Governing Equations with Recursive Deep Neural Networks [5.031093893882574]
This paper focuses on the model discovery problem when the data is not efficiently sampled in time.
We introduce a recursion deep neural network (RDNN) for data-driven model discovery.
Our proposed approach shows superior power when the existing data are sampled with a large time lag.
arXiv Detail & Related papers (2020-09-24T05:59:03Z) - Data from Model: Extracting Data from Non-robust and Robust Models [83.60161052867534]
This work explores the reverse process of generating data from a model, attempting to reveal the relationship between the data and the model.
We repeat the process of Data to Model (DtM) and Data from Model (DfM) in sequence and explore the loss of feature mapping information.
Our results show that the accuracy drop is limited even after multiple sequences of DtM and DfM, especially for robust models.
arXiv Detail & Related papers (2020-07-13T05:27:48Z) - Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.