A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text
- URL: http://arxiv.org/abs/2405.13031v1
- Date: Thu, 16 May 2024 10:45:43 GMT
- Title: A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text
- Authors: Jeremie Pantin, Christophe Marsala,
- Abstract summary: This work introduces a robust autoencoder ensemble-based approach designed to address anomaly detection in text corpora.
A real-world taxonomy is introduced to distinguish between independent anomalies and contextual anomalies.
Extensive experiments on classical text corpora have been conducted and their results are presented.
- Score: 0.3314882635954752
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, a robust autoencoder ensemble-based approach designed to address anomaly detection in text corpora is introduced. Each autoencoder within the ensemble incorporates a local robust subspace recovery projection of the original data in its encoding embedding, leveraging the geometric properties of the k-nearest neighbors to optimize subspace recovery and identify anomalous patterns in textual data. The evaluation of such an approach needs an experimental setting dedicated to the context of textual anomaly detection. Thus, beforehand, a comprehensive real-world taxonomy is introduced to distinguish between independent anomalies and contextual anomalies. Such a study to identify clearly the kinds of anomalies appearing in a textual context aims at addressing a critical gap in the existing literature. Then, extensive experiments on classical text corpora have been conducted and their results are presented that highlights the efficiency, both in robustness and in performance, of the robust autoencoder ensemble-based approach when detecting both independent and contextual anomalies. Diverse range of tasks, including classification, sentiment analysis, and spam detection, across eight different corpora, have been studied in these experiments.
Related papers
- Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding [27.02879006439693]
This work performs a comprehensive empirical study and introduces a benchmark for text anomaly detection.<n>Our work systematically evaluates the effectiveness of embedding-based text anomaly detection.<n>By open-sourcing our benchmark toolkit, this work provides a foundation for future research in robust and scalable text anomaly detection systems.
arXiv Detail & Related papers (2025-07-16T14:47:41Z) - Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions [0.017476232824732776]
Time-series anomaly detection plays an important role in engineering processes.
This survey introduces a novel taxonomy where a distinction between online and offline, and training and inference is made.
It presents the most popular data sets and evaluation metrics used in the literature, as well as a detailed analysis.
arXiv Detail & Related papers (2024-08-07T13:01:10Z) - GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features [68.14842693208465]
GeneralAD is an anomaly detection framework designed to operate in semantic, near-distribution, and industrial settings.
We propose a novel self-supervised anomaly generation module that employs straightforward operations like noise addition and shuffling to patch features.
We extensively evaluated our approach on ten datasets, achieving state-of-the-art results in six and on-par performance in the remaining.
arXiv Detail & Related papers (2024-07-17T09:27:41Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - Oracle Character Recognition using Unsupervised Discriminative
Consistency Network [65.64172835624206]
We propose a novel unsupervised domain adaptation method for oracle character recognition (OrCR)
We leverage pseudo-labeling to incorporate the semantic information into adaptation and constrain augmentation consistency.
Our approach achieves state-of-the-art result on Oracle-241 dataset and substantially outperforms the recently proposed structure-texture separation network by 15.1%.
arXiv Detail & Related papers (2023-12-11T02:52:27Z) - Don't Miss Out on Novelty: Importance of Novel Features for Deep Anomaly
Detection [64.21963650519312]
Anomaly Detection (AD) is a critical task that involves identifying observations that do not conform to a learned model of normality.
We propose a novel approach to AD using explainability to capture such novel features as unexplained observations in the input space.
Our approach establishes a new state-of-the-art across multiple benchmarks, handling diverse anomaly types.
arXiv Detail & Related papers (2023-10-01T21:24:05Z) - AnoRand: A Semi Supervised Deep Learning Anomaly Detection Method by
Random Labeling [0.0]
Anomaly detection or more generally outliers detection is one of the most popular and challenging subject in theoretical and applied machine learning.
We present a new semi-supervised anomaly detection method called textbfAnoRand by combining a deep learning architecture with random synthetic label generation.
arXiv Detail & Related papers (2023-05-28T10:53:34Z) - TadGAN: Time Series Anomaly Detection Using Generative Adversarial
Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs)
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z) - Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.