A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text
- URL: http://arxiv.org/abs/2405.13031v1
- Date: Thu, 16 May 2024 10:45:43 GMT
- Title: A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text
- Authors: Jeremie Pantin, Christophe Marsala,
- Abstract summary: This work introduces a robust autoencoder ensemble-based approach designed to address anomaly detection in text corpora.
A real-world taxonomy is introduced to distinguish between independent anomalies and contextual anomalies.
Extensive experiments on classical text corpora have been conducted and their results are presented.
- Score: 0.3314882635954752
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, a robust autoencoder ensemble-based approach designed to address anomaly detection in text corpora is introduced. Each autoencoder within the ensemble incorporates a local robust subspace recovery projection of the original data in its encoding embedding, leveraging the geometric properties of the k-nearest neighbors to optimize subspace recovery and identify anomalous patterns in textual data. The evaluation of such an approach needs an experimental setting dedicated to the context of textual anomaly detection. Thus, beforehand, a comprehensive real-world taxonomy is introduced to distinguish between independent anomalies and contextual anomalies. Such a study to identify clearly the kinds of anomalies appearing in a textual context aims at addressing a critical gap in the existing literature. Then, extensive experiments on classical text corpora have been conducted and their results are presented that highlights the efficiency, both in robustness and in performance, of the robust autoencoder ensemble-based approach when detecting both independent and contextual anomalies. Diverse range of tasks, including classification, sentiment analysis, and spam detection, across eight different corpora, have been studied in these experiments.
Related papers
- GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features [68.14842693208465]
GeneralAD is an anomaly detection framework designed to operate in semantic, near-distribution, and industrial settings.
We propose a novel self-supervised anomaly generation module that employs straightforward operations like noise addition and shuffling to patch features.
We extensively evaluated our approach on ten datasets, achieving state-of-the-art results in six and on-par performance in the remaining.
arXiv Detail & Related papers (2024-07-17T09:27:41Z) - Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors [24.954755569786396]
AI-text detection has emerged to distinguish between human and machine-generated content.
Recent research indicates that these detection systems often lack robustness and struggle to effectively differentiate perturbed texts.
Our work simulates real-world scenarios in both informal and professional writing, exploring the out-of-the-box performance of current detectors.
arXiv Detail & Related papers (2024-06-13T08:37:01Z) - ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection [52.228708947607636]
This paper proposes a comprehensive visual anomaly detection benchmark, textbftextitADer, which is a modular framework for new anomaly detection methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z) - Time-series Anomaly Detection via Contextual Discriminative Contrastive
Learning [0.0]
One-class classification methods are commonly used for anomaly detection tasks.
We propose a novel approach inspired by the loss function of DeepSVDD.
We combine our approach with a deterministic contrastive loss from Neutral AD, a promising self-supervised learning anomaly detection approach.
arXiv Detail & Related papers (2023-04-16T21:36:19Z) - Explainable Contextual Anomaly Detection using Quantile Regression
Forests [14.80211278818555]
We develop connections between dependency-based traditional anomaly detection methods and contextual anomaly detection methods.
Based on resulting insights, we propose a novel approach to inherently interpretable contextual anomaly detection.
Our method outperforms state-of-the-art anomaly detection methods in terms of accuracy and interpretability.
arXiv Detail & Related papers (2023-02-22T09:39:59Z) - Explanation Method for Anomaly Detection on Mixed Numerical and
Categorical Spaces [0.9543943371833464]
We present EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces)
It adds explainability to the predictions obtained with the original model.
We report experimental results on extensive real-world data, particularly in the domain of network intrusion detection.
arXiv Detail & Related papers (2022-09-09T08:20:13Z) - Contextual Model Aggregation for Fast and Robust Federated Learning in
Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z) - Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z) - The Sensitivity of Word Embeddings-based Author Detection Models to
Semantic-preserving Adversarial Perturbations [3.7552532139404797]
Authorship analysis is an important subject in the field of natural language processing.
This paper explores the limitations and sensitiveness of established approaches to adversarial manipulations of inputs.
arXiv Detail & Related papers (2021-02-23T19:55:45Z) - ESAD: End-to-end Deep Semi-supervised Anomaly Detection [85.81138474858197]
We propose a new objective function that measures the KL-divergence between normal and anomalous data.
The proposed method significantly outperforms several state-of-the-arts on multiple benchmark datasets.
arXiv Detail & Related papers (2020-12-09T08:16:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.