Related papers: Structure-based Anomaly Detection and Clustering

Structure-based Anomaly Detection and Clustering

URL: http://arxiv.org/abs/2505.12751v1
Date: Mon, 19 May 2025 06:20:00 GMT
Title: Structure-based Anomaly Detection and Clustering
Authors: Filippo Leveni,
Abstract summary: Anomaly detection is a fundamental problem in domains such as healthcare, manufacturing, and cybersecurity.<n>This thesis proposes new unsupervised methods for anomaly detection in both structured and streaming data settings.
Score: 1.450405446885067
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Anomaly detection is a fundamental problem in domains such as healthcare, manufacturing, and cybersecurity. This thesis proposes new unsupervised methods for anomaly detection in both structured and streaming data settings. In the first part, we focus on structure-based anomaly detection, where normal data follows low-dimensional manifolds while anomalies deviate from them. We introduce Preference Isolation Forest (PIF), which embeds data into a high-dimensional preference space via manifold fitting, and isolates outliers using two variants: Voronoi-iForest, based on geometric distances, and RuzHash-iForest, leveraging Locality Sensitive Hashing for scalability. We also propose Sliding-PIF, which captures local manifold information for streaming scenarios. Our methods outperform existing techniques on synthetic and real datasets. We extend this to structure-based clustering with MultiLink, a novel method for recovering multiple geometric model families in noisy data. MultiLink merges clusters via a model-aware linkage strategy, enabling robust multi-class structure recovery. It offers key advantages over existing approaches, such as speed, reduced sensitivity to thresholds, and improved robustness to poor initial sampling. The second part of the thesis addresses online anomaly detection in evolving data streams. We propose Online Isolation Forest (Online-iForest), which uses adaptive, multi-resolution histograms and dynamically updates tree structures to track changes over time. It avoids retraining while achieving accuracy comparable to offline models, with superior efficiency for real-time applications. Finally, we tackle anomaly detection in cybersecurity via open-set recognition for malware classification. We enhance a Gradient Boosting classifier with MaxLogit to detect unseen malware families, a method now integrated into Cleafy's production system.

Related papers

Refining Decision Boundaries In Anomaly Detection Using Similarity Search Within the Feature Space [3.3202103799131795]
We introduce SDA2E, a Sparse Dual Adversarial Attention-based AutoEncoder designed to learn compact and discriminative latent representations from imbalanced, high-dimensional data.<n>We propose a similarity-guided active learning framework that integrates three novel strategies to refine decision boundaries efficiently.<n>We evaluate SDA2E extensively across 52 imbalanced datasets, including multiple DARPA Transparent Computing scenarios, and benchmark it against 15 state-of-the-art anomaly detection methods.
arXiv Detail & Related papers (2026-02-02T23:55:08Z)
Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection [0.0]
Two-Stage LKPLO is a novel multi-stage outlier detection framework.<n>It overcomes the coexisting limitations of conventional projection-based methods.<n>It achieves state-of-the-art performance on challenging datasets.
arXiv Detail & Related papers (2025-10-28T03:53:46Z)
CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data.<n>We propose CLIPfusion, a method that leverages both discriminative and generative foundation models.<n>We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z)
A Dataset for Semantic Segmentation in the Presence of Unknowns [49.795683850385956]
Existing datasets allow evaluation of only knowns or unknowns - but not both.<n>We propose a novel anomaly segmentation dataset, ISSU, that features a diverse set of anomaly inputs from cluttered real-world environments.<n>The dataset is twice larger than existing anomaly segmentation datasets.
arXiv Detail & Related papers (2025-03-28T10:31:01Z)
Enhancing Web Service Anomaly Detection via Fine-grained Multi-modal Association and Frequency Domain Analysis [8.860339665670255]
Anomaly detection is crucial for ensuring the stability and reliability of web service systems.<n>Existing anomaly detection methods use logs and metrics to detect anomalies.<n>We propose a novel anomaly detection method named FFAD to address these two issues.
arXiv Detail & Related papers (2025-01-28T12:00:45Z)
PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a. Federated Anomaly Detection framework named PeFAD with the increasing privacy concerns. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations. In the WASPSYN challenge at I SBI 2023, our method ranks the 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
Anomaly Detection with Ensemble of Encoder and Decoder [2.8199078343161266]
Anomaly detection in power grids aims to detect and discriminate anomalies caused by cyber attacks against the power system. We propose a novel anomaly detection method by modeling the data distribution of normal samples via multiple encoders and decoders. Experiment results on network intrusion and power system datasets demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-03-11T15:49:29Z)
BSSAD: Towards A Novel Bayesian State-Space Approach for Anomaly Detection in Multivariate Time Series [0.0]
We propose a novel and innovative approach to anomaly detection called Bayesian State-Space Anomaly Detection(BSSAD) The design of our approach combines the strength of Bayesian state-space algorithms in predicting the next state and the effectiveness of recurrent neural networks and autoencoders. In particular, we focus on using Bayesian state-space models of particle filters and ensemble Kalman filters.
arXiv Detail & Related papers (2023-01-30T16:21:18Z)
Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
Hybridization of Capsule and LSTM Networks for unsupervised anomaly detection on multivariate data [0.0]
This paper introduces a novel NN architecture which hybridises the Long-Short-Term-Memory (LSTM) and Capsule Networks into a single network. The proposed method uses an unsupervised learning technique to overcome the issues with finding large volumes of labelled training data.
arXiv Detail & Related papers (2022-02-11T10:33:53Z)
TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs) To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics. To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural networks (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delay. We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems. We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.