Related papers: User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection

User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection

URL: http://arxiv.org/abs/2506.23446v2
Date: Thu, 10 Jul 2025 02:22:22 GMT
Title: User-Based Sequential Modeling with Transformer Encoders for Insider Threat Detection
Authors: Mohamed Elbasheer, Adewale Akinfaderin,
Abstract summary: Insider threat detection presents unique challenges due to the authorized status of malicious actors.<n>Existing machine learning methods treat user activity as isolated events, thereby failing to leverage sequential dependencies in user behavior.<n>We propose a User-Based Sequencing (UBS) methodology, transforming the CERT insider threat dataset into structured temporal sequences suitable for deep sequential modeling.
Score: 0.005755004576310333
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Insider threat detection presents unique challenges due to the authorized status of malicious actors and the subtlety of anomalous behaviors. Existing machine learning methods often treat user activity as isolated events, thereby failing to leverage sequential dependencies in user behavior. In this study, we propose a User-Based Sequencing (UBS) methodology, transforming the CERT insider threat dataset into structured temporal sequences suitable for deep sequential modeling. We deploy a Transformer Encoder architecture to model benign user activity and employ its reconstruction errors as anomaly scores. These scores are subsequently evaluated using three unsupervised outlier detection algorithms: One-Class SVM (OCSVM), Local Outlier Factor (LOF), and Isolation Forest (iForest). Across four rigorously designed test sets, including combinations of multiple CERT dataset releases, our UBS-Transformer pipeline consistently achieves state-of-the-art performance - notably 96.61% accuracy, 99.43% recall, 96.38% F1-score, 95.00% AUROC, and exceptionally low false negative (0.0057) and false positive (0.0571) rates. Comparative analyses demonstrate that our approach substantially outperforms tabular and conventional autoencoder baselines, underscoring the efficacy of sequential user modeling and advanced anomaly detection in the insider threat domain.

Related papers

Component-aware Unsupervised Logical Anomaly Generation for Industrial Anomaly Detection [31.27483219228598]
Anomaly detection is critical in industrial manufacturing for ensuring product quality and improving efficiency in automated processes.<n>Recent generative models often produce unrealistic anomalies increasing false positives, or require real-world anomaly samples for training.<n>We propose ComGEN, a component-aware and unsupervised framework that addresses the gap in logical anomaly generation.
arXiv Detail & Related papers (2025-02-17T11:54:43Z)
APT-LLM: Embedding-Based Anomaly Detection of Cyber Advanced Persistent Threats Using Large Language Models [4.956245032674048]
APTs pose a major cybersecurity challenge due to their stealth and ability to mimic normal system behavior.<n>This paper introduces APT-LLM, a novel embedding-based anomaly detection framework.<n>It integrates large language models (LLMs) with autoencoder architectures to detect APTs.
arXiv Detail & Related papers (2025-02-13T15:01:18Z)
Secure Hierarchical Federated Learning in Vehicular Networks Using Dynamic Client Selection and Anomaly Detection [10.177917426690701]
Hierarchical Federated Learning (HFL) faces the challenge of adversarial or unreliable vehicles in vehicular networks. Our study introduces a novel framework that integrates dynamic vehicle selection and robust anomaly detection mechanisms. Our proposed algorithm demonstrates remarkable resilience even under intense attack conditions.
arXiv Detail & Related papers (2024-05-25T18:31:20Z)
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice. We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
ADT: Agent-based Dynamic Thresholding for Anomaly Detection [4.356615197661274]
We propose an agent-based dynamic thresholding (ADT) framework based on a deep Q-network. An auto-encoder is utilized in this study to obtain feature representations and produce anomaly scores for complex input data. ADT can adjust thresholds adaptively by utilizing the anomaly scores from the auto-encoder.
arXiv Detail & Related papers (2023-12-03T19:07:30Z)
Unraveling the "Anomaly" in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution [89.16750999704969]
Anomaly labels hinder traditional supervised models in time series anomaly detection. Various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. We propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD)
arXiv Detail & Related papers (2023-11-19T05:37:18Z)
AnomalyBERT: Self-Supervised Transformer for Time Series Anomaly Detection using Data Degradation Scheme [0.7216399430290167]
Anomaly detection task for time series, especially for unlabeled data, has been a challenging problem. We address it by applying a suitable data degradation scheme to self-supervised model training. Inspired by the self-attention mechanism, we design a Transformer-based architecture to recognize the temporal context.
arXiv Detail & Related papers (2023-05-08T05:42:24Z)
PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
Unsupervised User-Based Insider Threat Detection Using Bayesian Gaussian Mixture Models [0.0]
In this paper, we propose an unsupervised insider threat detection system based on audit data. The proposed approach leverages a user-based model to optimize specific behaviors modelization and an automatic feature extraction system based on Word2Vec. Results indicate that the proposed method competes with state-of-the-art approaches, presenting a good recall of 88%, accuracy and true negative rate of 93%, and a false positive rate of 6.9%.
arXiv Detail & Related papers (2022-11-23T13:45:19Z)
Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on. We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs) To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics. To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction. We put forward an alternative measure of anomaly score to replace the reconstruction-based metric. Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.