Metric Matters: A Formal Evaluation of Similarity Measures in Active Learning for Cyber Threat Intelligence
- URL: http://arxiv.org/abs/2508.19019v1
- Date: Tue, 26 Aug 2025 13:34:30 GMT
- Title: Metric Matters: A Formal Evaluation of Similarity Measures in Active Learning for Cyber Threat Intelligence
- Authors: Sidahmed Benabderrahmane, Talal Rahwan,
- Abstract summary: Advanced Persistent Threats (APTs) pose a severe challenge to cyber defense.<n>We propose a novel active learning-based anomaly detection framework.<n>Our approach uses feature-space similarity to identify normal-like and anomaly-like instances.
- Score: 1.102914654802229
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Advanced Persistent Threats (APTs) pose a severe challenge to cyber defense due to their stealthy behavior and the extreme class imbalance inherent in detection datasets. To address these issues, we propose a novel active learning-based anomaly detection framework that leverages similarity search to iteratively refine the decision space. Built upon an Attention-Based Autoencoder, our approach uses feature-space similarity to identify normal-like and anomaly-like instances, thereby enhancing model robustness with minimal oracle supervision. Crucially, we perform a formal evaluation of various similarity measures to understand their influence on sample selection and anomaly ranking effectiveness. Through experiments on diverse datasets, including DARPA Transparent Computing APT traces, we demonstrate that the choice of similarity metric significantly impacts model convergence, anomaly detection accuracy, and label efficiency. Our results offer actionable insights for selecting similarity functions in active learning pipelines tailored for threat intelligence and cyber defense.
Related papers
- Refining Decision Boundaries In Anomaly Detection Using Similarity Search Within the Feature Space [3.3202103799131795]
We introduce SDA2E, a Sparse Dual Adversarial Attention-based AutoEncoder designed to learn compact and discriminative latent representations from imbalanced, high-dimensional data.<n>We propose a similarity-guided active learning framework that integrates three novel strategies to refine decision boundaries efficiently.<n>We evaluate SDA2E extensively across 52 imbalanced datasets, including multiple DARPA Transparent Computing scenarios, and benchmark it against 15 state-of-the-art anomaly detection methods.
arXiv Detail & Related papers (2026-02-02T23:55:08Z) - GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients [0.1019561860229868]
In this paper, we investigate the geometric properties of a model's input loss landscape.<n>We reveal a distinct and consistent difference in the ID for natural and adversarial data, which forms the basis of our proposed detection method.<n>Our detector significantly surpasses existing methods against a wide array of attacks, including CW and AutoAttack, achieving detection rates consistently above 92% on CIFAR-10.
arXiv Detail & Related papers (2025-12-14T20:16:03Z) - Registration is a Powerful Rotation-Invariance Learner for 3D Anomaly Detection [64.0168648353038]
3D anomaly detection in point-cloud data is critical for industrial quality control, aiming to identify structural defects with high reliability.<n>Current memory bank-based methods often suffer from inconsistent feature transformations and limited discriminative capacity.<n>We propose a registration-induced, rotation-invariant feature extraction framework that integrates the objectives of point-cloud registration and memory-based anomaly detection.
arXiv Detail & Related papers (2025-10-19T14:56:38Z) - Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision making problem.<n>The aim is to dynamically select which feature to measure based on current observations, independently for each test instance.<n>Common approaches either use Reinforcement Learning, which experiences training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which makes myopic.<n>We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z) - Addressing Key Challenges of Adversarial Attacks and Defenses in the Tabular Domain: A Methodological Framework for Coherence and Consistency [26.645723217188323]
Class-Specific Anomaly Detection (CSAD) is an effective novel anomaly detection approach.<n> CSAD evaluates adversarial samples relative to their predicted class distribution, rather than a broad benign distribution.<n>Our evaluation incorporates both anomaly detection rates with SHAP-based assessments to provide a more comprehensive measure of adversarial sample quality.
arXiv Detail & Related papers (2024-12-10T09:17:09Z) - Secure Hierarchical Federated Learning in Vehicular Networks Using Dynamic Client Selection and Anomaly Detection [10.177917426690701]
Hierarchical Federated Learning (HFL) faces the challenge of adversarial or unreliable vehicles in vehicular networks.
Our study introduces a novel framework that integrates dynamic vehicle selection and robust anomaly detection mechanisms.
Our proposed algorithm demonstrates remarkable resilience even under intense attack conditions.
arXiv Detail & Related papers (2024-05-25T18:31:20Z) - ADT: Agent-based Dynamic Thresholding for Anomaly Detection [4.356615197661274]
We propose an agent-based dynamic thresholding (ADT) framework based on a deep Q-network.
An auto-encoder is utilized in this study to obtain feature representations and produce anomaly scores for complex input data.
ADT can adjust thresholds adaptively by utilizing the anomaly scores from the auto-encoder.
arXiv Detail & Related papers (2023-12-03T19:07:30Z) - A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z) - Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z) - TracInAD: Measuring Influence for Anomaly Detection [0.0]
This paper proposes a novel methodology to flag anomalies based on TracIn.
We test our approach using Variational Autoencoders and show that the average influence of a subsample of training points on a test point can serve as a proxy for abnormality.
arXiv Detail & Related papers (2022-05-03T08:20:15Z) - Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z) - Detecting Anomalies Through Contrast in Heterogeneous Data [21.56932906044264]
We propose Contrastive Learning based Heterogeneous Anomaly Detector to address shortcomings of prior models.
Our model uses an asymmetric autoencoder that can effectively handle large arity categorical variables.
We provide a qualitative study to showcase the effectiveness of our model in detecting anomalies in timber trade.
arXiv Detail & Related papers (2021-04-02T17:21:12Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.