Reproducibility in Event-Log Research: A Parametrised Generator and Benchmark for Event-based Signatures
- URL: http://arxiv.org/abs/2601.12978v1
- Date: Mon, 19 Jan 2026 11:39:39 GMT
- Title: Reproducibility in Event-Log Research: A Parametrised Generator and Benchmark for Event-based Signatures
- Authors: Saad Khan, Simon Parkinson, Monika Roopak
- Abstract summary: Event-based datasets are crucial for cybersecurity analysis. A key use case is detecting event-based signatures, which represent attacks spanning multiple events. We present a novel parametrised generation technique capable of producing synthetic event datasets.
- Score: 2.024255109998051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event-based datasets are crucial for cybersecurity analysis. A key use case is detecting event-based signatures, which represent attacks spanning multiple events and can only be understood once the relevant events are identified and linked. Analysing event datasets is essential for monitoring system security, but their growing volume and frequency create significant scalability and processing difficulties. Researchers rely on these datasets to develop and test techniques for automatically identifying signatures. However, because real datasets are security-sensitive and rarely shared, it becomes difficult to perform meaningful comparative evaluation between different approaches. This work addresses this evaluation limitation by offering a systematic method for generating event logs with known ground truth, enabling reproducible and comparable research. We present a novel parametrised generation technique capable of producing synthetic event datasets that contain event-based signatures for discovery. To demonstrate the capabilities of the technique, we provide a benchmark in signature detection. Our benchmarking demonstrated the suitability of DBSCAN, achieving a score greater than 0.95 Adjusted Rand Index on most generated datasets. This work enhances the ability of researchers to develop and benchmark new cybersecurity techniques, ultimately contributing to more robust and effective cybersecurity measures.
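The evaluation idea in the abstract (generate event logs with known ground truth, then score a detector's grouping with the Adjusted Rand Index) can be illustrated with a minimal sketch. It is not the authors' generator or benchmark: `generate_events`, its parameters, and the event fields (`event_id`, `host`, `step`) are hypothetical names chosen for illustration, and a trivial group-by-host baseline stands in for DBSCAN. The Adjusted Rand Index is computed from its standard pair-counting definition using only the Python standard library.

```python
import random
from collections import Counter
from math import comb


def generate_events(n_signatures, events_per_signature, n_noise, seed=0):
    """Toy parametrised generator (hypothetical, not the paper's method).

    Returns (events, labels): labels[i] is the signature index of
    events[i], or -1 when the event is background noise.
    """
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    events, labels = [], []
    for sig in range(n_signatures):
        for step in range(events_per_signature):
            events.append({"event_id": 4000 + rng.randrange(100),
                           "host": f"host-{sig}", "step": step})
            labels.append(sig)
    for _ in range(n_noise):
        events.append({"event_id": 4000 + rng.randrange(100),
                       "host": f"noise-{rng.randrange(1000)}", "step": 0})
        labels.append(-1)
    # Shuffle events and labels together so signatures are interleaved.
    order = list(range(len(events)))
    rng.shuffle(order)
    return [events[i] for i in order], [labels[i] for i in order]


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings agree trivially
    # Count item pairs that share a cell, a truth group, a predicted group.
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = same_truth * same_pred / total_pairs
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:
        return 1.0  # both labelings are one group, or both all singletons
    return (same_both - expected) / (maximum - expected)


if __name__ == "__main__":
    events, truth = generate_events(n_signatures=5, events_per_signature=8,
                                    n_noise=10, seed=42)
    # Stand-in detector: group events by host (an assumption, not DBSCAN).
    predicted = [event["host"] for event in events]
    print(f"ARI of group-by-host baseline: {adjusted_rand_index(truth, predicted):.3f}")
```

Because the generator takes an explicit seed, two researchers running it with the same parameters get the same events and labels, which is the property that makes scores comparable across approaches.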
Related papers
- Comparative Evaluation of VAE, GAN, and SMOTE for Tor Detection in Encrypted Network Traffic [0.0]
Encrypted network traffic poses significant challenges for intrusion detection. Traditional data augmentation methods struggle to preserve the complex temporal and statistical characteristics of real network traffic. This work explores the use of Generative AI (GAI) models to synthesize realistic and diverse encrypted traffic traces.
arXiv Detail & Related papers (2026-01-03T13:31:53Z) - Cross-Dataset Semantic Segmentation Performance Analysis: Unifying NIST Point Cloud City Datasets for 3D Deep Learning [49.1574468325115]
This study analyzes semantic segmentation performance across heterogeneously labeled point-cloud datasets relevant to public safety applications. Key identified challenges include insufficient labeled data, difficulties in unifying class labels across datasets, and the need for standardization.
arXiv Detail & Related papers (2025-08-01T17:59:02Z) - ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction [57.930531826380836]
This work explores whether a foundational segmentation model can address label scarcity in the pixel-level vision task as an annotator for unlabeled images. We propose ConformalSAM, a novel SSSS framework which first calibrates the foundation model using the target domain's labeled data and then filters out unreliable pixel labels of unlabeled data.
arXiv Detail & Related papers (2025-07-21T17:02:57Z) - EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event Sequences (EvS) refer to sequential data characterized by irregular sampling intervals and a mix of categorical and numerical features. EBES is a comprehensive benchmark for EvS classification with sequence-level targets. It features standardized evaluation scenarios and protocols, along with an open-source PyTorch library that implements 9 modern models.
arXiv Detail & Related papers (2024-10-04T13:03:43Z) - Synthetic-To-Real Video Person Re-ID [57.937189569211505]
Person re-identification (Re-ID) is an important task and has significant applications for public security and information forensics. We investigate a novel and challenging setting of Re-ID, i.e., cross-domain video-based person Re-ID. We utilize synthetic video datasets as the source domain for training and real-world videos for testing.
arXiv Detail & Related papers (2024-02-03T10:19:21Z) - Empowering HWNs with Efficient Data Labeling: A Clustered Federated
Semi-Supervised Learning Approach [2.046985601687158]
Clustered Federated Multitask Learning (CFL) has gained considerable attention as an effective strategy for overcoming statistical challenges.
We introduce a novel framework, Clustered Federated Semi-Supervised Learning (CFSL), designed for more realistic HWN scenarios.
Our results demonstrate that CFSL significantly improves upon key metrics such as testing accuracy, labeling accuracy, and labeling latency under varying proportions of labeled and unlabeled data.
arXiv Detail & Related papers (2024-01-19T11:47:49Z) - A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z) - A Semi-Supervised Approach for Power System Event Identification [12.862865254507179]
We propose a novel semi-supervised framework to assess the effectiveness of incorporating unlabeled eventful samples to enhance existing event identification methodologies.
Our approach characterizes events using physically interpretable features extracted from modal analysis of synthetic eventful PMU data.
We have developed and publicly shared a comprehensive Event Identification package which consists of three aspects: data generation, feature extraction, and event identification with limited labels.
arXiv Detail & Related papers (2023-09-18T19:07:41Z) - Benchmarking and Analyzing Generative Data for Visual Recognition [95.69499648941196]
This work delves into the impact of generative images, primarily comparing paradigms that harness external data. We devise GenBench, a benchmark comprising 22 datasets with 2548 categories, to appraise generative data across various visual recognition tasks. Our exhaustive benchmark and analysis spotlight generative data's promise in visual recognition, while identifying key challenges for future investigation.
arXiv Detail & Related papers (2023-07-25T17:59:59Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - Alignment-based conformance checking over probabilistic events [4.060731229044571]
We introduce a weighted trace model and weighted alignment cost function, and a custom threshold parameter that controls the level of confidence on the event data.
The resulting algorithm considers activities of lower but sufficiently high probability that better align with the process model.
arXiv Detail & Related papers (2022-09-09T14:07:37Z) - Gram-SLD: Automatic Self-labeling and Detection for Instance Objects [6.512856940779818]
We propose a new framework based on co-training called Gram Self-Labeling and Detection (Gram-SLD).
Gram-SLD can automatically annotate a large amount of data with very limited manually labeled key data and achieve competitive performance.
arXiv Detail & Related papers (2021-12-07T11:34:55Z) - Robust Event Classification Using Imperfect Real-world PMU Data [58.26737360525643]
We study robust event classification using imperfect real-world phasor measurement unit (PMU) data.
We develop a novel machine learning framework for training robust event classifiers.
arXiv Detail & Related papers (2021-10-19T17:41:43Z) - A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset [1.6494191187996927]
Controller Area Networks (CANs) lack basic security properties and are easily exploitable.
Producing vehicular CAN data with a variety of intrusions is out of reach for most researchers.
We present the first comprehensive guide to the existing open CAN intrusion datasets.
arXiv Detail & Related papers (2020-12-29T04:18:54Z)