Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets
- URL: http://arxiv.org/abs/2104.09029v1
- Date: Mon, 19 Apr 2021 03:17:37 GMT
- Title: Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets
- Authors: Siamak Layeghy, Marcus Gallagher, Marius Portmann
- Abstract summary: We analyse the statistical properties of benign traffic in three of the more recent and relevant NIDS datasets.
Our results show a distinct difference in most of the considered statistical features between the synthetic datasets and two real-world datasets.
- Score: 4.125187280299247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network Intrusion Detection Systems (NIDSs) are an increasingly important
tool for the prevention and mitigation of cyber attacks. A number of labelled
synthetic datasets have been generated and made publicly available by
researchers, and they have become the benchmarks via which new ML-based NIDS
classifiers are being evaluated. Recently published results show excellent
classification performance with these datasets, increasingly approaching 100
percent performance across key evaluation metrics such as accuracy, F1 score,
etc. Unfortunately, we have not yet seen these excellent academic research
results translated into practical NIDS systems with such near-perfect
performance. This motivated our research presented in this paper, where we
analyse the statistical properties of the benign traffic in three of the more
recent and relevant NIDS datasets (CIC, UNSW, ...). As a comparison, we
consider two datasets obtained from real-world production networks, one from a
university network and one from a medium-sized Internet Service Provider (ISP).
Our results show that the two real-world datasets are quite similar to each
other with regard to most of the considered statistical features. Likewise,
the three synthetic datasets are relatively similar within their group.
However, and most importantly, our results show a distinct difference in most
of the considered statistical features between the three synthetic datasets and
the two real-world datasets. Since ML relies on the basic assumption of
training and test datasets being sampled from the same distribution, this
raises the question of how well the performance results of ML-classifiers
trained on the considered synthetic datasets can translate and generalise to
real-world networks. We believe this is an interesting and relevant question
which provides motivation for further research in this space.
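As an illustration of the kind of distributional comparison the abstract describes, the sketch below contrasts one benign-traffic flow feature from a synthetic dataset with the same feature from a real-world capture using a two-sample Kolmogorov-Smirnov test. The file names, the column name flow_duration, and the choice of the KS test are assumptions made here for illustration; the paper's actual feature set and methodology may differ.

```python
# Minimal sketch, not the authors' code: compare the distribution of one
# benign-traffic flow feature between a synthetic and a real-world dataset.
import pandas as pd
from scipy.stats import ks_2samp

def compare_feature(synthetic_csv: str, real_csv: str, feature: str):
    """Two-sample KS test on one flow-level feature of benign traffic."""
    synth = pd.read_csv(synthetic_csv)[feature].dropna()
    real = pd.read_csv(real_csv)[feature].dropna()
    return ks_2samp(synth, real)  # returns (statistic, p-value)

if __name__ == "__main__":
    # Hypothetical file and column names. A large KS statistic with a tiny
    # p-value suggests the synthetic and real samples are unlikely to come
    # from the same distribution, which is the kind of gap the paper reports.
    stat, p = compare_feature("benign_synthetic.csv", "benign_real.csv", "flow_duration")
    print(f"KS statistic: {stat:.3f}, p-value: {p:.3g}")
```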
Related papers
- EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event sequences are common data structures in various real-world domains such as healthcare, finance, and user interaction logs.
Despite advances in temporal data modeling techniques, there are no standardized benchmarks for evaluating their performance on event sequences.
We introduce EBES, a comprehensive benchmarking tool with standardized evaluation scenarios and protocols.
arXiv Detail & Related papers (2024-10-04T13:03:43Z)
- Efficacy of Synthetic Data as a Benchmark [3.2968976262860408]
We investigate the effectiveness of generating synthetic data through large language models (LLMs)
Our experiments show that while synthetic data can effectively capture performance of various methods for simpler tasks, it falls short for more complex tasks like named entity recognition.
We propose a new metric called the bias factor, which evaluates the biases introduced when the same LLM is used to both generate benchmarking data and to perform the tasks.
arXiv Detail & Related papers (2024-09-18T13:20:23Z)
- Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs [7.407592553310068]
We propose an empirical protocol based on a fair benchmarking framework to investigate the performance discrepancy between simple methods and GNNs.
We also propose a novel metric to quantify the dataset effectiveness by considering both dataset complexity and model performance.
Our findings shed light on the current understanding of benchmark datasets, and our new platform could fuel the future evolution of graph classification benchmarks.
arXiv Detail & Related papers (2024-07-06T08:33:23Z)
- On the Cross-Dataset Generalization of Machine Learning for Network Intrusion Detection [50.38534263407915]
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity.
Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications.
In this study, we conduct a comprehensive analysis of the generalization of machine-learning-based NIDS through extensive experimentation in a cross-dataset framework (a minimal illustrative sketch of such a cross-dataset setup is given after this list).
arXiv Detail & Related papers (2024-02-15T14:39:58Z)
- Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models.
This synthetic data is employed to evaluate the robustness of pretrained segmenters.
We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification? [1.1041211464412568]
We show that a recent large-scale synthetic dataset ClonedPerson can be reliably used to benchmark GPReID, statistically the same as real-world datasets.
This study guarantees the usage of synthetic datasets for both source training set and target testing set, with completely no privacy concerns from real-world surveillance data.
arXiv Detail & Related papers (2022-09-12T06:54:54Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE)
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- Feature Extraction for Machine Learning-based Intrusion Detection in IoT Networks [6.6147550436077776]
This paper aims to discover whether Feature Reduction (FR) and Machine Learning (ML) techniques can be generalised across various datasets.
The detection accuracy of three Feature Extraction (FE) algorithms, namely Principal Component Analysis (PCA), Auto-encoder (AE), and Linear Discriminant Analysis (LDA), is evaluated.
arXiv Detail & Related papers (2021-08-28T23:52:18Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
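As referenced in the cross-dataset generalization entry above, the following minimal sketch illustrates a cross-dataset evaluation of an ML-based NIDS: a classifier is trained on one labelled dataset and evaluated on another that shares the same feature schema. The dataset file names, the label column, and the RandomForest choice are illustrative assumptions, not the methodology of any paper listed here.

```python
# Minimal sketch of a cross-dataset evaluation, assuming two flow-level NIDS
# datasets with a shared feature schema and a binary 0/1 "label" column.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def cross_dataset_f1(train_csv: str, test_csv: str, label_col: str = "label") -> float:
    """Train on one NIDS dataset and report F1 on another with the same schema."""
    train = pd.read_csv(train_csv)
    test = pd.read_csv(test_csv)
    # Use only the numeric feature columns present in both datasets.
    features = [c for c in train.columns
                if c != label_col and c in test.columns
                and pd.api.types.is_numeric_dtype(train[c])]

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train[features], train[label_col])
    return f1_score(test[label_col], clf.predict(test[features]))

if __name__ == "__main__":
    # Hypothetical file names. The cross-dataset score is typically much lower
    # than an in-distribution train/test split on a single dataset, which is
    # precisely the generalization gap at issue in this paper.
    print("cross-dataset F1:", cross_dataset_f1("nids_dataset_A.csv", "nids_dataset_B.csv"))
```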