On the Cross-Dataset Generalization of Machine Learning for Network
Intrusion Detection
- URL: http://arxiv.org/abs/2402.10974v1
- Date: Thu, 15 Feb 2024 14:39:58 GMT
- Title: On the Cross-Dataset Generalization of Machine Learning for Network
Intrusion Detection
- Authors: Marco Cantone, Claudio Marrocco, Alessandro Bria
- Abstract summary: Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity.
Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications.
In this study, we conduct a comprehensive analysis on the generalization of machine-learning-based NIDS through an extensive experimentation in a cross-dataset framework.
- Score: 50.38534263407915
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Network Intrusion Detection Systems (NIDS) are a fundamental tool in
cybersecurity. Their ability to generalize across diverse networks is a
critical factor in their effectiveness and a prerequisite for real-world
applications. In this study, we conduct a comprehensive analysis on the
generalization of machine-learning-based NIDS through an extensive
experimentation in a cross-dataset framework. We employ four machine learning
classifiers and utilize four datasets acquired from different networks:
CIC-IDS-2017, CSE-CIC-IDS2018, LycoS-IDS2017, and LycoS-Unicas-IDS2018.
Notably, the last dataset is a novel contribution, where we apply corrections
based on LycoS-IDS2017 to the well-known CSE-CIC-IDS2018 dataset. The results
show nearly perfect classification performance when the models are trained and
tested on the same dataset. However, when training and testing the models in a
cross-dataset fashion, the classification accuracy is largely commensurate with
random chance except for a few combinations of attacks and datasets. We employ
data visualization techniques in order to provide valuable insights on the
patterns in the data. Our analysis unveils the presence of anomalies in the
data that directly hinder the classifiers capability to generalize the learned
knowledge to new scenarios. This study enhances our comprehension of the
generalization capabilities of machine-learning-based NIDS, highlighting the
significance of acknowledging data heterogeneity.
Related papers
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Detection of Malicious Websites Using Machine Learning Techniques [0.0]
K-Nearest Neighbor is the only model that performs consistently high across datasets.
Other models such as Random Forest, Decision Trees, Logistic Regression, and Support Vector Machines also consistently outperform a baseline model of predicting every link as malicious.
arXiv Detail & Related papers (2022-09-13T13:48:31Z) - Initial Study into Application of Feature Density and
Linguistically-backed Embedding to Improve Machine Learning-based
Cyberbullying Detection [54.83707803301847]
The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection.
The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density.
arXiv Detail & Related papers (2022-06-04T03:17:15Z) - On Generalisability of Machine Learning-based Network Intrusion
Detection Systems [0.0]
In this paper, we evaluate seven supervised and unsupervised learning models on four benchmark NIDS datasets.
Our investigation indicates that none of the considered models is able to generalise over all studied datasets.
Our investigation also indicates that overall, unsupervised learning methods generalise better than supervised learning models in our considered scenarios.
arXiv Detail & Related papers (2022-05-09T08:26:48Z) - Bridging the gap to real-world for network intrusion detection systems
with data-centric approach [1.4699455652461724]
This paper presents a systematic data-centric approach to address the current limitations of NIDS research.
It generates NIDS datasets composed of the most recent network traffic and attacks, with the labeling process integrated by design.
arXiv Detail & Related papers (2021-10-25T04:50:12Z) - Feature Extraction for Machine Learning-based Intrusion Detection in IoT
Networks [6.6147550436077776]
This paper aims to discover whether Feature Reduction (FR) and Machine Learning (ML) techniques can be generalised across various datasets.
The detection accuracy of three Feature Extraction (FE) algorithms; Principal Component Analysis (PCA), Auto-encoder (AE), and Linear Discriminant Analysis (LDA) is evaluated.
arXiv Detail & Related papers (2021-08-28T23:52:18Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated ssian mixture model.
Experimental results demonstrate that CCVR state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Continual Learning for Recurrent Neural Networks: a Review and Empirical
Evaluation [12.27992745065497]
Continual Learning with recurrent neural networks could pave the way to a large number of applications where incoming data is non stationary.
We organize the literature on CL for sequential data processing by providing a categorization of the contributions and a review of the benchmarks.
We propose two new benchmarks for CL with sequential data based on existing datasets, whose characteristics resemble real-world applications.
arXiv Detail & Related papers (2021-03-12T19:25:28Z) - Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlated with labels.
We introduce a simple training, Self-Challenging Representation (RSC), that significantly improves the generalization of CNN to the out-of-domain data.
RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate remaining features that correlates with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z) - Novel Human-Object Interaction Detection via Adversarial Domain
Generalization [103.55143362926388]
We study the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios.
The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations.
We propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction.
arXiv Detail & Related papers (2020-05-22T22:02:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.