Revisiting Network Traffic Analysis: Compatible network flows for ML models
- URL: http://arxiv.org/abs/2511.08345v1
- Date: Wed, 12 Nov 2025 01:54:35 GMT
- Title: Revisiting Network Traffic Analysis: Compatible network flows for ML models
- Authors: João Vitorino, Daniela Pinto, Eva Maia, Ivone Amorim, Isabel Praça
- Abstract summary: This paper studies the impact that seemingly similar features created by different network traffic flow exporters can have on the generalization and robustness of Machine Learning models. To assess the usefulness of these new flows for intrusion detection, they were compared with the original versions and were used to fine-tune multiple models.
- Score: 1.7181078670359513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features. However, it can be difficult to accurately represent the complex traffic patterns of an attack, especially in Internet-of-Things (IoT) networks. This paper studies the impact that seemingly similar features created by different network traffic flow exporters can have on the generalization and robustness of ML models. In addition to the original CSV files of the Bot-IoT, IoT-23, and CICIoT23 datasets, the raw network packets of their PCAP files were analysed with the HERA tool, generating new labelled flows and extracting consistent features for new CSV versions. To assess the usefulness of these new flows for intrusion detection, they were compared with the original versions and were used to fine-tune multiple models. Overall, the results indicate that directly analysing and preprocessing PCAP files, instead of just using the commonly available CSV files, enables the computation of more relevant features to train bagging and gradient boosting decision tree ensembles. It is important to continue improving feature extraction and feature selection processes to make different datasets more compatible and enable a trustworthy evaluation and comparison of the ML models used in cybersecurity solutions.
Related papers
- Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution. We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z)
- IncepFormerNet: A multi-scale multi-head attention network for SSVEP classification [12.935583315234553]
This study proposes a new model called IncepFormerNet, which is a hybrid of the Inception and Transformer architectures. IncepFormerNet adeptly extracts multi-scale temporal information from time series data using parallel convolution kernels of varying sizes. It takes advantage of filter bank techniques to extract features based on the spectral characteristics of SSVEP data.
arXiv Detail & Related papers (2025-02-04T13:04:03Z)
- A Novel Approach to Network Traffic Analysis: the HERA tool [0.0]
Cybersecurity threats highlight the need for robust network intrusion detection systems. These systems rely heavily on datasets to train machine learning models capable of detecting patterns and predicting threats. HERA is a new open-source tool that generates flow files and labelled or unlabelled datasets with user-defined features.
arXiv Detail & Related papers (2025-01-13T16:47:52Z)
- Flow Exporter Impact on Intelligent Intrusion Detection Systems [0.0]
High-quality datasets are critical for training machine learning models. Inconsistencies in feature generation can hinder the accuracy and reliability of threat detection. This paper investigates the impact of flow exporters on the performance and reliability of machine learning models for intrusion detection.
arXiv Detail & Related papers (2024-12-18T16:38:20Z)
- Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy.
We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels.
Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z)
- Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection [0.0]
This work presents a feature selection and consensus process that combines multiple methods and applies them to several network datasets.
By using an improved dataset with more data diversity, selecting the best time-related features and a more specific feature set, and performing adversarial training, the ML models were able to achieve a better adversarially robust generalization.
arXiv Detail & Related papers (2024-04-05T16:01:21Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- Machine Learning for QoS Prediction in Vehicular Communication: Challenges and Solution Approaches [46.52224306624461]
We consider maximum throughput prediction to enhance, for example, streaming or high-definition mapping applications.
We highlight how confidence can be built on machine learning technologies by better understanding the underlying characteristics of the collected data.
We use explainable AI to show that machine learning can learn underlying principles of wireless networks without being explicitly programmed.
arXiv Detail & Related papers (2023-02-23T12:29:20Z)
- Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset, the FashionMNIST vs MNIST dataset, FashionM
arXiv Detail & Related papers (2022-06-26T16:00:22Z)
- How Well Do Sparse ImageNet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- An Explainable Machine Learning-based Network Intrusion Detection System for Enabling Generalisability in Securing IoT Networks [0.0]
Machine Learning (ML)-based network intrusion detection systems bring many benefits for enhancing the security posture of an organisation.
Many systems have been designed and developed in the research community, often achieving a perfect detection rate when evaluated using certain datasets.
This paper tightens the gap by evaluating the generalisability of a common feature set to different network environments and attack types.
arXiv Detail & Related papers (2021-04-15T00:44:45Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves robustness to distributional shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.