A Pipeline of Augmentation and Sequence Embedding for Classification of Imbalanced Network Traffic
- URL: http://arxiv.org/abs/2502.18909v1
- Date: Wed, 26 Feb 2025 07:55:24 GMT
- Title: A Pipeline of Augmentation and Sequence Embedding for Classification of Imbalanced Network Traffic
- Authors: Matin Shokri, Ramin Hasibi
- Abstract summary: We propose a pipeline to balance the dataset and classify it using a robust and accurate embedding technique. We demonstrate that the proposed augmentation pipeline, combined with FS-Embedding, increases convergence speed and leads to a significant reduction in the number of model parameters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Network Traffic Classification (NTC) is one of the most important tasks in network management. The imbalanced nature of classes on the internet presents a critical challenge in classification tasks. For example, some classes of applications are much more prevalent than others, such as HTTP. As a result, machine learning classification models do not perform well on those classes with fewer data. To address this problem, we propose a pipeline to balance the dataset and classify it using a robust and accurate embedding technique. First, we generate artificial data using Long Short-Term Memory (LSTM) networks and Kernel Density Estimation (KDE). Next, we propose replacing one-hot encoding for categorical features with a novel embedding framework based on the "Flow as a Sentence" perspective, which we name FS-Embedding. This framework treats the source and destination ports, along with the packet's direction, as one word in a flow, then trains an embedding vector space based on these new features through the learning classification task. Finally, we compare our pipeline with the training of a Convolutional Recurrent Neural Network (CRNN) and Transformers, both with imbalanced and sampled datasets, as well as with the one-hot encoding approach. We demonstrate that the proposed augmentation pipeline, combined with FS-Embedding, increases convergence speed and leads to a significant reduction in the number of model parameters, all while maintaining the same performance in terms of accuracy.
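The "Flow as a Sentence" idea in the abstract can be illustrated with a minimal sketch. It assumes each packet is a dictionary with hypothetical keys `src_port`, `dst_port`, and `direction`, joins the three fields into one word, and looks each word up in a small embedding table. The table here is only randomly initialized; in the paper the vectors are trained through the classification task, and the authors' actual tokenization and dimensions are not given in this summary.

```python
import random

def flow_to_words(packets):
    """Turn each packet into one word: source port, destination port, direction."""
    return [f"{p['src_port']}_{p['dst_port']}_{p['direction']}" for p in packets]

def build_vocab(flows):
    """Give every distinct word an integer id; id 0 is reserved for unseen words."""
    vocab = {"<unk>": 0}
    for flow in flows:
        for word in flow_to_words(flow):
            vocab.setdefault(word, len(vocab))
    return vocab

def init_embeddings(vocab_size, dim, seed=0):
    """Random starting vectors (assumption: a real model would train these)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

def embed_flow(packets, vocab, table):
    """Map a flow to a sequence of dense vectors, one per packet."""
    return [table[vocab.get(word, 0)] for word in flow_to_words(packets)]

# Hypothetical three-packet flow: request, response, request.
flow_a = [
    {"src_port": 51234, "dst_port": 443, "direction": "out"},
    {"src_port": 443, "dst_port": 51234, "direction": "in"},
    {"src_port": 51234, "dst_port": 443, "direction": "out"},
]

vocab = build_vocab([flow_a])
table = init_embeddings(len(vocab), dim=4)
embedded = embed_flow(flow_a, vocab, table)
```

Each packet becomes a vector of length `dim` regardless of how many distinct words exist, whereas a one-hot vector would grow with the vocabulary size. That difference is one plausible reading of the parameter reduction the abstract reports.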
Related papers
- TensAIR: Real-Time Training of Neural Networks from Data-streams [1.409180142531996]
This paper presents TensAIR, the first online learning (OL) system for training ANNs in real time.
TensAIR achieves remarkable performance and scalability by using a decentralized and asynchronous architecture to train ANN models.
We empirically demonstrate that TensAIR achieves a nearly linear scale-out performance in terms of (1) the number of worker nodes deployed in the network, and (2) the throughput at which the data batches arrive.
arXiv Detail & Related papers (2022-11-18T15:11:44Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z)
- Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to enhance the transparency of the function of the whole network.
We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z)
- Multi-Task Classification of Sewer Pipe Defects and Properties using a Cross-Task Graph Neural Network Decoder [56.673599764041384]
We present a novel decoder-focused multi-task classification architecture, the Cross-Task Graph Neural Network (CT-GNN).
CT-GNN refines the disjointed per-task predictions using cross-task information.
We achieve state-of-the-art performance on all four classification tasks in the Sewer-ML dataset.
arXiv Detail & Related papers (2021-11-15T15:36:50Z)
- PDFNet: Pointwise Dense Flow Network for Urban-Scene Segmentation [0.0]
We propose a novel lightweight architecture named point-wise dense flow network (PDFNet).
In PDFNet, we employ dense, residual, and multiple shortcut connections to allow a smooth gradient flow to all parts of the network.
Our method significantly outperforms baselines in capturing small classes and in few-data regimes.
arXiv Detail & Related papers (2021-09-21T10:39:46Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data [10.01323660393278]
We show how to design sparse neural networks for faster convergence, without any training data, using the Synflow-L2 algorithm.
We propose a new method to construct sparse networks, without any training data, referred to as Paths with Higher-Edge Weights (PHEW).
arXiv Detail & Related papers (2020-10-22T00:20:59Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- File Classification Based on Spiking Neural Networks [0.5065947993017157]
We propose a system for file classification in large data sets based on spiking neural networks (SNNs).
The proposed system may represent a valid alternative to classical machine learning algorithms for inference tasks.
arXiv Detail & Related papers (2020-04-08T11:50:29Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Semantic Drift Compensation for Class-Incremental Learning [48.749630494026086]
Class-incremental learning of deep networks sequentially increases the number of classes to be classified.
We propose a new method to estimate the drift, called semantic drift, of features and compensate for it without the need of any exemplars.
arXiv Detail & Related papers (2020-04-01T13:31:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.