NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models
- URL: http://arxiv.org/abs/2403.10319v2
- Date: Tue, 19 Mar 2024 03:36:53 GMT
- Title: NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models
- Authors: Chen Qian, Xiaochang Li, Qineng Wang, Gang Zhou, Huajie Shao
- Abstract summary: In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or Cyber-Physical Systems.
We introduce NetBench, a large-scale and comprehensive benchmark dataset for assessing machine learning models, especially foundation models, in both network traffic classification and generation tasks.
- Score: 15.452625276982987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or Cyber-Physical Systems. Monitoring and analyzing network traffic is crucial for ensuring the performance, security, and reliability of a network. However, a significant challenge in network traffic analysis is processing diverse data packets, including both ciphertext and plaintext. While many methods have been adopted to analyze network traffic, they often rely on different datasets for performance evaluation. This inconsistency results in substantial manual data processing efforts and unfair comparisons. Moreover, some data processing methods may cause data leakage due to improper separation of training and testing data. To address these issues, we introduce NetBench, a large-scale and comprehensive benchmark dataset for assessing machine learning models, especially foundation models, in both network traffic classification and generation tasks. NetBench is built upon seven publicly available datasets and encompasses a broad spectrum of 20 tasks, including 15 classification tasks and 5 generation tasks. Furthermore, we evaluate eight State-Of-The-Art (SOTA) classification models (including two foundation models) and two generative models using our benchmark. The results show that foundation models significantly outperform traditional deep learning methods in traffic classification. We believe NetBench will facilitate fair comparisons among various approaches and advance the development of foundation models for network traffic. Our benchmark is available at https://github.com/WM-JayLab/NetBench.
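The abstract's data-leakage point is concrete: if packet-level samples are split at random, packets from the same flow can land in both train and test and inflate scores. A common remedy is a group-aware split in which every flow stays entirely on one side. The sketch below illustrates that idea; the `Packet` class and `split_by_flow` helper are hypothetical names for illustration, not the actual NetBench API.

```python
# Hypothetical sketch of a flow-aware train/test split. All packets that
# share a flow_id are assigned to the same side, preventing the leakage
# the NetBench abstract warns about. Names here are illustrative only.
import random
from dataclasses import dataclass

@dataclass
class Packet:
    flow_id: str   # identifier of the flow this packet belongs to
    payload: bytes
    label: str     # e.g. application or attack class

def split_by_flow(packets, test_ratio=0.2, seed=0):
    """Split packets so that no flow appears in both train and test."""
    flows = sorted({p.flow_id for p in packets})
    rng = random.Random(seed)
    rng.shuffle(flows)
    n_test = max(1, int(len(flows) * test_ratio))
    test_flows = set(flows[:n_test])
    train = [p for p in packets if p.flow_id not in test_flows]
    test = [p for p in packets if p.flow_id in test_flows]
    return train, test
```

Splitting by flow (rather than by packet) is what makes the held-out set a fair test: the model never sees any fragment of a test flow during training.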
Related papers
- Lens: A Foundation Model for Network Traffic [19.3652490585798]
Lens is a foundation model for network traffic that leverages the T5 architecture to learn pre-trained representations from large-scale unlabeled data.
We design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP).
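A loss that "combines three distinct tasks" is typically a (possibly weighted) sum of the per-task losses. The sketch below shows that shape in the spirit of Lens; the weights and the plain-float interface are assumptions, and the paper's actual formulation may differ.

```python
# Illustrative multi-task pre-training objective: a weighted sum of the
# Masked Span Prediction (MSP), Packet Order Prediction (POP), and
# Homologous Traffic Prediction (HTP) losses. Weights are assumed, not
# taken from the paper.
def combined_loss(l_msp, l_pop, l_htp, w_msp=1.0, w_pop=1.0, w_htp=1.0):
    """Combine the three task losses into one training objective."""
    return w_msp * l_msp + w_pop * l_pop + w_htp * l_htp
```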
arXiv Detail & Related papers (2024-02-06T02:45:13Z)
- Data Filtering Networks [67.827994353269]
We study the problem of learning a data filtering network (DFN) for this second step of filtering a large uncurated dataset.
Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks.
Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets.
arXiv Detail & Related papers (2023-09-29T17:37:29Z)
- NetDiffus: Network Traffic Generation by Diffusion Models through Time-Series Imaging [3.208802773440937]
We develop an end-to-end framework - NetDiffus that converts one-dimensional time-series network traffic into two-dimensional images, and then synthesizes representative images for the original data.
We demonstrate that NetDiffus outperforms state-of-the-art traffic generation methods based on Generative Adversarial Networks (GANs), providing a 66.4% increase in the fidelity of the generated data and an 18.1% improvement on downstream machine learning tasks.
arXiv Detail & Related papers (2023-09-23T18:13:12Z)
- NetGPT: Generative Pretrained Transformer for Network Traffic [4.205009931131087]
Pretrained models for network traffic can utilize large-scale raw data to learn the essential characteristics of network traffic.
In this paper, we make the first attempt to provide a generative pretrained model, NetGPT, for both traffic understanding and generation tasks.
arXiv Detail & Related papers (2023-04-19T09:04:30Z)
- RouteNet-Fermi: Network Modeling with Graph Neural Networks [7.227467283378366]
We present RouteNet-Fermi, a custom Graph Neural Network (GNN) model that shares the same goals as queuing theory.
The proposed model accurately predicts the delay, jitter, and packet loss of a network.
Our experimental results show that RouteNet-Fermi achieves accuracy similar to that of computationally expensive packet-level simulators.
arXiv Detail & Related papers (2022-12-22T23:02:40Z)
- Unsupervised Domain-adaptive Hash for Networks [81.49184987430333]
Domain-adaptive hash learning has enjoyed considerable success in the computer vision community.
We develop an unsupervised domain-adaptive hash learning method for networks, dubbed UDAH.
arXiv Detail & Related papers (2021-08-20T12:09:38Z)
- Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
For evaluation, we provide a benchmark pipeline for the evaluation of temporal network embeddings.
We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z)
- Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
- Graph-Based Neural Network Models with Multiple Self-Supervised Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks across four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- Inferring Network Structure From Data [1.2437226707039446]
We propose a network model selection methodology that focuses on evaluating a network's utility for varying tasks.
We demonstrate that this network definition matters in several ways for modeling the behavior of the underlying system.
arXiv Detail & Related papers (2020-04-04T23:30:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.