ET-BERT: A Contextualized Datagram Representation with Pre-training
Transformers for Encrypted Traffic Classification
- URL: http://arxiv.org/abs/2202.06335v1
- Date: Sun, 13 Feb 2022 14:54:48 GMT
- Title: ET-BERT: A Contextualized Datagram Representation with Pre-training
Transformers for Encrypted Traffic Classification
- Authors: Xinjie Lin, Gang Xiong, Gaopeng Gou, Zhen Li, Junzheng Shi, Jing Yu
- Abstract summary: We propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT).
The pre-trained model can be fine-tuned on a small number of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks.
- Score: 9.180725486824118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Encrypted traffic classification requires discriminative and robust traffic
representation captured from content-invisible and imbalanced traffic data for
accurate classification, which is challenging but indispensable for network
security and network management. The major limitation of existing solutions
is that they rely heavily on deep features, which are overly dependent on
data size and generalize poorly to unseen data. How to leverage
the open-domain unlabeled traffic data to learn representation with strong
generalization ability remains a key challenge. In this paper, we propose a new
traffic representation model called Encrypted Traffic Bidirectional Encoder
Representations from Transformer (ET-BERT), which pre-trains deep
contextualized datagram-level representation from large-scale unlabeled data.
The pre-trained model can be fine-tuned on a small number of task-specific
labeled data and achieves state-of-the-art performance across five encrypted
traffic classification tasks, remarkably pushing the F1 of ISCX-Tor to 99.2%
(4.4% absolute improvement), ISCX-VPN-Service to 98.9% (5.2% absolute
improvement), Cross-Platform (Android) to 92.5% (5.4% absolute improvement),
CSTNET-TLS 1.3 to 97.4% (10.0% absolute improvement). Notably, we provide an
explanation of the empirically powerful pre-trained model by analyzing the
randomness of ciphers, which offers insight into the boundary of
classification ability over encrypted traffic. The code is available at:
https://github.com/linwhitehat/ET-BERT.
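The datagram-level representation described above starts from a tokenization step that turns raw payload bytes into "words" a BERT-style model can consume. The sketch below is a hypothetical illustration of that idea, not the authors' exact preprocessing: it assumes each pair of adjacent payload bytes is encoded as a 4-hex-digit bi-gram token.

```python
# Hypothetical sketch of ET-BERT-style datagram tokenization (an
# assumption about the preprocessing, not code from the paper):
# overlapping byte bi-grams are rendered as 4-hex-digit tokens,
# forming a vocabulary for a BERT-style encoder.
def datagram_to_tokens(payload: bytes, max_tokens: int = 128) -> list:
    """Turn raw datagram bytes into overlapping bi-gram hex tokens."""
    tokens = []
    for i in range(len(payload) - 1):
        tokens.append(f"{payload[i]:02x}{payload[i + 1]:02x}")
        if len(tokens) == max_tokens:  # cap sequence length for the encoder
            break
    return tokens

# First bytes of a TLS handshake record as an example input
tokens = datagram_to_tokens(b"\x16\x03\x01\x02\x00")
# tokens == ["1603", "0301", "0102", "0200"]
```

Because the bi-grams overlap, each byte (except the first and last) appears in two tokens, which lets the encoder model local byte context much like word pieces in natural-language BERT.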
Related papers
- Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition [49.20086587208214]
We propose a cross-domain few-shot in-context learning method based on a multimodal large language model (MLLM) for enhancing traffic sign recognition.
By using description texts, our method reduces the cross-domain differences between template and real traffic signs.
Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels.
arXiv Detail & Related papers (2024-07-08T10:51:03Z)
- BjTT: A Large-scale Multimodal Dataset for Traffic Prediction [49.93028461584377]
Traditional traffic prediction methods rely on historical traffic data to predict traffic trends.
In this work, we explore how generative models combined with text describing the traffic system can be applied for traffic generation.
We propose ChatTraffic, the first diffusion model for text-to-traffic generation.
arXiv Detail & Related papers (2024-03-08T04:19:56Z)
- Lens: A Foundation Model for Network Traffic [19.3652490585798]
Lens is a foundation model for network traffic that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data.
We design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP).
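The Masked Span Prediction objective can be sketched as a simple input-corruption step: hide one contiguous span of tokens and keep the original span as the prediction target. This is a generic illustration of span masking, not the paper's implementation; the span length and mask symbol are assumptions.

```python
import random

# Generic sketch of Masked Span Prediction-style corruption (an
# illustrative assumption, not code from the Lens paper): replace one
# contiguous span of tokens with a mask symbol; the model is trained
# to recover the hidden span.
def mask_span(tokens, span_len=3, mask_token="[MASK]", rng=None):
    """Return (corrupted_tokens, target_span, span_start)."""
    rng = rng or random.Random(0)  # seeded for a reproducible demo
    start = rng.randrange(0, max(1, len(tokens) - span_len))
    corrupted = (tokens[:start]
                 + [mask_token] * span_len
                 + tokens[start + span_len:])
    target = tokens[start:start + span_len]
    return corrupted, target, start

corrupted, target, start = mask_span([f"t{i}" for i in range(10)])
```

The two other objectives named in the summary would operate on the same token stream: POP as a classification over shuffled packet orderings and HTP as a decision about whether two flows come from the same source, with the three losses combined during pre-training.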
arXiv Detail & Related papers (2024-02-06T02:45:13Z)
- Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic [19.636282208765547]
When machine learning models are trained with low-quality training data, they suffer degraded performance.
We develop RAPIER that fully utilizes different distributions of normal and malicious traffic data in the feature space.
RAPIER effectively achieves encrypted malicious traffic detection with the best F1 score of 0.773 and improves the F1 score of existing methods by an average of 272.5%.
arXiv Detail & Related papers (2023-09-09T13:49:30Z)
- Efficient Federated Learning with Spike Neural Networks for Traffic Sign Recognition [70.306089187104]
We introduce powerful Spike Neural Networks (SNNs) into traffic sign recognition for energy-efficient and fast model training.
Numerical results indicate that the proposed federated SNN outperforms traditional federated convolutional neural networks in accuracy, noise immunity, and energy efficiency.
arXiv Detail & Related papers (2022-05-28T03:11:48Z)
- Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification [0.0]
We present a new, public dataset of network traffic that includes labeled, Virtual Private Network (VPN)-encrypted network traffic generated by 10 applications and corresponding to 5 application categories.
We also present an ML framework designed to train rapidly with modest data requirements and to provide both calibrated predictive probabilities and an interpretable out-of-distribution (OOD) score for flagging novel traffic samples.
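One common way to derive an OOD score from a classifier's outputs is one minus the maximum softmax probability: a peaked distribution yields a low score, a flat one a high score. The summary does not specify this framework's actual score, so the following is an illustrative stand-in, not the method from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical OOD score (max-softmax baseline, an assumption --
# not necessarily the score used by the cited framework).
def ood_score(logits):
    """Higher score -> less confident -> more likely out-of-distribution."""
    return 1.0 - max(softmax(logits))

confident = ood_score([8.0, 0.1, 0.2])  # peaked distribution -> low score
uncertain = ood_score([1.0, 1.0, 1.0])  # flat distribution -> high score
```

For calibrated probabilities, such a score is typically thresholded on held-out data so that samples above the threshold are routed to human review rather than auto-labeled.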
arXiv Detail & Related papers (2022-05-11T16:54:37Z)
- Fine-grained TLS Services Classification with Reject Option [0.0]
This paper focuses on collecting a large up-to-date dataset with almost 200 fine-grained service labels and 140 million network flows extended with packet-level metadata.
The number of flows is three orders of magnitude higher than in other existing public labeled datasets of encrypted traffic.
The published dataset is intended as a benchmark for identifying services in encrypted traffic.
arXiv Detail & Related papers (2022-02-24T09:44:12Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- CGNN: Traffic Classification with Graph Neural Network [13.851922724661538]
We present a graph neural network based traffic classification method that builds a graph classifier over features automatically extracted from a chained graph.
CGNN improves the prediction accuracy by 23% to 29% for application classification, by 2% to 37% for malicious traffic classification, and reaches the same accuracy level for encrypted traffic classification.
arXiv Detail & Related papers (2021-10-19T04:10:07Z)
- Deep traffic light detection by overlaying synthetic context on arbitrary natural images [49.592798832978296]
We propose a method to generate artificial traffic-related training data for deep traffic light detectors.
This data is generated using basic non-realistic computer graphics to blend fake traffic scenes on top of arbitrary image backgrounds.
It also tackles the intrinsic data imbalance problem in traffic light datasets, caused mainly by the low amount of samples of the yellow state.
arXiv Detail & Related papers (2020-11-07T19:57:22Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.