ET-BERT: A Contextualized Datagram Representation with Pre-training
Transformers for Encrypted Traffic Classification
- URL: http://arxiv.org/abs/2202.06335v1
- Date: Sun, 13 Feb 2022 14:54:48 GMT
- Title: ET-BERT: A Contextualized Datagram Representation with Pre-training
Transformers for Encrypted Traffic Classification
- Authors: Xinjie Lin, Gang Xiong, Gaopeng Gou, Zhen Li, Junzheng Shi, Jing Yu
- Abstract summary: We propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT).
The pre-trained model can be fine-tuned on a small number of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks.
- Score: 9.180725486824118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Encrypted traffic classification requires discriminative and robust traffic
representation captured from content-invisible and imbalanced traffic data for
accurate classification, which is challenging but indispensable for network
security and network management. The major limitation of existing solutions
is that they rely heavily on deep features, which are overly dependent on
data size and generalize poorly to unseen data. How to leverage
the open-domain unlabeled traffic data to learn representation with strong
generalization ability remains a key challenge. In this paper, we propose a new
traffic representation model called Encrypted Traffic Bidirectional Encoder
Representations from Transformer (ET-BERT), which pre-trains deep
contextualized datagram-level representation from large-scale unlabeled data.
The pre-trained model can be fine-tuned on a small number of task-specific
labeled data and achieves state-of-the-art performance across five encrypted
traffic classification tasks, remarkably pushing the F1 of ISCX-Tor to 99.2%
(4.4% absolute improvement), ISCX-VPN-Service to 98.9% (5.2% absolute
improvement), Cross-Platform (Android) to 92.5% (5.4% absolute improvement),
CSTNET-TLS 1.3 to 97.4% (10.0% absolute improvement). Notably, we provide an
explanation of the empirically powerful pre-trained model by analyzing the
randomness of ciphers, which offers insight into the boundary of
classification ability over encrypted traffic. The code is available at:
https://github.com/linwhitehat/ET-BERT.
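The datagram-level representation described above starts from a tokenization step that turns raw payload bytes into "words" a BERT-style model can consume. The sketch below is a hypothetical illustration of that idea, not the authors' exact preprocessing: it assumes each pair of adjacent payload bytes is encoded as a 4-hex-digit bi-gram token.

```python
# Hypothetical sketch of ET-BERT-style datagram tokenization (an
# assumption about the preprocessing, not code from the paper):
# overlapping byte bi-grams are rendered as 4-hex-digit tokens,
# forming a vocabulary for a BERT-style encoder.
def datagram_to_tokens(payload: bytes, max_tokens: int = 128) -> list:
    """Turn raw datagram bytes into overlapping bi-gram hex tokens."""
    tokens = []
    for i in range(len(payload) - 1):
        tokens.append(f"{payload[i]:02x}{payload[i + 1]:02x}")
        if len(tokens) == max_tokens:  # cap sequence length for the encoder
            break
    return tokens

# First bytes of a TLS handshake record as an example input
tokens = datagram_to_tokens(b"\x16\x03\x01\x02\x00")
# tokens == ["1603", "0301", "0102", "0200"]
```

Because the bi-grams overlap, each byte (except the first and last) appears in two tokens, which lets the encoder model local byte context much like word pieces in natural-language BERT.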
Related papers
- Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition [49.20086587208214]
We propose a cross-domain few-shot in-context learning method based on a multimodal large language model (MLLM) for enhancing traffic sign recognition.
By using description texts, our method reduces the cross-domain differences between template and real traffic signs.
Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels.
arXiv Detail & Related papers (2024-07-08T10:51:03Z)
- BjTT: A Large-scale Multimodal Dataset for Traffic Prediction [49.93028461584377]
Traditional traffic prediction methods rely on historical traffic data to predict traffic trends.
In this work, we explore how generative models combined with text describing the traffic system can be applied for traffic generation.
We propose ChatTraffic, the first diffusion model for text-to-traffic generation.
arXiv Detail & Related papers (2024-03-08T04:19:56Z)
- Lens: A Foundation Model for Network Traffic [19.3652490585798]
Lens is a foundation model for network traffic that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data.
We design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP).
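The Masked Span Prediction objective can be sketched as a simple input-corruption step: hide one contiguous span of tokens and keep the original span as the prediction target. This is a generic illustration of span masking, not the paper's implementation; the span length and mask symbol are assumptions.

```python
import random

# Generic sketch of Masked Span Prediction-style corruption (an
# illustrative assumption, not code from the Lens paper): replace one
# contiguous span of tokens with a mask symbol; the model is trained
# to recover the hidden span.
def mask_span(tokens, span_len=3, mask_token="[MASK]", rng=None):
    """Return (corrupted_tokens, target_span, span_start)."""
    rng = rng or random.Random(0)  # seeded for a reproducible demo
    start = rng.randrange(0, max(1, len(tokens) - span_len))
    corrupted = (tokens[:start]
                 + [mask_token] * span_len
                 + tokens[start + span_len:])
    target = tokens[start:start + span_len]
    return corrupted, target, start

corrupted, target, start = mask_span([f"t{i}" for i in range(10)])
```

The two other objectives named in the summary would operate on the same token stream: POP as a classification over shuffled packet orderings and HTP as a decision about whether two flows come from the same source, with the three losses combined during pre-training.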
arXiv Detail & Related papers (2024-02-06T02:45:13Z)
- Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic [19.636282208765547]
When machine learning models are trained with low-quality training data, they suffer degraded performance.
We develop RAPIER that fully utilizes different distributions of normal and malicious traffic data in the feature space.
RAPIER effectively achieves encrypted malicious traffic detection with the best F1 score of 0.773 and improves the F1 score of existing methods by an average of 272.5%.
arXiv Detail & Related papers (2023-09-09T13:49:30Z)
- Efficient Federated Learning with Spike Neural Networks for Traffic Sign Recognition [70.306089187104]
We introduce powerful Spike Neural Networks (SNNs) into traffic sign recognition for energy-efficient and fast model training.
Numerical results indicate that the proposed federated SNN outperforms traditional federated convolutional neural networks in accuracy, noise immunity, and energy efficiency.
arXiv Detail & Related papers (2022-05-28T03:11:48Z)
- Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification [0.0]
We present a new, public dataset of network traffic that includes labeled, Virtual Private Network (VPN)-encrypted network traffic generated by 10 applications and corresponding to 5 application categories.
We also present an ML framework designed to train rapidly with modest data requirements and to provide both calibrated predictive probabilities and an interpretable out-of-distribution (OOD) score for flagging novel traffic samples.
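One common way to derive an OOD score from a classifier's outputs is one minus the maximum softmax probability: a peaked distribution yields a low score, a flat one a high score. The summary does not specify this framework's actual score, so the following is an illustrative stand-in, not the method from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical OOD score (max-softmax baseline, an assumption --
# not necessarily the score used by the cited framework).
def ood_score(logits):
    """Higher score -> less confident -> more likely out-of-distribution."""
    return 1.0 - max(softmax(logits))

confident = ood_score([8.0, 0.1, 0.2])  # peaked distribution -> low score
uncertain = ood_score([1.0, 1.0, 1.0])  # flat distribution -> high score
```

For calibrated probabilities, such a score is typically thresholded on held-out data so that samples above the threshold are routed to human review rather than auto-labeled.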
arXiv Detail & Related papers (2022-05-11T16:54:37Z)
- Fine-grained TLS Services Classification with Reject Option [0.0]
This paper focuses on collecting a large up-to-date dataset with almost 200 fine-grained service labels and 140 million network flows extended with packet-level metadata.
The number of flows is three orders of magnitude higher than in other existing public labeled datasets of encrypted traffic.
The published dataset is intended as a benchmark for identifying services in encrypted traffic.
arXiv Detail & Related papers (2022-02-24T09:44:12Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- CGNN: Traffic Classification with Graph Neural Network [13.851922724661538]
We present a graph neural network based traffic classification method that builds a graph classifier over features automatically extracted from a chained graph.
CGNN improves the prediction accuracy by 23% to 29% for application classification, by 2% to 37% for malicious traffic classification, and reaches the same accuracy level for encrypted traffic classification.
arXiv Detail & Related papers (2021-10-19T04:10:07Z)
- Deep traffic light detection by overlaying synthetic context on arbitrary natural images [49.592798832978296]
We propose a method to generate artificial traffic-related training data for deep traffic light detectors.
This data is generated using basic non-realistic computer graphics to blend fake traffic scenes on top of arbitrary image backgrounds.
It also tackles the intrinsic data imbalance problem in traffic light datasets, caused mainly by the low amount of samples of the yellow state.
arXiv Detail & Related papers (2020-11-07T19:57:22Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.