Extensible Machine Learning for Encrypted Network Traffic Application
Labeling via Uncertainty Quantification
- URL: http://arxiv.org/abs/2205.05628v1
- Date: Wed, 11 May 2022 16:54:37 GMT
- Title: Extensible Machine Learning for Encrypted Network Traffic Application
Labeling via Uncertainty Quantification
- Authors: Steven Jorgensen, John Holodnak, Jensen Dempsey, Karla de Souza,
Ananditha Raghunath, Vernon Rivet, Noah DeMoes, Andrés Alejos, and Allan
Wollaber (MIT Lincoln Laboratory)
- Abstract summary: We present a new, public dataset of network traffic that includes labeled, Virtual Private Network (VPN)-encrypted network traffic generated by 10 applications and corresponding to 5 application categories.
We also present an ML framework that is designed to rapidly train with modest data requirements and provide both calibrated, predictive probabilities as well as an interpretable "out-of-distribution" (OOD) score to flag novel traffic samples.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing prevalence of encrypted network traffic, cyber security
analysts have been turning to machine learning (ML) techniques to elucidate the
traffic on their networks. However, ML models can become stale as known traffic
features can shift between networks and as new traffic emerges that is outside
of the distribution of the training set. In order to reliably adapt in this
dynamic environment, ML models must additionally provide contextualized
uncertainty quantification to their predictions, which has received little
attention in the cyber security domain. Uncertainty quantification is necessary
both to signal when the model is uncertain about which class to choose in its
label assignment and when the traffic is not likely to belong to any
pre-trained classes.
We present a new, public dataset of network traffic that includes labeled,
Virtual Private Network (VPN)-encrypted network traffic generated by 10
applications and corresponding to 5 application categories. We also present an
ML framework that is designed to rapidly train with modest data requirements
and provide both calibrated, predictive probabilities as well as an
interpretable "out-of-distribution" (OOD) score to flag novel traffic
samples. We describe how to compute a calibrated OOD score from p-values of the
so-called relative Mahalanobis distance.
We demonstrate that our framework achieves an F1 score of 0.98 on our dataset
and that it can extend to an enterprise network by testing the model: (1) on
data from similar applications, (2) on dissimilar application traffic from an
existing category, and (3) on application traffic from a new category. The
model correctly flags uncertain traffic and, upon retraining, accurately
incorporates the new data. We additionally demonstrate good performance (F1
score of 0.97) when packet sizes are made to be uniform, as occurs for certain
encryption protocols.
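The abstract describes the OOD score only at a high level. The following is a minimal sketch of the idea, not the authors' implementation. It assumes the relative Mahalanobis distance is the class-conditional Mahalanobis distance (with a covariance shared across classes) minus the distance under a single background Gaussian fitted to all training data, and it assumes an empirical p-value taken against held-out in-distribution scores; the paper's exact calibration procedure may differ. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_rmd(X, y):
    """Fit class means, a shared (pooled) covariance, and a background Gaussian."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance, shared by all classes.
    centered = X - means[np.searchsorted(classes, y)]
    shared_cov = centered.T @ centered / len(X)
    # Background Gaussian fitted to all training data, ignoring labels.
    mu0 = X.mean(axis=0)
    cov0 = np.cov(X, rowvar=False, bias=True)
    ridge = 1e-6 * np.eye(X.shape[1])  # small ridge for numerical stability
    return {
        "means": means,
        "prec": np.linalg.inv(shared_cov + ridge),
        "mu0": mu0,
        "prec0": np.linalg.inv(cov0 + ridge),
    }


def sq_mahalanobis(X, mu, prec):
    """Squared Mahalanobis distance of each row of X from mu."""
    d = X - mu
    return np.einsum("ij,jk,ik->i", d, prec, d)


def rmd_score(model, X):
    """Minimum over classes of MD_k(x) - MD_0(x); larger means more atypical."""
    md0 = sq_mahalanobis(X, model["mu0"], model["prec0"])
    md_k = np.stack(
        [sq_mahalanobis(X, m, model["prec"]) for m in model["means"]], axis=1
    )
    return (md_k - md0[:, None]).min(axis=1)


def empirical_p_value(calib_scores, test_scores):
    """Share of held-out in-distribution scores at least as large as each test score."""
    calib = np.sort(calib_scores)
    n_ge = len(calib) - np.searchsorted(calib, test_scores, side="left")
    return (n_ge + 1) / (len(calib) + 1)


# Synthetic stand-in for traffic features: three well-separated classes in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])


def sample(n_per_class):
    X = np.concatenate([c + rng.normal(size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


X_train, y_train = sample(200)
X_calib, _ = sample(100)  # held-out in-distribution data for calibration

model = fit_rmd(X_train, y_train)
calib_scores = rmd_score(model, X_calib)

# One point at a known class center, one far from every class.
X_test = np.array([[6.0, 0.0], [20.0, 20.0]])
p_values = empirical_p_value(calib_scores, rmd_score(model, X_test))
print(p_values)
```

In this sketch a small p-value means that few held-out in-distribution samples look as atypical as the test sample, so the sample can be flagged as potentially novel traffic. The threshold at which to flag is a design choice that the sketch leaves open.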
Related papers
- Lens: A Foundation Model for Network Traffic [19.3652490585798]
Lens is a foundation model for network traffic that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data.
We design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP).
arXiv Detail & Related papers (2024-02-06T02:45:13Z)
- Hybrid PLS-ML Authentication Scheme for V2I Communication Networks [0.0]
We propose a novel hybrid physical layer security (PLS)-machine learning (ML) authentication scheme by exploiting the position of the transmitter vehicle as a device fingerprint.
We use a time-of-arrival (ToA) based localization mechanism where the ToA is estimated at roadside units (RSUs), and the coordinates of the transmitter vehicle are extracted at the base station (BS).
To track the mobility of the moving legitimate vehicle, we use an ML model trained on several system parameters. We observe that our proposed position-based mechanism significantly outperforms the baseline scheme in terms of missed detections.
arXiv Detail & Related papers (2023-08-28T16:34:50Z) - Convolutional Neural Networks for the classification of glitches in
gravitational-wave data streams [52.77024349608834]
We classify transient noise signals (i.e., glitches) and gravitational waves in data from the Advanced LIGO detectors.
We use models with a supervised learning approach, both trained from scratch using the Gravity Spy dataset.
We also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels.
arXiv Detail & Related papers (2023-03-24T11:12:37Z) - Multi-view Multi-label Anomaly Network Traffic Classification based on
MLP-Mixer Neural Network [55.21501819988941]
Existing network traffic classification based on convolutional neural networks (CNNs) often emphasizes local patterns of traffic data while ignoring global information associations.
We propose an end-to-end network traffic classification method.
arXiv Detail & Related papers (2022-10-30T01:52:05Z) - ET-BERT: A Contextualized Datagram Representation with Pre-training
Transformers for Encrypted Traffic Classification [9.180725486824118]
We propose a new traffic representation model called Encrypted Traffic Bidirectional Representations from Transformer (ET-BERT)
The pre-trained model can be fine-tuned on a small number of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks.
arXiv Detail & Related papers (2022-02-13T14:54:48Z) - Robust Semi-supervised Federated Learning for Images Automatic
Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z) - CGNN: Traffic Classification with Graph Neural Network [13.851922724661538]
We present a graph neural network based traffic classification method, which builds a graph classifier over automatically extracted features over a chained graph.
CGNN improves the prediction accuracy by 23% to 29% for application classification, by 2% to 37% for malicious traffic classification, and reaches the same accuracy level for encrypted traffic classification.
arXiv Detail & Related papers (2021-10-19T04:10:07Z) - Enabling certification of verification-agnostic networks via
memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Federated Learning in Vehicular Networks [41.89469856322786]
The federated learning (FL) framework has been introduced as an efficient tool with the goal of reducing transmission overhead.
In this paper, we investigate the usage of FL over centralized learning (CL) in vehicular network applications to develop intelligent transportation systems.
We identify the major challenges from both the learning perspective, i.e., data labeling and model training, and the communications perspective, i.e., data rate, reliability, transmission overhead, privacy, and resource management.
arXiv Detail & Related papers (2020-06-02T06:32:59Z) - Key Points Estimation and Point Instance Segmentation Approach for Lane
Detection [65.37887088194022]
We propose a traffic line detection method called Point Instance Network (PINet)
The PINet includes several stacked hourglass networks that are trained simultaneously.
The PINet achieves competitive accuracy and false-positive rates on the TuSimple and CULane datasets.
arXiv Detail & Related papers (2020-02-16T15:51:30Z)