Related papers: Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification

Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification

URL: http://arxiv.org/abs/2205.05628v1
Date: Wed, 11 May 2022 16:54:37 GMT
Title: Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification
Authors: Steven Jorgensen, John Holodnak, Jensen Dempsey, Karla de Souza, Ananditha Raghunath, Vernon Rivet, Noah DeMoes, Andr\'es Alejos, and Allan Wollaber (MIT Lincoln Laboratory)
Abstract summary: We present a new, public dataset of network traffic that includes labeled, Virtual Private Network (VPN)-encrypted network traffic generated by 10 applications and corresponding to 5 application categories. We also present an ML framework that is designed to rapidly train with modest data requirements and provide both calibrated, predictive probabilities as well as an interpretable out-of-distribution'' (OOD) score to flag novel traffic samples.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the increasing prevalence of encrypted network traffic, cyber security analysts have been turning to machine learning (ML) techniques to elucidate the traffic on their networks. However, ML models can become stale as known traffic features can shift between networks and as new traffic emerges that is outside of the distribution of the training set. In order to reliably adapt in this dynamic environment, ML models must additionally provide contextualized uncertainty quantification to their predictions, which has received little attention in the cyber security domain. Uncertainty quantification is necessary both to signal when the model is uncertain about which class to choose in its label assignment and when the traffic is not likely to belong to any pre-trained classes. We present a new, public dataset of network traffic that includes labeled, Virtual Private Network (VPN)-encrypted network traffic generated by 10 applications and corresponding to 5 application categories. We also present an ML framework that is designed to rapidly train with modest data requirements and provide both calibrated, predictive probabilities as well as an interpretable ``out-of-distribution'' (OOD) score to flag novel traffic samples. We describe how to compute a calibrated OOD score from p-values of the so-called relative Mahalanobis distance. We demonstrate that our framework achieves an F1 score of 0.98 on our dataset and that it can extend to an enterprise network by testing the model: (1) on data from similar applications, (2) on dissimilar application traffic from an existing category, and (3) on application traffic from a new category. The model correctly flags uncertain traffic and, upon retraining, accurately incorporates the new data. We additionally demonstrate good performance (F1 score of 0.97) when packet sizes are made to be uniform, as occurs for certain encryption protocols.

Related papers

NetFlowGen: Leveraging Generative Pre-training for Network Traffic Dynamics [72.95483148058378]
We propose to pre-train a general-purpose machine learning model to capture traffic dynamics with only traffic data from NetFlow records. We address challenges such as unifying network feature representations, learning from large unlabeled traffic data volume, and testing on real downstream tasks in DDoS attack detection.
arXiv Detail & Related papers (2024-12-30T00:47:49Z)
MIETT: Multi-Instance Encrypted Traffic Transformer for Encrypted Traffic Classification [59.96233305733875]
Classifying traffic is essential for detecting security threats and optimizing network management. We propose a Multi-Instance Encrypted Traffic Transformer (MIETT) to capture both token-level and packet-level relationships. MIETT achieves results across five datasets, demonstrating its effectiveness in classifying encrypted traffic and understanding complex network behaviors.
arXiv Detail & Related papers (2024-12-19T12:52:53Z)
Lens: A Foundation Model for Network Traffic [19.3652490585798]
Lens is a foundation model for network traffic that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data. We design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP)
arXiv Detail & Related papers (2024-02-06T02:45:13Z)
Hybrid PLS-ML Authentication Scheme for V2I Communication Networks [0.0]
We propose a novel hybrid physical layer security (PLS)-machine learning (ML) authentication scheme by exploiting the position of the transmitter vehicle as a device fingerprint. We use a time-of-arrival (ToA) based localization mechanism where the ToA is estimated at roadside units (RSUs), and the coordinates of the transmitter vehicle are extracted at the base station (BS). To track the mobility of the moving legitimate vehicle, we use ML model trained on several system parameters. We observe that our proposed position-based mechanism outperforms the baseline scheme significantly in terms of missed detections.
arXiv Detail & Related papers (2023-08-28T16:34:50Z)
Convolutional Neural Networks for the classification of glitches in gravitational-wave data streams [52.77024349608834]
We classify transient noise signals (i.e.glitches) and gravitational waves in data from the Advanced LIGO detectors. We use models with a supervised learning approach, both trained from scratch using the Gravity Spy dataset. We also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels.
arXiv Detail & Related papers (2023-03-24T11:12:37Z)
Multi-view Multi-label Anomaly Network Traffic Classification based on MLP-Mixer Neural Network [55.21501819988941]
Existing network traffic classification based on convolutional neural networks (CNNs) often emphasizes local patterns of traffic data while ignoring global information associations. We propose an end-to-end network traffic classification method.
arXiv Detail & Related papers (2022-10-30T01:52:05Z)
ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification [9.180725486824118]
We propose a new traffic representation model called Encrypted Traffic Bidirectional Representations from Transformer (ET-BERT) The pre-trained model can be fine-tuned on a small number of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks.
arXiv Detail & Related papers (2022-02-13T14:54:48Z)
Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition. There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules. We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
CGNN: Traffic Classification with Graph Neural Network [13.851922724661538]
We present a graph neural network based traffic classification method, which builds a graph classifier over automatically extracted features over a chained graph. CGNN improves the prediction accuracy by 23% to 29% for application classification, by 2% to 37% for malicious traffic classification, and reaches the same accuracy level for encrypted traffic classification.
arXiv Detail & Related papers (2021-10-19T04:10:07Z)
Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations. We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively. We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z)
Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network. PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
Federated Learning in Vehicular Networks [41.89469856322786]
Federated learning (FL) framework has been introduced as an efficient tool with the goal of reducing transmission overhead. In this paper, we investigate the usage of FL over centralized learning (CL) in vehicular network applications to develop intelligent transportation systems. We identify the major challenges from both learning perspective, i.e., data labeling and model training, and from the communications point of view, i.e., data rate, reliability, transmission overhead, privacy and resource management.
arXiv Detail & Related papers (2020-06-02T06:32:59Z)
Key Points Estimation and Point Instance Segmentation Approach for Lane Detection [65.37887088194022]
We propose a traffic line detection method called Point Instance Network (PINet) The PINet includes several stacked hourglass networks that are trained simultaneously. The PINet achieves competitive accuracy and false positive on the TuSimple and Culane datasets.
arXiv Detail & Related papers (2020-02-16T15:51:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.