Fine-grained TLS Services Classification with Reject Option
- URL: http://arxiv.org/abs/2202.11984v1
- Date: Thu, 24 Feb 2022 09:44:12 GMT
- Title: Fine-grained TLS Services Classification with Reject Option
- Authors: Jan Luxemburk, Tomáš Čejka
- Abstract summary: This paper focuses on collecting a large up-to-date dataset with almost 200 fine-grained service labels and 140 million network flows extended with packet-level metadata.
The number of flows is three orders of magnitude higher than in other existing public labeled datasets of encrypted traffic.
The published dataset is intended as a benchmark for identifying services in encrypted traffic.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent success and proliferation of machine learning and deep learning
have provided powerful tools, which are also utilized for encrypted traffic
analysis, classification, and threat detection. These methods, neural networks
in particular, are often complex and require a huge corpus of training data.
Therefore, this paper focuses on collecting a large up-to-date dataset with
almost 200 fine-grained service labels and 140 million network flows extended
with packet-level metadata. The number of flows is three orders of magnitude
higher than in other existing public labeled datasets of encrypted traffic. The
number of service labels, which is important to make the problem hard and
realistic, is four times higher than in the public dataset with the most class
labels. The published dataset is intended as a benchmark for identifying
services in encrypted traffic. Service identification can be further extended
with the task of "rejecting" unknown services, i.e., the traffic not seen
during the training phase. Neural networks offer superior performance for
tackling this more challenging problem. To showcase the dataset's usefulness,
we implemented a neural network with a multi-modal architecture, which is the
state-of-the-art approach, and achieved 97.04% classification accuracy and
detected 91.94% of unknown services with 5% false positive rate.
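The "reject option" described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors' model decides to reject, so the code below swaps in a common baseline as an assumption: threshold the maximum softmax probability, with the threshold calibrated so that at most a target fraction of known-service validation flows is wrongly rejected (the false positive rate). The function names `softmax`, `calibrate_threshold`, and `classify_with_reject` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one flow's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate_threshold(known_scores: Sequence[float],
                        target_fpr: float = 0.05) -> float:
    """Choose a confidence threshold from validation flows of KNOWN services.

    A flow is rejected when its confidence is strictly below the threshold,
    so the returned value rejects at most `target_fpr` of the known flows.
    (Assumed baseline, not necessarily the paper's calibration procedure.)
    """
    if not known_scores:
        raise ValueError("need at least one validation score")
    ordered = sorted(known_scores)
    k = min(int(target_fpr * len(ordered)), len(ordered) - 1)
    return ordered[k]


def classify_with_reject(logits: Sequence[float],
                         threshold: float) -> Optional[int]:
    """Return the predicted class index, or None to reject as unknown."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


if __name__ == "__main__":
    # Toy confidences standing in for a real validation set.
    validation_conf = [i / 20 for i in range(1, 21)]
    thr = calibrate_threshold(validation_conf, target_fpr=0.05)
    print("threshold:", thr)
    print("confident flow:", classify_with_reject([10.0, 0.0, 0.0], thr))
    print("uncertain flow:", classify_with_reject([0.1, 0.0, 0.05], 0.9))
```

In this sketch, the fraction of unknown-service flows that fall below the threshold corresponds to the detection rate quoted in the abstract, and the fraction of known-service flows that fall below it corresponds to the false positive rate.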
Related papers
- Lens: A Foundation Model for Network Traffic [19.3652490585798]
Lens is a foundation model for network traffic that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data.
We design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP).
arXiv Detail & Related papers (2024-02-06T02:45:13Z)
- A Survey of Label-Efficient Deep Learning for 3D Point Clouds [109.07889215814589]
This paper presents the first comprehensive survey of label-efficient learning of point clouds.
We propose a taxonomy that organizes label-efficient learning methods based on the data prerequisites provided by different types of labels.
For each approach, we outline the problem setup and provide an extensive literature review that showcases relevant progress and challenges.
arXiv Detail & Related papers (2023-05-31T12:54:51Z)
- ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification [9.180725486824118]
We propose a new traffic representation model called Encrypted Traffic Bidirectional Representations from Transformer (ET-BERT).
The pre-trained model can be fine-tuned on a small number of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks.
arXiv Detail & Related papers (2022-02-13T14:54:48Z)
- Auto-Transfer: Learning to Route Transferrable Representations [77.30427535329571]
We propose a novel adversarial multi-armed bandit approach which automatically learns to route source representations to appropriate target representations.
We see upwards of 5% accuracy improvements compared with the state-of-the-art knowledge transfer methods.
arXiv Detail & Related papers (2022-02-02T13:09:27Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- Generative Conversational Networks [67.13144697969501]
We propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data.
We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data.
arXiv Detail & Related papers (2021-06-15T23:19:37Z)
- Deep Learning for Network Traffic Classification [0.0]
Monitoring network traffic to identify content, services, and applications is an active research topic in network traffic control systems.
Previous work has identified machine learning methods that may enable application and service identification.
We propose a classification technique using an ensemble of deep learning architectures on packet, payload, and inter-arrival time sequences.
arXiv Detail & Related papers (2021-06-02T04:11:32Z)
- Leveraging Multi-domain, Heterogeneous Data using Deep Multitask Learning for Hate Speech Detection [21.410160004193916]
We propose Convolutional Neural Network based multi-task learning models (MTLs) to leverage information from multiple sources.
Empirical analysis performed on three benchmark datasets shows the efficacy of the proposed approach.
arXiv Detail & Related papers (2021-03-23T09:31:01Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
- Learning Cross-domain Generalizable Features by Representation Disentanglement [11.74643883335152]
Deep learning models exhibit limited generalizability across different domains.
We propose Mutual-Information-based Disentangled Neural Networks (MIDNet) to extract generalizable features that enable transferring knowledge to unseen categorical features in target domains.
We demonstrate our method on handwritten digits datasets and a fetal ultrasound dataset for image classification tasks.
arXiv Detail & Related papers (2020-02-29T17:53:16Z)