SoK: Decoding the Enigma of Encrypted Network Traffic Classifiers
- URL: http://arxiv.org/abs/2503.20093v3
- Date: Mon, 14 Apr 2025 02:06:56 GMT
- Title: SoK: Decoding the Enigma of Encrypted Network Traffic Classifiers
- Authors: Nimesha Wickramasinghe, Arash Shaghaghi, Gene Tsudik, Sanjay Jha
- Abstract summary: Modern encryption protocols such as TLS 1.3 have challenged traditional network traffic classification (NTC) methods. In this paper, we comprehensively analyze ML-based NTC studies, developing a taxonomy of their design choices and benchmarking suites. We demonstrate widespread reliance on outdated datasets, oversights in design choices, and the consequences of unsubstantiated assumptions.
- Score: 12.048303829428448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The adoption of modern encryption protocols such as TLS 1.3 has significantly challenged traditional network traffic classification (NTC) methods. As a consequence, researchers are increasingly turning to machine learning (ML) approaches to overcome these obstacles. In this paper, we comprehensively analyze ML-based NTC studies, developing a taxonomy of their design choices, benchmarking suites, and prevalent assumptions impacting classifier performance. Through this systematization, we demonstrate widespread reliance on outdated datasets, oversights in design choices, and the consequences of unsubstantiated assumptions. Our evaluation reveals that the majority of proposed encrypted traffic classifiers have mistakenly utilized unencrypted traffic due to the use of legacy datasets. Furthermore, by conducting 348 feature occlusion experiments on state-of-the-art classifiers, we show how oversights in NTC design choices lead to overfitting, and validate or refute prevailing assumptions with empirical evidence. By highlighting lessons learned, we offer strategic insights, identify emerging research directions, and recommend best practices to support the development of real-world applicable NTC methodologies.
Related papers
- A Concise Survey on Lane Topology Reasoning for HD Mapping [30.73664953504888]
Lane topology reasoning techniques play a crucial role in high-definition (HD) mapping and autonomous driving applications.
Recent years have witnessed significant advances in this field, but there has been limited effort to consolidate these works into a comprehensive overview.
This survey systematically reviews the evolution and current state of lane topology reasoning methods.
arXiv Detail & Related papers (2025-03-31T11:30:40Z)
- An investigation into the performances of the Current state-of-the-art Naive Bayes, Non-Bayesian and Deep Learning Based Classifier for Phishing Detection: A Survey [0.9567504785687562]
Phishing is one of the most effective ways in which cybercriminals get sensitive details from potential victims.
In this research, we did a comprehensive review of current state-of-the-art machine learning and deep learning phishing detection techniques.
arXiv Detail & Related papers (2024-11-24T05:20:09Z)
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention.
An increasing number of transfer-based methods have been developed to fool black-box DNN models.
We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
- Facing Unknown: Open-World Encrypted Traffic Classification Based on Contrastive Pre-Training [5.318006462723139]
We propose a novel Open-World Contrastive Pre-training (OWCP) framework for open-world encrypted traffic classification.
OWCP performs contrastive pre-training to obtain a robust feature representation.
We conduct comprehensive ablation studies and sensitivity analyses to validate each integral component of OWCP.
arXiv Detail & Related papers (2023-08-31T17:04:20Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- A Lightweight, Efficient and Explainable-by-Design Convolutional Neural Network for Internet Traffic Classification [9.365794791156972]
This paper introduces a new Lightweight, Efficient and eXplainable-by-design convolutional neural network (LEXNet) for Internet traffic classification.
LEXNet relies on a new residual block (for lightweight and efficiency purposes) and prototype layer (for explainability).
Based on a commercial-grade dataset, our evaluation shows that LEXNet succeeds in maintaining the same accuracy as the best performing state-of-the-art neural network.
arXiv Detail & Related papers (2022-02-11T10:21:34Z)
- Bridging the gap to real-world for network intrusion detection systems with data-centric approach [1.4699455652461724]
This paper presents a systematic data-centric approach to address the current limitations of NIDS research.
It generates NIDS datasets composed of the most recent network traffic and attacks, with the labeling process integrated by design.
arXiv Detail & Related papers (2021-10-25T04:50:12Z)
- Semi-Supervised Few-Shot Intent Classification and Slot Filling [3.602651625446309]
Intent classification (IC) and slot filling (SF) are two fundamental tasks in modern Natural Language Understanding (NLU) systems.
In this work, we investigate how contrastive learning and unsupervised data augmentation methods can benefit these existing supervised meta-learning pipelines.
arXiv Detail & Related papers (2021-09-17T20:26:23Z)
- Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder [23.860842627883187]
We teach the model to capture broader variations of the feature distributions with a novel noise-enhanced supervised autoencoder (NSAE).
NSAE trains the model by jointly reconstructing inputs and predicting the labels of inputs as well as their reconstructed pairs.
We also take advantage of NSAE structure and propose a two-step fine-tuning procedure that achieves better adaptation and improves classification performance in the target domain.
arXiv Detail & Related papers (2021-08-11T04:45:56Z)
- A Survey on Concept Factorization: From Shallow to Deep Representation Learning [104.78577405792592]
Concept Factorization (CF) has attracted a great deal of interest in the areas of machine learning and data mining.
We first review the root CF method, and then explore the advancement of CF-based representation learning.
We also introduce the potential application areas of CF-based methods.
arXiv Detail & Related papers (2020-07-31T04:19:14Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.