Related papers: SoK: Decoding the Enigma of Encrypted Network Traffic Classifiers

URL: http://arxiv.org/abs/2503.20093v3
Date: Mon, 14 Apr 2025 02:06:56 GMT
Title: SoK: Decoding the Enigma of Encrypted Network Traffic Classifiers
Authors: Nimesha Wickramasinghe, Arash Shaghaghi, Gene Tsudik, Sanjay Jha
Abstract summary: Modern encryption protocols such as TLS 1.3 have challenged traditional network traffic classification (NTC) methods. In this paper, we comprehensively analyze ML-based NTC studies, developing a taxonomy of their design choices and benchmarking suites. We demonstrate widespread reliance on outdated datasets, oversights in design choices, and the consequences of unsubstantiated assumptions.
Score: 12.048303829428448
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The adoption of modern encryption protocols such as TLS 1.3 has significantly challenged traditional network traffic classification (NTC) methods. As a consequence, researchers are increasingly turning to machine learning (ML) approaches to overcome these obstacles. In this paper, we comprehensively analyze ML-based NTC studies, developing a taxonomy of their design choices, benchmarking suites, and prevalent assumptions impacting classifier performance. Through this systematization, we demonstrate widespread reliance on outdated datasets, oversights in design choices, and the consequences of unsubstantiated assumptions. Our evaluation reveals that the majority of proposed encrypted traffic classifiers have mistakenly utilized unencrypted traffic due to the use of legacy datasets. Furthermore, by conducting 348 feature occlusion experiments on state-of-the-art classifiers, we show how oversights in NTC design choices lead to overfitting, and validate or refute prevailing assumptions with empirical evidence. By highlighting lessons learned, we offer strategic insights, identify emerging research directions, and recommend best practices to support the development of real-world applicable NTC methodologies.

Related papers

DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [59.66984417026933]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF) (inherent to the data) versus external features (EF) (artificially introduced for auditing). We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset. Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery. Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z)
Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models [0.0]
This study aims to develop an interpretable machine learning-based framework for anomaly detection in encrypted network traffic. Models are trained and evaluated on three benchmark encrypted traffic datasets. SHAP visualizations successfully revealed the most influential traffic features contributing to anomaly predictions.
arXiv Detail & Related papers (2025-05-22T05:50:39Z)
A Concise Survey on Lane Topology Reasoning for HD Mapping [30.73664953504888]
Lane topology reasoning techniques play a crucial role in high-definition (HD) mapping and autonomous driving applications. Recent years have witnessed significant advances in this field, but there has been limited effort to consolidate these works into a comprehensive overview. This survey systematically reviews the evolution and current state of lane topology reasoning methods.
arXiv Detail & Related papers (2025-03-31T11:30:40Z)
An investigation into the performances of the Current state-of-the-art Naive Bayes, Non-Bayesian and Deep Learning Based Classifier for Phishing Detection: A Survey [0.9567504785687562]
Phishing is one of the most effective ways in which cybercriminals obtain sensitive details from potential victims. In this research, we conducted a comprehensive review of current state-of-the-art machine learning and deep learning phishing detection techniques.
arXiv Detail & Related papers (2024-11-24T05:20:09Z)
Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence. Recent trends demonstrate the potential homogeneity of these two fields. We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
The adversarial vulnerability of deep neural networks (DNNs) has drawn great attention. An increasing number of transfer-based methods have been developed to fool black-box DNN models. We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
Facing Unknown: Open-World Encrypted Traffic Classification Based on Contrastive Pre-Training [5.318006462723139]
We propose a novel Open-World Contrastive Pre-training (OWCP) framework for this. OWCP performs contrastive pre-training to obtain a robust feature representation. We conduct comprehensive ablation studies and sensitivity analyses to validate each integral component of OWCP.
arXiv Detail & Related papers (2023-08-31T17:04:20Z)
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition. We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
A Lightweight, Efficient and Explainable-by-Design Convolutional Neural Network for Internet Traffic Classification [9.365794791156972]
This paper introduces a new Lightweight, Efficient and eXplainable-by-design convolutional neural network (LEXNet) for Internet traffic classification. LEXNet relies on a new residual block (for lightweight and efficiency purposes) and prototype layer (for explainability). Based on a commercial-grade dataset, our evaluation shows that LEXNet succeeds in maintaining the same accuracy as the best performing state-of-the-art neural network.
arXiv Detail & Related papers (2022-02-11T10:21:34Z)
Bridging the gap to real-world for network intrusion detection systems with data-centric approach [1.4699455652461724]
This paper presents a systematic data-centric approach to address the current limitations of NIDS research. It generates NIDS datasets composed of the most recent network traffic and attacks, with the labeling process integrated by design.
arXiv Detail & Related papers (2021-10-25T04:50:12Z)
Semi-Supervised Few-Shot Intent Classification and Slot Filling [3.602651625446309]
Intent classification (IC) and slot filling (SF) are two fundamental tasks in modern Natural Language Understanding (NLU) systems. In this work, we investigate how contrastive learning and unsupervised data augmentation methods can benefit these existing supervised meta-learning pipelines.
arXiv Detail & Related papers (2021-09-17T20:26:23Z)
Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder [23.860842627883187]
We teach the model to capture broader variations of the feature distributions with a novel noise-enhanced supervised autoencoder (NSAE). NSAE trains the model by jointly reconstructing inputs and predicting the labels of inputs as well as their reconstructed pairs. We also take advantage of NSAE structure and propose a two-step fine-tuning procedure that achieves better adaptation and improves classification performance in the target domain.
arXiv Detail & Related papers (2021-08-11T04:45:56Z)
A Survey on Concept Factorization: From Shallow to Deep Representation Learning [104.78577405792592]
Concept Factorization (CF) has attracted a great deal of interest in the areas of machine learning and data mining. We first review the root CF method, and then explore the advancement of CF-based representation learning. We also introduce the potential application areas of CF-based methods.
arXiv Detail & Related papers (2020-07-31T04:19:14Z)
Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method. PCL implicitly encodes semantic structures of the data into the learned embedding space. PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.