What Does Normal Even Mean? Evaluating Benign Traffic in Intrusion Detection Datasets
- URL: http://arxiv.org/abs/2509.09564v1
- Date: Thu, 11 Sep 2025 15:55:21 GMT
- Title: What Does Normal Even Mean? Evaluating Benign Traffic in Intrusion Detection Datasets
- Authors: Meghan Wilkinson, Robert H Thomson
- Abstract summary: Supervised machine learning techniques rely on labeled data to achieve high task performance. This paper evaluates the structure of benign traffic in several common intrusion detection datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised machine learning techniques rely on labeled data to achieve high task performance, but this requires the labels to capture some meaningful differences in the underlying data structure. For training network intrusion detection algorithms, most datasets contain a series of attack classes and a single large benign class which captures all non-attack network traffic. A review of intrusion detection papers and guides that explicitly state their data preprocessing steps identified that the majority took the labeled categories of the dataset at face value when training their algorithms. The present paper evaluates the structure of benign traffic in several common intrusion detection datasets (NSL-KDD, UNSW-NB15, and CIC-IDS 2017) and determines whether there are meaningful sub-categories within this traffic which may improve overall multi-classification performance using common machine learning techniques. We present an overview of some unsupervised clustering techniques (e.g., HDBSCAN, Mean Shift Clustering) and show how they differentially cluster the benign traffic space.
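As a rough illustration of the kind of unsupervised clustering the abstract mentions, the sketch below runs a hand-written flat-kernel mean shift over synthetic two-feature "benign" flows and reports how many sub-clusters it finds. This is not the authors' pipeline: the feature names (`log_duration`, `log_bytes`), the two synthetic sub-populations, and the `bandwidth` value are assumptions chosen for illustration, and none of the paper's datasets are used.

```python
import math
import random


def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Flat-kernel mean shift.

    Each point is repeatedly moved to the mean of all points within
    `bandwidth` of it until it stops moving. Converged positions that lie
    close together are merged into a single mode (cluster centre).
    Returns (modes, labels), where labels[i] indexes into modes.
    """
    modes = []
    labels = []
    for start in points:
        current = start
        for _ in range(max_iter):
            # The window is never empty: it always holds at least one point.
            neighbours = [p for p in points if math.dist(p, current) <= bandwidth]
            shifted = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
            moved = math.dist(shifted, current)
            current = shifted
            if moved < tol:
                break
        # Merge with an existing mode if the converged point is close to it.
        for idx, mode in enumerate(modes):
            if math.dist(mode, current) <= bandwidth / 2:
                labels.append(idx)
                break
        else:
            modes.append(current)
            labels.append(len(modes) - 1)
    return modes, labels


# Synthetic stand-in for a single "benign" class that secretly contains two
# behaviours. Columns are hypothetical: (log_duration, log_bytes).
rng = random.Random(0)


def blob(center, n, spread=0.5):
    """Draw n points from an isotropic Gaussian around `center`."""
    return [
        (rng.gauss(center[0], spread), rng.gauss(center[1], spread))
        for _ in range(n)
    ]


points = blob((0.0, 0.0), 40) + blob((10.0, 10.0), 40)

# bandwidth is an assumed value; in practice it has to be tuned per dataset.
modes, labels = mean_shift(points, bandwidth=3.0)
print(f"sub-clusters found inside the single 'benign' label: {len(modes)}")
```

On real traffic the number of modes depends heavily on feature scaling and on the bandwidth. That sensitivity is one reason different clustering methods can partition the same benign traffic differently.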
Related papers
- Universal Transformation of One-Class Classifiers for Unsupervised Anomaly Detection [51.73001988341294]
Anomaly detection is typically formulated as a one-class classification problem. We present a dataset folding method that transforms an arbitrary one-class classifier-based anomaly detector into a fully unsupervised method.
arXiv Detail & Related papers (2026-02-13T16:54:12Z)
- Language of Network: A Generative Pre-trained Model for Encrypted Traffic Comprehension [16.795038178588324]
Deep learning is currently the predominant approach for encrypted traffic classification through feature analysis. We present GBC, a generative model based on pre-training for encrypted traffic comprehension. It achieves superior results in both traffic classification and generation tasks, resulting in a 5% improvement in F1 score compared to state-of-the-art methods for classification tasks.
arXiv Detail & Related papers (2025-05-26T04:04:29Z)
- The importance of the clustering model to detect new types of intrusion in data traffic [0.0]
The presented work uses the K-means algorithm, a popular clustering technique. Data was gathered using the Kali Linux environment, CICFlowMeter traffic, and PuTTY software tools. The model counted the attacks and assigned a number to each one.
arXiv Detail & Related papers (2024-11-21T19:40:31Z)
- Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes. Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes. We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
- DOC-NAD: A Hybrid Deep One-class Classifier for Network Anomaly Detection [0.0]
Machine Learning approaches have been used to enhance the detection capabilities of Network Intrusion Detection Systems (NIDSs).
Recent work has achieved near-perfect performance by following binary- and multi-class network anomaly detection tasks.
This paper proposes a Deep One-Class (DOC) classifier for network intrusion detection by only training on benign network data samples.
arXiv Detail & Related papers (2022-12-15T00:08:05Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
- MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, that combines features from the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
- Generalized Insider Attack Detection Implementation using NetFlow Data [0.6236743421605786]
We study an approach centered on using network data to identify attacks.
Our work builds on unsupervised machine learning techniques such as One-Class SVM and bi-clustering.
We show that our approach is a promising tool for insider attack detection in realistic settings.
arXiv Detail & Related papers (2020-10-27T14:00:31Z)
- Weakly-supervised Salient Instance Detection [65.0408760733005]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2020-09-29T09:47:23Z)