Related papers: Knowledge-to-Data: LLM-Driven Synthesis of Structured Network Traffic for Testbed-Free IDS Evaluation

URL: http://arxiv.org/abs/2601.05022v1
Date: Thu, 08 Jan 2026 15:31:33 GMT
Title: Knowledge-to-Data: LLM-Driven Synthesis of Structured Network Traffic for Testbed-Free IDS Evaluation
Authors: Konstantinos E. Kampourakis, Vyron Kampourakis, Efstratios Chatzoglou, Georgios Kambourakis, Stefanos Gritzalis
Abstract summary: This paper investigates whether Large Language Models (LLMs) can operate as controlled knowledge-to-data engines for generating structured synthetic network traffic datasets. We propose a methodology that combines protocol documentation, attack semantics, and explicit statistical rules to condition LLMs without fine-tuning or access to raw samples. Results show that, under explicit constraints, LLM-generated datasets can closely approximate the statistical and structural characteristics of real network traffic.
Score: 0.4893345190925178
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Realistic, large-scale, and well-labeled cybersecurity datasets are essential for training and evaluating Intrusion Detection Systems (IDS). However, they remain difficult to obtain due to privacy constraints, data sensitivity, and the cost of building controlled collection environments such as testbeds and cyber ranges. This paper investigates whether Large Language Models (LLMs) can operate as controlled knowledge-to-data engines for generating structured synthetic network traffic datasets suitable for IDS research. We propose a methodology that combines protocol documentation, attack semantics, and explicit statistical rules to condition LLMs without fine-tuning or access to raw samples. Using the AWID3 IEEE 802.11 benchmark as a demanding case study, we generate labeled datasets with four state-of-the-art LLMs and assess fidelity through a multi-level validation framework including global similarity metrics, per-feature distribution testing, structural comparison, and cross-domain classification. Results show that, under explicit constraints, LLM-generated datasets can closely approximate the statistical and structural characteristics of real network traffic, enabling gradient-boosting classifiers to achieve F1-scores up to 0.956 when evaluated on real samples. Overall, the findings suggest that constrained LLM-driven generation can facilitate on-demand IDS experimentation, providing a testbed-free, privacy-preserving alternative that overcomes the traditional bottlenecks of physical traffic collection and manual labeling.

Related papers

The LLM Data Auditor: A Metric-oriented Survey on Quality and Trustworthiness in Evaluating Synthetic Data [25.926467401802046]
Large Language Models (LLMs) have emerged as powerful tools for generating data across various modalities. We propose a framework for evaluating synthetic data from two dimensions: quality and trustworthiness.
arXiv Detail & Related papers (2026-01-25T06:40:25Z)
ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction [57.930531826380836]
This work explores whether a foundational segmentation model can address label scarcity in the pixel-level vision task as an annotator for unlabeled images. We propose ConformalSAM, a novel SSSS framework which first calibrates the foundation model using the target domain's labeled data and then filters out unreliable pixel labels of unlabeled data.
arXiv Detail & Related papers (2025-07-21T17:02:57Z)
Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data. We introduce a novel principled method for DP synthetic image embedding generation. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z)
Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs [1.6332728502735252]
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly deployed in industry applications. Their reliability remains hampered by challenges in detecting hallucinations. This paper addresses the bottleneck of data annotation by investigating the feasibility of reducing training data requirements.
arXiv Detail & Related papers (2025-05-29T09:50:56Z)
LEMUR Neural Network Dataset: Towards Seamless AutoML [35.57280723615144]
We introduce LEMUR, an open-source dataset and framework that provides a large collection of PyTorch-based neural networks. Each model follows a unified template, with configurations and results stored in a structured database to ensure consistency. LEMUR aims to accelerate AutoML research, enable fair benchmarking, and reduce barriers to large-scale neural network research.
arXiv Detail & Related papers (2025-04-14T09:08:00Z)
Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks. We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z)
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods [56.073335779595475]
We propose ReCaLL (Relative Conditional Log-Likelihood) to detect pretraining data by leveraging conditional language modeling capabilities. Our empirical findings show that conditioning member data on non-member prefixes induces a larger decrease in log-likelihood compared to non-member data. We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset.
arXiv Detail & Related papers (2024-06-23T00:23:13Z)
Novel Approach to Intrusion Detection: Introducing GAN-MSCNN-BILSTM with LIME Predictions [0.0]
This paper introduces an innovative intrusion detection system that harnesses Generative Adversarial Networks (GANs), Multi-Scale Convolutional Neural Networks (MSCNNs), and Bidirectional Long Short-Term Memory (BiLSTM) networks. The system generates realistic network traffic data, encompassing both normal and attack patterns. Evaluation on the Hogzilla dataset, a standard benchmark, showcases an impressive accuracy of 99.16% for multi-class classification and 99.10% for binary classification.
arXiv Detail & Related papers (2024-06-08T11:26:44Z)
Empowering HWNs with Efficient Data Labeling: A Clustered Federated Semi-Supervised Learning Approach [2.046985601687158]
Clustered Federated Multitask Learning (CFL) has gained considerable attention as an effective strategy for overcoming statistical challenges. We introduce a novel framework, Clustered Federated Semi-Supervised Learning (CFSL), designed for more realistic HWN scenarios. Our results demonstrate that CFSL significantly improves upon key metrics such as testing accuracy, labeling accuracy, and labeling latency under varying proportions of labeled and unlabeled data.
arXiv Detail & Related papers (2024-01-19T11:47:49Z)
CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE). At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales. We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks [1.2575897140677708]
Flow-based data sets are crucial to increase the performance of Machine Learning components. Data privacy is appearing more and more as a strong requirement when processing such network data. We propose a novel deterministic way to measure the quality of the synthetic data produced by a GAN.
arXiv Detail & Related papers (2021-07-30T17:27:55Z)
BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift. We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions. Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.