Synthetic flow-based cryptomining attack generation through Generative
Adversarial Networks
- URL: http://arxiv.org/abs/2107.14776v1
- Date: Fri, 30 Jul 2021 17:27:55 GMT
- Title: Synthetic flow-based cryptomining attack generation through Generative
Adversarial Networks
- Authors: Alberto Mozo, Ángel González-Prieto, Antonio Pastor, Sandra
Gómez-Canaval, Edgar Talavera
- Abstract summary: Flow-based data sets are crucial to increase the performance of Machine Learning components.
Data privacy is appearing more and more as a strong requirement when processing such network data.
We propose a novel deterministic way to measure the quality of the synthetic data produced by a GAN.
- Score: 1.2575897140677708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the growing rise of cyber attacks on the Internet, flow-based data
sets are crucial to increase the performance of the Machine Learning (ML)
components that run in network-based intrusion detection systems (IDS). To
overcome the existing network traffic data shortage in attack analysis, recent
works propose Generative Adversarial Networks (GANs) for synthetic flow-based
network traffic generation. Data privacy is increasingly becoming a strong
requirement when processing such network data, which calls for solutions
where synthetic data can fully replace real data. Because of the
ill-convergence of the GAN training, none of the existing solutions can
generate high-quality fully synthetic data that can totally substitute real
data in the training of IDS ML components. Therefore, they mix real with
synthetic data, which acts only as data augmentation, leading to
privacy breaches because real data is still used. In sharp contrast, in this work we
propose a novel deterministic way to measure the quality of the synthetic data
produced by a GAN both with respect to the real data and to its performance
when used for ML tasks. As a byproduct, we present a heuristic that uses these
metrics for selecting the best performing generator during GAN training,
leading to a stopping criterion. An additional heuristic is proposed to select
the best performing GANs when different types of synthetic data are to be used
in the same ML task. We demonstrate the adequacy of our proposal by generating
synthetic cryptomining attack traffic and normal traffic flow-based data using
an enhanced version of a Wasserstein GAN. We show that the generated synthetic
network traffic can completely replace real data when training an ML-based
cryptomining detector, obtaining similar performance and avoiding privacy
violations, since real data is not used in the training of the ML-based
detector.
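The selection idea described in the abstract (score each generator snapshot both against the real data and on the downstream ML task, then keep the best one) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the abstract does not name the metrics, so a per-feature empirical Wasserstein distance and a nearest-centroid detector trained on synthetic data and tested on real data stand in for them. The helper names (`feature_distance`, `task_accuracy`, `select_best_checkpoint`) are hypothetical, and Gaussian toy samples replace real flow features and real GAN snapshots.

```python
import random
from statistics import mean


def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance for two equal-size samples."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def feature_distance(real, synth):
    """Average the per-feature distance over all columns (assumed metric)."""
    return mean(wasserstein_1d(r, s) for r, s in zip(zip(*real), zip(*synth)))


def centroid(rows):
    return [mean(col) for col in zip(*rows)]


def task_accuracy(train_attack, train_normal, test_attack, test_normal):
    """Fit a nearest-centroid detector on the train pair, score it on the test pair."""
    c_att, c_norm = centroid(train_attack), centroid(train_normal)

    def sq_dist(row, c):
        return sum((x - y) ** 2 for x, y in zip(row, c))

    def is_attack(row):
        return sq_dist(row, c_att) < sq_dist(row, c_norm)

    hits = sum(is_attack(r) for r in test_attack)
    hits += sum(not is_attack(r) for r in test_normal)
    return hits / (len(test_attack) + len(test_normal))


def select_best_checkpoint(checkpoints, synth_normal, real_attack, real_normal):
    """Return (epoch, accuracy, distance) for the best attack-generator snapshot."""
    scored = []
    for epoch, synth_attack in checkpoints.items():
        acc = task_accuracy(synth_attack, synth_normal, real_attack, real_normal)
        dist = feature_distance(real_attack, synth_attack)
        scored.append((epoch, acc, dist))
    # Highest task accuracy wins; ties are broken by the smaller distance.
    return max(scored, key=lambda t: (t[1], -t[2]))


def gaussian_rows(rng, center, n=200, dims=2):
    """Toy stand-in for flow features: n rows drawn around `center`."""
    return [[rng.gauss(center, 1.0) for _ in range(dims)] for _ in range(n)]


rng = random.Random(0)
real_attack = gaussian_rows(rng, 5.0)
real_normal = gaussian_rows(rng, 0.0)
synth_normal = gaussian_rows(rng, 0.0)
# Simulated snapshots: the generator approaches the real data, then drifts away.
checkpoints = {
    epoch: gaussian_rows(rng, center)
    for epoch, center in enumerate([0.5, 2.5, 5.0, 9.0])
}

best_epoch, best_acc, best_dist = select_best_checkpoint(
    checkpoints, synth_normal, real_attack, real_normal
)
print(best_epoch, round(best_acc, 3), round(best_dist, 3))
```

In this toy setup the detector is trained only on synthetic rows and evaluated on real ones, so the snapshot whose samples sit closest to the real attack distribution should be selected. A real pipeline would swap in actual generator snapshots, flow features, and whatever classifier the detection task uses.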
Related papers
- Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification [7.357494019212501]
We propose efficient weighted-loss approaches to align synthetic data with real-world distribution.
We empirically assessed the effectiveness of our method on multiple text classification tasks.
arXiv Detail & Related papers (2024-10-28T20:53:49Z)
- Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning [50.332027356848094]
AI-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control.
The mapping between context and AI model parameters is ideally done in a zero-shot fashion.
This paper introduces a general methodology for the online optimization of AMS mappings.
arXiv Detail & Related papers (2024-06-22T11:17:50Z)
- FLIGAN: Enhancing Federated Learning with Incomplete Data using GAN [1.5749416770494706]
Federated Learning (FL) provides a privacy-preserving mechanism for distributed training of machine learning models on networked devices.
We propose FLIGAN, a novel approach to address the issue of data incompleteness in FL.
Our methodology adheres to FL's privacy requirements by generating synthetic data in a federated manner without sharing the actual data in the process.
arXiv Detail & Related papers (2024-03-25T16:49:38Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Distributed Traffic Synthesis and Classification in Edge Networks: A Federated Self-supervised Learning Approach [83.2160310392168]
This paper proposes FS-GAN to support automatic traffic analysis and synthesis over a large number of heterogeneous datasets.
FS-GAN is composed of multiple distributed Generative Adversarial Networks (GANs).
FS-GAN can classify data of unknown types of service and create synthetic samples that capture the traffic distribution of the unknown types.
arXiv Detail & Related papers (2023-02-01T03:23:11Z)
- HFedMS: Heterogeneous Federated Learning with Memorable Data Semantics in Industrial Metaverse [49.1501082763252]
This paper presents HFEDMS for incorporating practical FL into the emerging Industrial Metaverse.
It reduces data heterogeneity through dynamic grouping and training mode conversion.
Then, it compensates for the forgotten knowledge by fusing compressed historical data semantics.
Experiments have been conducted on the streamed non-i.i.d. FEMNIST dataset using 368 simulated devices.
arXiv Detail & Related papers (2022-11-07T04:33:24Z)
- A Synthetic Dataset for 5G UAV Attacks Based on Observable Network Parameters [3.468596481227013]
This paper presents the first synthetic dataset for Unmanned Aerial Vehicle (UAV) attacks in 5G and beyond networks.
The main objective of this data is to enable deep network development for UAV communication security.
The proposed dataset provides insights into network functionality when static or moving UAV attackers target authenticated UAVs in an urban environment.
arXiv Detail & Related papers (2022-11-05T15:12:51Z)
- Variational Autoencoder Generative Adversarial Network for Synthetic Data Generation in Smart Home [15.995891934245334]
We propose a Variational AutoEncoder Generative Adversarial Network (VAE-GAN) as a smart grid data generative model.
VAE-GAN is capable of learning various types of data distributions and generating plausible samples from the same distribution.
Experiments indicate that the proposed synthetic data generative model outperforms the vanilla GAN network.
arXiv Detail & Related papers (2022-01-19T02:30:25Z)
- Transformer Networks for Data Augmentation of Human Physical Activity Recognition [61.303828551910634]
State-of-the-art models like Recurrent Generative Adversarial Networks (RGAN) are used to generate realistic synthetic data.
In this paper, transformer-based generative adversarial networks, which have global attention on data, are compared with RGAN on the PAMAP2 and Real World Human Activity Recognition data sets.
arXiv Detail & Related papers (2021-09-02T16:47:29Z)
- Deep convolutional generative adversarial networks for traffic data imputation encoding time series as images [7.053891669775769]
We have developed a generative adversarial network (GAN) based traffic sensor data imputation framework (TGAN).
In this study, we have developed a novel time-dependent encoding method called the Gramian Angular Summation Field (GASF).
This study shows that the proposed model can significantly improve the traffic data imputation accuracy in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) compared to state-of-the-art models on the benchmark dataset.
arXiv Detail & Related papers (2020-05-05T19:14:02Z)