Hybrid Deep Learning Model using SPCAGAN Augmentation for Insider Threat Analysis
- URL: http://arxiv.org/abs/2203.02855v1
- Date: Sun, 6 Mar 2022 02:08:48 GMT
- Title: Hybrid Deep Learning Model using SPCAGAN Augmentation for Insider Threat Analysis
- Authors: R G Gayathri, Atul Sajjanhar, Yong Xiang
- Abstract summary: Anomaly detection using deep learning requires comprehensive data, but insider threat data is not readily available due to confidentiality concerns.
We propose a linear manifold learning-based generative adversarial network, SPCAGAN, that takes input from heterogeneous data sources.
We show that our proposed approach has lower error and higher accuracy than previous models and generates substantially superior synthetic insider threat data.
- Score: 7.576808824987132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cyberattacks from within an organization's trusted entities are known as
insider threats. Anomaly detection using deep learning requires comprehensive
data, but insider threat data is not readily available due to confidentiality
concerns of organizations. This creates a demand for synthetic data that supports the exploration of enhanced approaches to threat analysis. We propose a linear
manifold learning-based generative adversarial network, SPCAGAN, that takes
input from heterogeneous data sources and adds a novel loss function to train
the generator to produce high-quality data that closely resembles the original
data distribution. Furthermore, we introduce a deep learning-based hybrid model
for insider threat analysis. We provide extensive experiments for data
synthesis, anomaly detection, adversarial robustness, and synthetic data
quality analysis using benchmark datasets. In this context, empirical
comparisons show that GAN-based oversampling is competitive with numerous
typical oversampling regimes. For synthetic data generation, our SPCAGAN model
overcame the problem of mode collapse and converged faster than previous GAN
models. Results demonstrate that our proposed approach achieves lower error and higher accuracy than previous models and generates substantially superior synthetic insider threat data.
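The abstract does not give the SPCAGAN loss itself, so the block below is only a minimal sketch of the general recipe it describes: a standard GAN whose generator objective is augmented with a linear-manifold term, here approximated with PCA, that penalizes generated samples for drifting off the principal subspace of the real data. The network sizes, the PCA dimension, the weight LAMBDA_M, and the random stand-in for fused heterogeneous features are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): a GAN generator step whose loss adds a
# PCA-based "linear manifold" term to the usual adversarial objective.
import torch
import torch.nn as nn

torch.manual_seed(0)

N_FEATURES, LATENT_DIM, LAMBDA_M = 32, 16, 0.5   # illustrative sizes and weight

# Toy generator and discriminator for tabular feature vectors.
G = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, N_FEATURES))
D = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

# Stand-in for heterogeneous real data already fused into one feature matrix.
real = torch.randn(512, N_FEATURES)

# Fit a linear manifold (top-k principal components) of the real data once.
mean = real.mean(dim=0, keepdim=True)
_, _, v = torch.pca_lowrank(real - mean, q=8)   # q = assumed manifold dimension
proj = v @ v.T                                  # projector onto that subspace

def manifold_loss(x):
    """Mean squared distance of samples from the real-data principal subspace."""
    centered = x - mean
    return ((centered - centered @ proj) ** 2).mean()

for step in range(200):
    # --- discriminator update (standard GAN objective) ---
    fake = G(torch.randn(64, LATENT_DIM)).detach()
    batch = real[torch.randint(0, real.size(0), (64,))]
    loss_d = bce(D(batch), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # --- generator update: adversarial term plus manifold alignment term ---
    fake = G(torch.randn(64, LATENT_DIM))
    loss_g = bce(D(fake), torch.ones(64, 1)) + LAMBDA_M * manifold_loss(fake)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

Other manifold learners or GAN objectives could stand in for the PCA projector and BCE losses used here; the sketch only illustrates where a manifold term would enter the generator's loss.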
Related papers
- Debiasing Synthetic Data Generated by Deep Generative Models [40.165159490379146]
Deep generative models (DGMs) for synthetic data generation induce bias and imprecision in synthetic data analyses.
We propose a new strategy that targets synthetic data created by DGMs for specific data analyses.
Our approach accounts for biases, enhances convergence rates, and facilitates the calculation of estimators with easily approximated large sample variances.
arXiv Detail & Related papers (2024-11-06T19:24:34Z)
- zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation [0.0]
"Black swans" have posed a challenge to performance of classical machine learning models.
This article provides an overview of the zGAN model architecture developed for the purpose of generating synthetic data with outlier characteristics.
It shows promising results for realistic synthetic data generation, as well as uplift in downstream model performance.
arXiv Detail & Related papers (2024-10-28T07:55:11Z)
- Synthetic Data Generation in Cybersecurity: A Comparative Analysis [0.0]
GAN-based methods, particularly CTGAN and CopulaGAN, outperform non-AI and conventional AI approaches in terms of fidelity and utility.
This research contributes to the field by offering the first comparative evaluation of these methods specifically for cybersecurity network traffic data.
arXiv Detail & Related papers (2024-10-18T14:19:25Z)
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the scarcity of high-quality data for training large language models (LLMs).
Our work delves into specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents an unlearning-based method to mitigate them.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance [16.047084318753377]
Imbalanced data and spurious correlations are common challenges in machine learning and data science.
Oversampling, which artificially increases the number of instances in the underrepresented classes, has been widely adopted to tackle these challenges.
We introduce OPAL, a systematic oversampling approach that leverages the capabilities of large language models to generate high-quality synthetic data for minority groups.
arXiv Detail & Related papers (2024-06-05T21:24:26Z)
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models [69.76066070227452]
*Data Synthesis* is a promising way to train a small model with very little labeled data.
We propose *Synthesis Step by Step* (**S3**), a data synthesis framework that shrinks this distribution gap.
Our approach improves the performance of a small model by reducing the gap between the synthetic dataset and the real data.
arXiv Detail & Related papers (2023-10-20T17:14:25Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator; a simplified sketch of the APA idea appears after this list.
arXiv Detail & Related papers (2021-11-12T18:13:45Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator; a simplified sketch of this objective also appears after this list.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Detecting weapons and aggressive behavior in live video can support rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks that detect firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
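For the "Deceive D" entry above, the abstract only names Adaptive Pseudo Augmentation (APA), so the block below is a minimal, simplified sketch of the idea as described there: with some probability p, generated samples are shown to the discriminator under the "real" label, and p is adapted from an overfitting signal on the discriminator's logits. The update rule, threshold, and function names are illustrative assumptions, not the authors' released implementation.

```python
# Simplified sketch of the Adaptive Pseudo Augmentation (APA) idea
# (illustrative reconstruction, not the authors' code), written for
# 2-D (batch, features) tensors rather than image batches.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_step_apa(D, G, real, latent_dim, p, target=0.6, step=0.01):
    """One discriminator loss with APA; returns (loss, updated deception prob p)."""
    fake = G(torch.randn(real.size(0), latent_dim)).detach()

    # Deception: swap some real samples for fakes but keep the "real" label,
    # which softens an overfitting discriminator when training data is scarce.
    deceive = torch.rand(real.size(0), 1) < p
    pseudo_real = torch.where(deceive, fake, real)

    real_logits, fake_logits = D(pseudo_real), D(fake)
    loss = (bce(real_logits, torch.ones_like(real_logits))
            + bce(fake_logits, torch.zeros_like(fake_logits)))

    # Overfitting heuristic: fraction of "real" logits rated positive; nudge p
    # up when the discriminator looks too confident, down otherwise.
    r_t = real_logits.detach().sign().mean().item()
    p = min(max(p + step * (1.0 if r_t > target else -1.0), 0.0), 1.0)
    return loss, p
```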
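For the "Negative Data Augmentation" entry, the sketch below illustrates the stated idea of treating negative augmentations as an extra source of fakes for the discriminator. The specific negative transform (a 2x2 jigsaw shuffle) and the plain BCE formulation are assumptions chosen for brevity; the paper considers other negative transforms and GAN objectives.

```python
# Minimal sketch of a Negative Data Augmentation (NDA) discriminator objective
# (illustrative only): semantically broken variants of real images are fed to
# the discriminator as an additional source of "fake" examples.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def jigsaw_negative(x):
    """2x2 jigsaw shuffle of an image batch (B, C, H, W) with even H and W;
    breaks global structure while keeping local statistics."""
    b, c, h, w = x.shape
    quads = [x[:, :, :h // 2, :w // 2], x[:, :, :h // 2, w // 2:],
             x[:, :, h // 2:, :w // 2], x[:, :, h // 2:, w // 2:]]
    p = torch.randperm(4).tolist()
    top = torch.cat([quads[p[0]], quads[p[1]]], dim=3)
    bottom = torch.cat([quads[p[2]], quads[p[3]]], dim=3)
    return torch.cat([top, bottom], dim=2)

def discriminator_loss_nda(D, real, fake):
    """Standard GAN discriminator loss plus NDA samples treated as fakes."""
    nda = jigsaw_negative(real)
    r, f, n = D(real), D(fake), D(nda)
    return (bce(r, torch.ones_like(r))
            + bce(f, torch.zeros_like(f))
            + bce(n, torch.zeros_like(n)))
```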