PicoDomain: A Compact High-Fidelity Cybersecurity Dataset
- URL: http://arxiv.org/abs/2008.09192v1
- Date: Thu, 20 Aug 2020 20:18:04 GMT
- Title: PicoDomain: A Compact High-Fidelity Cybersecurity Dataset
- Authors: Craig Laprade, Benjamin Bowman, H. Howie Huang
- Abstract summary: Current cybersecurity datasets either offer no ground truth or do so with anonymized data.
Most existing datasets are large enough to make them unwieldy during prototype development.
In this paper we have developed the PicoDomain dataset, a compact high-fidelity collection of Zeek logs from a realistic intrusion.
- Score: 0.9281671380673305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analysis of cyber relevant data has become an area of increasing focus. As
larger percentages of businesses and governments begin to understand the
implications of cyberattacks, the impetus for better cybersecurity solutions
has increased. Unfortunately, current cybersecurity datasets either offer no
ground truth or do so with anonymized data. The former leads to a quandary when
verifying results and the latter can remove valuable information. Additionally,
most existing datasets are large enough to make them unwieldy during prototype
development. In this paper we have developed the PicoDomain dataset, a compact
high-fidelity collection of Zeek logs from a realistic intrusion using relevant
Tools, Techniques, and Procedures. While simulated on a small-scale network,
this dataset consists of traffic typical of an enterprise network, which can be
utilized for rapid validation and iterative development of analytics platforms.
We have validated this dataset using traditional statistical analysis and
off-the-shelf Machine Learning techniques.
Related papers
- KiNETGAN: Enabling Distributed Network Intrusion Detection through Knowledge-Infused Synthetic Data Generation [0.0]
We propose a knowledge-infused Generative Adversarial Network for generating synthetic network activity data (KiNETGAN)
Our approach enhances the resilience of distributed intrusion detection while addressing privacy concerns.
arXiv Detail & Related papers (2024-05-26T08:02:02Z) - Computationally and Memory-Efficient Robust Predictive Analytics Using Big Data [0.0]
This study navigates through the challenges of data uncertainties, storage limitations, and predictive data-driven modeling using big data.
We utilize Robust Principal Component Analysis (RPCA) for effective noise reduction and outlier elimination, and Optimal Sensor Placement (OSP) for efficient data compression and storage.
arXiv Detail & Related papers (2024-03-27T22:39:08Z) - On the Cross-Dataset Generalization of Machine Learning for Network
Intrusion Detection [50.38534263407915]
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity.
Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications.
In this study, we conduct a comprehensive analysis on the generalization of machine-learning-based NIDS through an extensive experimentation in a cross-dataset framework.
arXiv Detail & Related papers (2024-02-15T14:39:58Z) - Stepping out of Flatland: Discovering Behavior Patterns as Topological Structures in Cyber Hypergraphs [0.7835894511242797]
We present a novel framework based in the theory of hypergraphs and topology to understand data from cyber networks.
We will demonstrate concrete examples in a large-scale cyber network dataset.
arXiv Detail & Related papers (2023-11-08T00:00:33Z) - TII-SSRC-23 Dataset: Typological Exploration of Diverse Traffic Patterns
for Intrusion Detection [0.5261718469769447]
Existing datasets often fall short, lacking the necessary diversity and alignment with the contemporary network environment.
This paper introduces TII-SSRC-23, a novel and comprehensive dataset designed to overcome these challenges.
arXiv Detail & Related papers (2023-09-14T05:23:36Z) - LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z) - Towards Generalizable Data Protection With Transferable Unlearnable
Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z) - Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - A Review of Topological Data Analysis for Cybersecurity [1.0878040851638]
Topological Data Analysis (TDA) studies the high level structure of data using techniques from algebraic topology.
We hope to highlight to researchers a promising new area with strong potential to improve cybersecurity data science.
arXiv Detail & Related papers (2022-02-16T13:03:52Z) - Privacy-preserving Traffic Flow Prediction: A Federated Learning
Approach [61.64006416975458]
We propose a privacy-preserving machine learning technique named Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction.
FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism.
It is shown that FedGRU's prediction accuracy is 90.96% higher than the advanced deep learning models.
arXiv Detail & Related papers (2020-03-19T13:07:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.