Data Mining with Big Data in Intrusion Detection Systems: A Systematic
Literature Review
- URL: http://arxiv.org/abs/2005.12267v1
- Date: Sat, 23 May 2020 20:57:12 GMT
- Authors: Fadi Salo, MohammadNoor Injadat, Ali Bou Nassif, Aleksander Essex
- Abstract summary: Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation has begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
- Score: 68.15472610671748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cloud computing has become a powerful and indispensable technology for
complex, high performance and scalable computation. The exponential expansion
in the deployment of cloud technology has produced a massive amount of data
from a variety of applications, resources and platforms. In turn, the rapid
rate and volume of data creation has begun to pose significant challenges for
data management and security. The design and deployment of intrusion detection
systems (IDS) in the big data setting has, therefore, become a topic of
importance. In this paper, we conduct a systematic literature review (SLR) of
data mining techniques (DMT) used in IDS-based solutions through the period
2013-2018. We employed criterion-based, purposive sampling identifying 32
articles, which constitute the primary source of the present survey. After a
careful investigation of these articles, we identified 17 separate DMTs
deployed in an IDS context. This paper also presents the merits and
disadvantages of the various works of current research that implemented DMTs
and distributed streaming frameworks (DSF) to detect and/or prevent malicious
attacks in a big data environment.
Related papers
- Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model of Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z)
- TII-SSRC-23 Dataset: Typological Exploration of Diverse Traffic Patterns for Intrusion Detection [0.5261718469769447]
Existing datasets often fall short, lacking the necessary diversity and alignment with the contemporary network environment.
This paper introduces TII-SSRC-23, a novel and comprehensive dataset designed to overcome these challenges.
arXiv Detail & Related papers (2023-09-14T05:23:36Z)
- Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems [45.05372822216111]
Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected.
However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail due to the disproportionate amount of time needed for understanding and preparing the data.
This contribution intends to present an integrated approach so that data scientists are able to more quickly and reliably gain insights into the CPPS challenges.
arXiv Detail & Related papers (2023-07-21T15:04:00Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset containing synthetic samples, based on which the trained models yield performance comparable with those trained on the original dataset.
This paper gives a comprehensive review and summary of recent advances in DD and its application.
arXiv Detail & Related papers (2023-01-17T17:03:28Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Bridging the gap to real-world for network intrusion detection systems with data-centric approach [1.4699455652461724]
This paper presents a systematic data-centric approach to address the current limitations of NIDS research.
It generates NIDS datasets composed of the most recent network traffic and attacks, with the labeling process integrated by design.
arXiv Detail & Related papers (2021-10-25T04:50:12Z)
- Big Machinery Data Preprocessing Methodology for Data-Driven Models in Prognostics and Health Management [0.0]
This paper presents a comprehensive, step-by-step pipeline for the preprocessing of monitoring data from complex systems.
The importance of expert knowledge is discussed in the context of data selection and label generation.
Two case studies are presented for validation, with the end goal of creating clean data sets with healthy and unhealthy labels.
arXiv Detail & Related papers (2021-10-08T17:10:12Z)
- Multi-Source Data Fusion for Cyberattack Detection in Power Systems [1.8914160585516038]
We show that fusing information from multiple data sources can help identify cyber-induced incidents and reduce false positives.
We perform multi-source data fusion for training IDS in a cyber-physical power system testbed.
Results are presented using the proposed data fusion application to infer False Data and Command injection-based Man-in-the-Middle attacks.
arXiv Detail & Related papers (2021-01-18T06:34:45Z)
- Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process [63.75363908696257]
We review the methods that have been applied to network data with the purpose of developing an intrusion detector.
We discuss the techniques used for the capture, preparation and transformation of the data, as well as the data mining and evaluation methods.
As a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.
arXiv Detail & Related papers (2020-01-27T11:21:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.