Extending Isolation Forest for Anomaly Detection in Big Data via K-Means
- URL: http://arxiv.org/abs/2104.13190v1
- Date: Tue, 27 Apr 2021 16:21:48 GMT
- Title: Extending Isolation Forest for Anomaly Detection in Big Data via K-Means
- Authors: Md Tahmid Rahman Laskar, Jimmy Huang, Vladan Smetana, Chris Stewart,
Kees Pouw, Aijun An, Stephen Chan, Lei Liu
- Abstract summary: We propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios.
We utilize the Apache Spark framework to implement our proposed model, which was trained on large network traffic data.
We find that our proposed system can be used for real-time anomaly detection in the industrial setup.
- Score: 8.560480662599407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Industrial Information Technology (IT) infrastructures are often vulnerable
to cyberattacks. To ensure the security of the computer systems in an industrial
environment, it is required to build effective intrusion detection systems to
monitor the cyber-physical systems (e.g., computer networks) in the industry
for malicious activities. This paper aims to build such intrusion detection
systems to protect the computer networks from cyberattacks. More specifically,
we propose a novel unsupervised machine learning approach that combines the
K-Means algorithm with the Isolation Forest for anomaly detection in industrial
big data scenarios. Since our objective is to build the intrusion detection
system for the big data scenario in the industrial domain, we utilize the
Apache Spark framework to implement our proposed model, which was trained on
large network traffic data (about 123 million instances of network traffic)
stored in Elasticsearch. Moreover, we evaluate our proposed model on live
streaming data and find that our proposed system can be used for real-time
anomaly detection in the industrial setup. In addition, we address different
challenges that we faced while training our model on large datasets and
explicitly describe how these issues were resolved. Based on our empirical
evaluation in different use-cases for anomaly detection in real-world network
traffic data, we observe that our proposed system is effective at detecting
anomalies in big data scenarios. Finally, we evaluate our proposed model on
several academic datasets to compare with other models and find that it
provides performance comparable to other state-of-the-art approaches.
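The abstract does not spell out how K-Means and the Isolation Forest are combined. The sketch below illustrates one plausible reading, stated here as an assumption: Isolation Forest anomaly scores are computed first, and a one-dimensional K-Means then groups those scores so that no fixed score threshold has to be chosen by hand. It is a small pure-Python illustration, not the authors' Apache Spark implementation, and all function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits until the depth limit."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop when the chosen feature is constant
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, adjusted for unsplit leaf points."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=256, seed=0):
    """Isolation Forest score in (0, 1]; higher means easier to isolate.

    Assumes len(data) >= 2 so the normalising constant is non-zero.
    """
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
            for x in data]


def kmeans_1d(values, k=2, iters=50):
    """Plain Lloyd's K-Means on scalar values (k >= 2), evenly spaced init."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    def assign():
        return [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign(), centers


# Hypothetical toy data: a Gaussian blob plus four far-away points.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outliers = [(8.0, 8.0), (-9.0, 7.0), (10.0, -9.0), (-8.0, -10.0)]
data = normal + outliers

scores = anomaly_scores(data)
labels, centers = kmeans_1d(scores, k=2)
# Assumption: the cluster with the highest mean score is the anomalous one.
anomalous = max(range(len(centers)), key=lambda j: centers[j])
flagged = [i for i, lab in enumerate(labels) if lab == anomalous]
print(f"flagged {len(flagged)} of {len(data)} points")
```

Clustering the scores rather than thresholding them means the cut-off adapts to the score distribution of each dataset; using more than two clusters would give graded severity levels instead of a binary decision. Whether the paper uses K-Means in exactly this role is not stated in the abstract.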
Related papers
- Enhanced Anomaly Detection in Industrial Control Systems aided by Machine Learning [2.2457306746668766]
This study investigates whether combining both network and process data can improve attack detection in ICS environments.
Our findings suggest that integrating network traffic with operational process data can enhance detection capabilities.
Although the results are promising, they are preliminary and highlight the need for further studies.
arXiv Detail & Related papers (2024-10-25T17:41:33Z)
- Enhancing Automata Learning with Statistical Machine Learning: A Network Security Case Study [4.2751988244805466]
In this paper, we use automata learning to derive state machines from network-traffic data.
We apply our approach to a commercial network intrusion detection system developed by our industry partner, RabbitRun Technologies.
Our approach results in an average 67.5% reduction in the number of states and transitions of the learned state machines.
arXiv Detail & Related papers (2024-05-18T02:10:41Z)
- IPAD: Industrial Process Anomaly Detection Dataset [71.39058003212614]
Video anomaly detection (VAD) is a challenging task aiming to recognize anomalies in video frames.
We propose a new dataset, IPAD, specifically designed for VAD in industrial scenarios.
This dataset covers 16 different industrial devices and contains over 6 hours of both synthetic and real-world video footage.
arXiv Detail & Related papers (2024-04-23T13:38:01Z)
- Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model of Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z)
- A Variational Autoencoder Framework for Robust, Physics-Informed Cyberattack Recognition in Industrial Cyber-Physical Systems [2.051548207330147]
We develop a data-driven framework that can be used to detect, diagnose, and localize a type of cyberattack called covert attacks on industrial control systems.
The framework has a hybrid design that combines a variational autoencoder (VAE), a recurrent neural network (RNN), and a Deep Neural Network (DNN).
arXiv Detail & Related papers (2023-10-10T19:07:53Z)
- Leveraging a Probabilistic PCA Model to Understand the Multivariate Statistical Network Monitoring Framework for Network Security Anomaly Detection [64.1680666036655]
We revisit anomaly detection techniques based on PCA from a probabilistic generative model point of view.
We have evaluated the mathematical model using two different datasets.
arXiv Detail & Related papers (2023-02-02T13:41:18Z)
- Deep Learning based Covert Attack Identification for Industrial Control Systems [5.299113288020827]
We develop a data-driven framework that can be used to detect, diagnose, and localize a type of cyberattack called covert attacks on smart grids.
The framework has a hybrid design that combines an autoencoder, a recurrent neural network (RNN) with a Long-Short-Term-Memory layer, and a Deep Neural Network (DNN).
arXiv Detail & Related papers (2020-09-25T17:48:43Z)
- AutoOD: Automated Outlier Detection via Curiosity-guided Search and Self-imitation Learning [72.99415402575886]
Outlier detection is an important data mining task with numerous practical applications.
We propose AutoOD, an automated outlier detection framework, which aims to search for an optimal neural network model.
Experimental results on various real-world benchmark datasets demonstrate that the deep model identified by AutoOD achieves the best performance.
arXiv Detail & Related papers (2020-06-19T18:57:51Z)
- Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation have begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
- Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process [63.75363908696257]
We review the methods that have been applied to network data with the purpose of developing an intrusion detector.
We discuss the techniques used for the capture, preparation and transformation of the data, as well as the data mining and evaluation methods.
As a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.
arXiv Detail & Related papers (2020-01-27T11:21:05Z)
- Deep Learning-Based Intrusion Detection System for Advanced Metering Infrastructure [0.0]
The smart grid is exposed to a wide variety of threats that could be translated into cyber-attacks.
In this paper, we develop a deep learning-based intrusion detection system to defend against cyber-attacks.
arXiv Detail & Related papers (2019-12-31T21:06:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.