Related papers: Unsupervised anomaly detection on cybersecurity data streams: a case with BETH dataset

URL: http://arxiv.org/abs/2503.04178v1
Date: Thu, 06 Mar 2025 07:45:48 GMT
Title: Unsupervised anomaly detection on cybersecurity data streams: a case with BETH dataset
Authors: Evgeniy Eremin
Abstract summary: Stream learning algorithms are capable of providing near-real-time data processing. This article examines the results of ten algorithms from three Python stream machine-learning libraries on the BETH dataset of cybersecurity events.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the modern world, the importance of cybersecurity for various systems increases from year to year. The number of information security events generated by security tools grows as IT infrastructure develops. At the same time, the cyber threat landscape does not remain constant, and monitoring should take into account both known attack indicators and those for which information security products of various classes do not yet have signature rules. Detecting anomalies in large cybersecurity data streams is a complex task that, if properly addressed, can enable a timely response to atypical and previously unknown cyber threats. The use of offline algorithms may be limited for a number of reasons related to training time and retraining frequency. Stream learning algorithms, by contrast, can provide near-real-time data processing for this task. This article examines the results of ten algorithms from three Python stream machine-learning libraries on the BETH dataset of cybersecurity events, which contains information about the creation, cloning, and destruction of operating system processes collected using eBPF (extended Berkeley Packet Filter). The ROC-AUC metric and the total processing time of these algorithms are presented. Several combinations of features and event orderings are considered. In conclusion, the most promising algorithms are noted and possible directions for further research are outlined.
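The abstract does not name the ten algorithms or the three libraries, so the following is only a minimal, dependency-free sketch of the evaluation setting it describes, not the author's method. A hypothetical per-feature streaming z-score detector (`StreamingZScore`, an illustrative stand-in rather than one of the paper's algorithms) scores each event before learning from it, ROC-AUC is computed from those scores and ground-truth labels, and the total processing time is measured. The class and function names, the toy two-feature events, and the `1e-9` epsilon are all assumptions for illustration.

```python
import math
import time
from typing import List, Sequence


class StreamingZScore:
    """Hypothetical one-pass detector (not one of the paper's ten algorithms).

    Keeps a running mean and variance per feature (Welford's method) and scores
    an event by its mean absolute z-score across features.
    """

    def __init__(self, n_features: int, eps: float = 1e-9) -> None:
        self.n = 0                       # events learned so far
        self.mean = [0.0] * n_features   # running mean per feature
        self.m2 = [0.0] * n_features     # running sum of squared deviations
        self.eps = eps                   # assumed guard against zero variance

    def score_one(self, x: Sequence[float]) -> float:
        if self.n < 2:
            return 0.0  # too little history to estimate a variance
        total = 0.0
        for i, v in enumerate(x):
            std = math.sqrt(self.m2[i] / (self.n - 1))  # sample std so far
            total += abs(v - self.mean[i]) / (std + self.eps)
        return total / len(x)

    def learn_one(self, x: Sequence[float]) -> None:
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])


def run_stream(events: Sequence[Sequence[float]], n_features: int) -> List[float]:
    """Score each event before learning from it (test-then-train)."""
    detector = StreamingZScore(n_features)
    scores = []
    for x in events:
        scores.append(detector.score_one(x))
        detector.learn_one(x)
    return scores


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random anomaly (label 1) outscores a
    random normal event (label 0); ties count as half. Checks all P * N pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy two-feature events (hypothetical); the last one is an injected anomaly.
    events = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0], [1.05, 1.95], [10.0, 20.0]]
    labels = [0, 0, 0, 0, 0, 1]
    start = time.perf_counter()
    scores = run_stream(events, n_features=2)
    elapsed = time.perf_counter() - start
    print(f"ROC-AUC = {roc_auc(labels, scores):.3f}, time = {elapsed:.6f} s")
```

Because each event is scored before it is learned, feeding the same events in a different order generally produces different scores, which is one reason event order matters when evaluating stream learners. The pairwise AUC is quadratic in the number of events; for a dataset the size of BETH, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.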

Related papers

A Review of Various Datasets for Machine Learning Algorithm-Based Intrusion Detection System: Advances and Challenges [0.40964539027092917]
IDS aims to protect computer networks from security threats by detecting, notifying, and taking appropriate action to prevent illegal access and protect confidential information. Researchers are enhancing the effectiveness of IDS by incorporating popular datasets into machine learning algorithms. This paper explores the methods of capturing and reviewing intrusion detection systems (IDS) and evaluates the challenges existing datasets face.
(2025-06-03T04:47:21Z)
Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
(2025-05-29T09:19:07Z)
Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks such as malware, spam, and intrusions has caused severe consequences for society. Traditional Machine Learning (ML) based methods are extensively used to detect cyber threats, but they struggle to model the correlations between real-world cyber entities. With the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance.
(2023-04-02T08:43:03Z)
Ensemble learning techniques for intrusion detection system in the context of cybersecurity [0.0]
The Intrusion Detection System concept was applied using Orange, a data mining and machine learning tool, to obtain better results. The main objective of the study was to investigate the ensemble learning technique using the stacking method, supported by the Support Vector Machine (SVM) and k-Nearest Neighbour (kNN) algorithms.
(2022-12-21T10:50:54Z)
A Robust and Explainable Data-Driven Anomaly Detection Approach For Power Electronics [56.86150790999639]
We present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and anomaly transformer. The Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data. A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy.
(2022-09-23T06:09:35Z)
Towards Automated Classification of Attackers' TTPs by combining NLP with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research. Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
(2022-07-18T09:59:21Z)
Robustness Evaluation of Deep Unsupervised Learning Algorithms for Intrusion Detection Systems [0.0]
This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination and reveal the importance of self-defense against data perturbation.
(2022-06-25T02:28:39Z)
Zero Day Threat Detection Using Graph and Flow Based Security Telemetry [3.3029515721630855]
Zero Day Threats (ZDT) are novel methods used by malicious actors to attack and exploit information technology (IT) networks or infrastructure. In this paper, we introduce a deep learning based approach to Zero Day Threat detection that can generalize, scale, and effectively identify threats in near real-time.
(2022-05-04T19:30:48Z)
Multi-Source Data Fusion for Cyberattack Detection in Power Systems [1.8914160585516038]
We show that fusing information from multiple data sources can help identify cyber-induced incidents and reduce false positives. We perform multi-source data fusion for training IDS in a cyber-physical power system testbed. Results are presented using the proposed data fusion application to infer False Data and Command injection-based Man-in-the-Middle attacks.
(2021-01-18T06:34:45Z)
Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance. We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
(2020-10-19T13:09:31Z)
NERD: Neural Network for Edict of Risky Data Streams [0.0]
Cyber incidents can have a wide range of causes, from a simple connection loss to an insistent attack. The developed system is enriched with information from multiple sources such as intrusion detection systems and monitoring tools. It uses over twenty key attributes like sync-package ratio to identify potential security incidents and to classify the data into different priority categories.
(2020-07-08T14:24:48Z)
Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation. The rapid rate and volume of data creation has begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
(2020-05-23T20:57:12Z)
Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data. We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases. Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
(2020-04-15T15:58:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.