Research on Dynamic Data Flow Anomaly Detection based on Machine Learning
- URL: http://arxiv.org/abs/2409.14796v1
- Date: Mon, 23 Sep 2024 08:19:15 GMT
- Title: Research on Dynamic Data Flow Anomaly Detection based on Machine Learning
- Authors: Liyang Wang, Yu Cheng, Hao Gong, Jiacheng Hu, Xirui Tang, Iris Li,
- Abstract summary: In this study, the unsupervised learning method is employed to identify anomalies in dynamic data flows.
By clustering similar data, the model is able to detect data behaviour that deviates significantly from normal traffic without the need for labelled data.
Notably, it demonstrates robust and adaptable performance, particularly in the context of unbalanced data.
- Score: 11.526496773281938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The sophistication and diversity of contemporary cyberattacks have rendered the use of proxies, gateways, firewalls, and encrypted tunnels as a standalone defensive strategy inadequate. Consequently, the proactive identification of data anomalies has emerged as a prominent area of research within the field of data security. The majority of extant studies concentrate on data with balanced samples, with the consequence that detection performance is suboptimal on unbalanced data. In this study, the unsupervised learning method is employed to identify anomalies in dynamic data flows. Initially, multi-dimensional features are extracted from real-time data, and a clustering algorithm is utilised to analyse the patterns of the data. This enables the potential outliers to be automatically identified. By clustering similar data, the model is able to detect data behaviour that deviates significantly from normal traffic without the need for labelled data. The results of the experiments demonstrate that the proposed method exhibits high accuracy in the detection of anomalies across a range of scenarios. Notably, it demonstrates robust and adaptable performance, particularly in the context of unbalanced data.
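The clustering idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the clustering algorithm or the outlier rule, so plain k-means and a "mean plus z standard deviations" distance threshold are assumptions, as are all names used here (ClusterAnomalyDetector, fit_kmeans, k, z). Centroids are fitted on an unlabelled baseline window of feature vectors, and each incoming record is flagged when it lies unusually far from every centroid.

```python
import math
from statistics import mean, pstdev


def init_centroids(points, k):
    """Farthest-point initialisation: deterministic and spreads centroids out."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def fit_kmeans(points, k, max_iters=100):
    """Lloyd's k-means over a list of equal-length numeric tuples."""
    centroids = init_centroids(points, k)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            tuple(mean(axis) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def nearest_distance(point, centroids):
    """Distance from a point to its closest centroid (the anomaly score)."""
    return min(math.dist(point, c) for c in centroids)


class ClusterAnomalyDetector:
    """Hypothetical detector: no labels are used, only baseline geometry."""

    def __init__(self, k=2, z=3.0):
        self.k = k            # assumed number of normal traffic patterns
        self.z = z            # assumed strictness of the distance threshold
        self.centroids = []
        self.threshold = math.inf

    def fit(self, baseline):
        self.centroids = fit_kmeans(baseline, self.k)
        distances = [nearest_distance(p, self.centroids) for p in baseline]
        self.threshold = mean(distances) + self.z * pstdev(distances)
        return self

    def is_anomaly(self, point):
        return nearest_distance(point, self.centroids) > self.threshold


if __name__ == "__main__":
    # Synthetic two-feature baseline: two tight groups of "normal" records.
    offsets = [i * 0.1 for i in range(-5, 6)]
    baseline = [
        (cx + dx, cy + dy)
        for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
        for dx in offsets
        for dy in offsets
    ]
    detector = ClusterAnomalyDetector(k=2).fit(baseline)
    for record in [(0.1, -0.2), (10.2, 9.9), (5.0, 30.0)]:
        print(record, detector.is_anomaly(record))
```

Because the threshold is derived from the baseline's own distance distribution, rare anomalous records do not need to be represented in the fitting data, which is one plausible reading of why a clustering approach suits unbalanced data. In a streaming setting the detector would presumably be refitted periodically on a sliding window; that step is omitted here.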
Related papers
- Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining.
Experiments on the MNIST dataset for handwritten digit recognition were performed.
It was empirically observed that using a ground truth labeled dataset's validation accuracy is inadequate for correcting labels of other previously unseen data instances.
arXiv Detail & Related papers (2024-10-16T10:28:22Z) - Anomaly Detection by Context Contrasting [57.695202846009714]
Anomaly detection focuses on identifying samples that deviate from the norm.
Recent advances in self-supervised learning have shown great promise in this regard.
We propose Con$_2$, which learns through context augmentations.
arXiv Detail & Related papers (2024-05-29T07:59:06Z) - DeepHYDRA: Resource-Efficient Time-Series Anomaly Detection in Dynamically-Configured Systems [3.44012349879073]
We present DeepHYDRA (Deep Hybrid DBSCAN/Reduction-Based Anomaly Detection).
It combines DBSCAN and learning-based anomaly detection.
It is shown to reliably detect different types of anomalies in both large and complex datasets.
arXiv Detail & Related papers (2024-05-13T13:47:15Z) - WePaMaDM-Outlier Detection: Weighted Outlier Detection using Pattern Approaches for Mass Data Mining [0.6754597324022876]
Outlier detection can reveal vital information about system faults, fraudulent activities, and patterns in the data.
This article proposes WePaMaDM-Outlier Detection for distinct mass data mining domains.
It also investigates the significance of data modeling in outlier detection techniques in surveillance, fault detection, and trend analysis.
arXiv Detail & Related papers (2023-06-09T07:00:00Z) - Fast kernel methods for Data Quality Monitoring as a goodness-of-fit test [10.882743697472755]
We propose a machine learning approach for monitoring particle detectors in real-time.
The goal is to assess the compatibility of incoming experimental data with a reference dataset, characterising the data behaviour under normal circumstances.
The model is based on a modern implementation of kernel methods, nonparametric algorithms that can learn any continuous function given enough data.
arXiv Detail & Related papers (2023-03-09T16:59:35Z) - Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z) - TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z) - Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z) - Categorical anomaly detection in heterogeneous data using minimum description length clustering [3.871148938060281]
We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data.
Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms.
arXiv Detail & Related papers (2020-06-14T14:48:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.