Online detection of failures generated by storage simulator
- URL: http://arxiv.org/abs/2101.07100v1
- Date: Mon, 18 Jan 2021 14:56:53 GMT
- Title: Online detection of failures generated by storage simulator
- Authors: Kenenbek Arzymatov, Mikhail Hushchyn, Andrey Sapronov, Vladislav
Belavin, Leonid Gremyachikh, Maksim Karpov and Andrey Ustyuzhanin
- Abstract summary: We create a Go-based (golang) package for simulating the behavior of modern storage infrastructure.
The package's flexible structure allows us to create a model of a real-world storage system with a number of components.
To discover failures in the time series distribution generated by the simulator, we modified a change point detection algorithm that works in online mode.
- Score: 2.3859858429583665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern large-scale data-farms consist of hundreds of thousands of storage
devices that span distributed infrastructure. Devices used in modern data
centers (such as controllers, links, SSD- and HDD-disks) can fail due to
hardware as well as software problems. Such failures or anomalies can be
detected by monitoring the activity of components using machine learning
techniques. In order to use these techniques, researchers need plenty of
historical data of devices in normal and failure mode for training algorithms.
In this work, we address two problems: 1) the lack of storage data for the methods
above, which we tackle by creating a simulator, and 2) the application of existing online
algorithms that can detect a failure in one of the components faster.
We created a Go-based (golang) package for simulating the behavior of modern
storage infrastructure. The software is based on the discrete-event modeling
paradigm and captures the structure and dynamics of high-level storage system
building blocks. The package's flexible structure allows us to create a model
of a real-world storage system with a configurable number of components. The
primary area of interest is exploring the storage machine's behavior under
stress testing or medium- to long-term operation in order to observe
failures of its components.
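The discrete-event paradigm mentioned above can be illustrated with a minimal Go sketch. It is not the authors' package: the `Event`, `Simulator`, and `Disk` types, the fixed service time, and the arrival times are all hypothetical, chosen only to show how a simulated clock jumps from one scheduled event to the next.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Event is a hypothetical scheduled action at a simulated time.
type Event struct {
	Time   float64
	Seq    int // tie-breaker: equal-time events run in scheduling order
	Action func()
}

// eventQueue is a min-heap of events ordered by (Time, Seq).
type eventQueue []*Event

func (q eventQueue) Len() int { return len(q) }
func (q eventQueue) Less(i, j int) bool {
	if q[i].Time != q[j].Time {
		return q[i].Time < q[j].Time
	}
	return q[i].Seq < q[j].Seq
}
func (q eventQueue) Swap(i, j int)        { q[i], q[j] = q[j], q[i] }
func (q *eventQueue) Push(x interface{})  { *q = append(*q, x.(*Event)) }
func (q *eventQueue) Pop() interface{} {
	old := *q
	n := len(old)
	e := old[n-1]
	*q = old[:n-1]
	return e
}

// Simulator holds the simulated clock and the pending events.
type Simulator struct {
	Now   float64
	queue eventQueue
	seq   int
}

// Schedule registers an action to run `delay` time units from now.
func (s *Simulator) Schedule(delay float64, action func()) {
	s.seq++
	heap.Push(&s.queue, &Event{Time: s.Now + delay, Seq: s.seq, Action: action})
}

// Run processes events in time order until none remain.
func (s *Simulator) Run() {
	for s.queue.Len() > 0 {
		e := heap.Pop(&s.queue).(*Event)
		s.Now = e.Time
		e.Action()
	}
}

// Disk is a toy single-server device with a fixed service time (an assumption).
type Disk struct {
	sim         *Simulator
	ServiceTime float64
	busyUntil   float64
	Completions []float64
}

// Submit queues one request; it starts when the disk becomes free.
func (d *Disk) Submit() {
	start := d.sim.Now
	if d.busyUntil > start {
		start = d.busyUntil
	}
	d.busyUntil = start + d.ServiceTime
	d.sim.Schedule(d.busyUntil-d.sim.Now, func() {
		d.Completions = append(d.Completions, d.sim.Now)
	})
}

// runDemo submits three requests at times 0, 1, and 5 and returns completion times.
func runDemo() []float64 {
	sim := &Simulator{}
	disk := &Disk{sim: sim, ServiceTime: 2}
	for _, t := range []float64{0, 1, 5} {
		sim.Schedule(t, disk.Submit)
	}
	sim.Run()
	return disk.Completions
}

func main() {
	fmt.Println(runDemo()) // prints [2 4 7]
}
```

The second request arrives while the disk is busy, so it waits and completes at time 4. A failure could be modeled in the same style, for example by scheduling an event that changes `ServiceTime` partway through the run.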
To discover failures in the time series distribution generated by the
simulator, we modified a change point detection algorithm that works in online
mode. The goal of the change-point detection is to discover differences in time
series distribution. This work describes an approach for failure detection in
time series data based on direct density ratio estimation via binary
classifiers.
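The idea of estimating a density ratio with a binary classifier can be sketched in Go as follows. This is an illustration under stated assumptions, not the algorithm from the paper: a one-feature logistic regression stands in for the classifier, the two adjacent sliding windows, the window size, the threshold, and the synthetic series with a mean shift at index 200 are all hypothetical. With equal-sized windows, the classifier's logit approximates the log density ratio between the test and reference windows, and the score below is the difference of its mean over the two windows (a symmetric KL-style estimate).

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// standardize rescales both windows with their joint mean and standard deviation.
func standardize(ref, test []float64) ([]float64, []float64) {
	all := append(append([]float64{}, ref...), test...)
	var sum, sq float64
	for _, x := range all {
		sum += x
	}
	mean := sum / float64(len(all))
	for _, x := range all {
		sq += (x - mean) * (x - mean)
	}
	std := math.Sqrt(sq / float64(len(all)))
	if std == 0 {
		std = 1 // constant window: avoid division by zero
	}
	for i := range all {
		all[i] = (all[i] - mean) / std
	}
	return all[:len(ref)], all[len(ref):]
}

// fitLogit fits p(test | x) = sigmoid(w*x + b) by full-batch gradient descent.
// Reference samples carry label 0, test samples label 1.
func fitLogit(ref, test []float64) (w, b float64) {
	const lr, steps = 1.0, 300
	n := float64(len(ref) + len(test))
	for s := 0; s < steps; s++ {
		var gw, gb float64
		for _, x := range ref {
			p := 1 / (1 + math.Exp(-(w*x + b)))
			gw += p * x
			gb += p
		}
		for _, x := range test {
			p := 1 / (1 + math.Exp(-(w*x + b)))
			gw += (p - 1) * x
			gb += p - 1
		}
		w -= lr * gw / n
		b -= lr * gb / n
	}
	return w, b
}

// changeScore is the mean logit on the test window minus the mean logit on the
// reference window; the logit estimates log(p_test(x) / p_ref(x)).
func changeScore(ref, test []float64) float64 {
	r, t := standardize(ref, test)
	w, b := fitLogit(r, t)
	var sr, st float64
	for _, x := range r {
		sr += w*x + b
	}
	for _, x := range t {
		st += w*x + b
	}
	return st/float64(len(t)) - sr/float64(len(r))
}

// detect slides two adjacent windows over the series as samples arrive and
// returns the index of the newest sample at the first alarm, or -1 if none.
func detect(series []float64, window int, threshold float64) int {
	for t := 2 * window; t <= len(series); t++ {
		ref := series[t-2*window : t-window]
		test := series[t-window : t]
		if changeScore(ref, test) > threshold {
			return t - 1
		}
	}
	return -1
}

// makeSeries builds a synthetic signal: N(0,1) noise whose mean shifts by 3 at index 200.
func makeSeries(seed int64) []float64 {
	rng := rand.New(rand.NewSource(seed))
	s := make([]float64, 400)
	for i := range s {
		s[i] = rng.NormFloat64()
		if i >= 200 {
			s[i] += 3
		}
	}
	return s
}

func main() {
	fmt.Println("alarm at index:", detect(makeSeries(42), 50, 1.0))
}
```

A linear classifier on the raw value only reacts to shifts in the mean; detecting other kinds of distribution change would need richer features or a more flexible classifier.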
Related papers
- The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology [10.81691411087626]
In some fields of AI, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets.
We have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems.
arXiv Detail & Related papers (2024-04-17T13:00:52Z)
- DTAAD: Dual Tcn-Attention Networks for Anomaly Detection in Multivariate Time Series Data [0.0]
We propose an anomaly detection and diagnosis model, DTAAD, based on Transformer and Dual Temporal Convolutional Network (TCN).
Scaling methods and feedback mechanisms are introduced to improve prediction accuracy and expand correlation differences.
Our experiments on seven public datasets validate that DTAAD exceeds the majority of currently advanced baseline methods in both detection and diagnostic performance.
arXiv Detail & Related papers (2023-02-17T06:59:45Z)
- A Robust and Explainable Data-Driven Anomaly Detection Approach For Power Electronics [56.86150790999639]
We present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and anomaly transformer.
The Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data.
A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy.
arXiv Detail & Related papers (2022-09-23T06:09:35Z)
- Online Self-Evolving Anomaly Detection in Cloud Computing Environments [6.480575492140354]
We present a self-evolving anomaly detection (SEAD) framework for cloud dependability assurance.
Our framework self-evolves by exploring newly verified anomaly records and continuously updating the anomaly detector online.
Our detectors can achieve 88.94% in sensitivity and 94.60% on average, which makes them suitable for real-world deployment.
arXiv Detail & Related papers (2021-11-16T05:13:38Z)
- DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE).
It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase.
Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z)
- TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied to IT system operation and maintenance.
One direction aims at the recognition of re-occurring anomaly types to enable remediation automation.
We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
- TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
- Frequency-based Multi Task learning With Attention Mechanism for Fault Detection In Power Systems [6.4332733596587115]
We introduce a novel deep learning-based approach for fault detection and test it on a real data set from a partial discharge detection task on the Kaggle platform.
Our solution adopts a Long-Short Term Memory architecture with attention mechanism to extract time series features, and uses a 1D-Convolutional Neural Network structure to exploit frequency information of the signal for prediction.
arXiv Detail & Related papers (2020-09-15T02:01:47Z)
- Binary DAD-Net: Binarized Driveable Area Detection Network for Autonomous Driving [94.40107679615618]
This paper proposes a novel binarized driveable area detection network (binary DAD-Net).
It uses only binary weights and activations in the encoder, the bottleneck, and the decoder part.
It outperforms state-of-the-art semantic segmentation networks on public datasets.
arXiv Detail & Related papers (2020-06-15T07:09:01Z)
- An Intelligent and Time-Efficient DDoS Identification Framework for Real-Time Enterprise Networks SAD-F: Spark Based Anomaly Detection Framework [0.5811502603310248]
We will be exploring security analytic techniques for DDoS anomaly detection using different machine learning techniques.
In this paper, we are proposing a novel approach which deals with real traffic as input to the system.
We study and compare the performance factor of our proposed framework on three different testbeds.
arXiv Detail & Related papers (2020-01-21T06:05:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.