Related papers: Online detection of failures generated by storage simulator

URL: http://arxiv.org/abs/2101.07100v1
Date: Mon, 18 Jan 2021 14:56:53 GMT
Title: Online detection of failures generated by storage simulator
Authors: Kenenbek Arzymatov, Mikhail Hushchyn, Andrey Sapronov, Vladislav Belavin, Leonid Gremyachikh, Maksim Karpov and Andrey Ustyuzhanin
Abstract summary: We create a Go-based (golang) package for simulating the behavior of modern storage infrastructure. The package's flexible structure allows us to create a model of a real-world storage system with a number of components. To discover failures in the time series distribution generated by the simulator, we modified a change point detection algorithm that works in online mode.
Score: 2.3859858429583665
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern large-scale data farms consist of hundreds of thousands of storage devices that span distributed infrastructure. Devices used in modern data centers (such as controllers, links, SSDs and HDDs) can fail due to hardware as well as software problems. Such failures or anomalies can be detected by monitoring the activity of components using machine learning techniques. To use these techniques, researchers need plenty of historical data from devices in normal and failure modes for training algorithms. In this work, we address two problems: 1) the lack of storage data for the methods above, by creating a simulator, and 2) the application of existing online algorithms that can more quickly detect a failure in one of the components. We created a Go-based (golang) package for simulating the behavior of modern storage infrastructure. The software is based on the discrete-event modeling paradigm and captures the structure and dynamics of high-level storage system building blocks. The package's flexible structure allows us to create a model of a real-world storage system with a configurable number of components. The primary area of interest is exploring the storage machine's behavior under stress testing or medium- to long-term operation in order to observe failures of its components. To discover failures in the time series distribution generated by the simulator, we modified a change-point detection algorithm that works in online mode. The goal of change-point detection is to discover differences in time series distribution. This work describes an approach for failure detection in time series data based on direct density ratio estimation via binary classifiers.
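The idea named in the last sentence of the abstract, change-point detection through density ratio estimation with a binary classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_logistic`, `change_score`, `detect`), the quadratic feature set, the window size, the step, and the threshold are all assumptions chosen for illustration. A classifier is trained to separate a reference window from the most recent window; its logit approximates the log density ratio, and the mean logit over the recent window serves as a change score.

```python
import math
import random


def _logit(w, x):
    # Linear score on the assumed feature set (1, x, x^2).
    return w[0] + w[1] * x + w[2] * x * x


def fit_logistic(xs, ys, epochs=100, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, _logit(w, x)))  # clamp to avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - y
            g0 += err
            g1 += err * x
            g2 += err * x * x
        w = [w[0] - lr * g0 / n, w[1] - lr * g1 / n, w[2] - lr * g2 / n]
    return w


def change_score(reference, test):
    """Mean estimated log density ratio log(p_test / p_ref) over the test window.

    Simplification: the classifier is scored on the data it was trained on,
    so the score is slightly optimistic.
    """
    pooled = list(reference) + list(test)
    mu = sum(pooled) / len(pooled)
    sd = math.sqrt(sum((v - mu) ** 2 for v in pooled) / len(pooled)) or 1.0
    xs = [(v - mu) / sd for v in pooled]  # standardize for stable training
    ys = [0.0] * len(reference) + [1.0] * len(test)
    w = fit_logistic(xs, ys)
    # The logit equals the log density ratio up to the log class-prior ratio.
    prior = math.log(len(reference) / len(test))
    test_xs = xs[len(reference):]
    return sum(_logit(w, x) for x in test_xs) / len(test_xs) + prior


def detect(series, window=50, step=10, threshold=0.5):
    """Return the first index at which an alarm is raised, or None."""
    for t in range(2 * window, len(series) + 1, step):
        ref = series[t - 2 * window:t - window]
        recent = series[t - window:t]
        if change_score(ref, recent) > threshold:
            return t
    return None


if __name__ == "__main__":
    random.seed(0)
    # Synthetic series: the mean shifts from 0 to 3 at index 150.
    series = [random.gauss(0.0, 1.0) for _ in range(150)]
    series += [random.gauss(3.0, 1.0) for _ in range(150)]
    print("alarm raised at index:", detect(series))
```

In this sketch the alarm index is the time at which the detector fires, not an estimate of where the change happened; the change lies somewhere inside the most recent window. A stronger classifier or a held-out split for scoring could be swapped in without changing the overall structure.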

Related papers

Structure-based Anomaly Detection and Clustering [1.450405446885067]
Anomaly detection is a fundamental problem in domains such as healthcare, manufacturing, and cybersecurity. This thesis proposes new unsupervised methods for anomaly detection in both structured and streaming data settings.
arXiv Detail & Related papers (2025-05-19T06:20:00Z)
The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology [10.81691411087626]
In some fields of AI, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets. We have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems.
arXiv Detail & Related papers (2024-04-17T13:00:52Z)
DTAAD: Dual Tcn-Attention Networks for Anomaly Detection in Multivariate Time Series Data [0.0]
We propose an anomaly detection and diagnosis model, DTAAD, based on Transformer and Dual Temporal Convolutional Network (TCN). Scaling methods and feedback mechanisms are introduced to improve prediction accuracy and expand correlation differences. Our experiments on seven public datasets validate that DTAAD exceeds the majority of currently advanced baseline methods in both detection and diagnostic performance.
arXiv Detail & Related papers (2023-02-17T06:59:45Z)
A Robust and Explainable Data-Driven Anomaly Detection Approach For Power Electronics [56.86150790999639]
We present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and anomaly transformer. The Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data. A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy.
arXiv Detail & Related papers (2022-09-23T06:09:35Z)
Online Self-Evolving Anomaly Detection in Cloud Computing Environments [6.480575492140354]
We present a self-evolving anomaly detection (SEAD) framework for cloud dependability assurance. Our framework self-evolves by exploring newly verified anomaly records and continuously updating the anomaly detector online. Our detectors can achieve 88.94% in sensitivity and 94.60% on average, which makes them suitable for real-world deployment.
arXiv Detail & Related papers (2021-11-16T05:13:38Z)
DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE). It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase. Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z)
TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied to IT system operation and maintenance. One direction aims at the recognition of re-occurring anomaly types to enable remediation automation. We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z)
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users. We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs). To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics. To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
Frequency-based Multi Task learning With Attention Mechanism for Fault Detection In Power Systems [6.4332733596587115]
We introduce a novel deep learning-based approach for fault detection and test it on a real data set from the Kaggle platform for a partial discharge detection task. Our solution adopts a Long Short-Term Memory architecture with an attention mechanism to extract time series features, and uses a 1D-Convolutional Neural Network structure to exploit frequency information of the signal for prediction.
arXiv Detail & Related papers (2020-09-15T02:01:47Z)
Binary DAD-Net: Binarized Driveable Area Detection Network for Autonomous Driving [94.40107679615618]
This paper proposes a novel binarized driveable area detection network (binary DAD-Net). It uses only binary weights and activations in the encoder, the bottleneck, and the decoder part. It outperforms state-of-the-art semantic segmentation networks on public datasets.
arXiv Detail & Related papers (2020-06-15T07:09:01Z)
An Intelligent and Time-Efficient DDoS Identification Framework for Real-Time Enterprise Networks SAD-F: Spark Based Anomaly Detection Framework [0.5811502603310248]
We will be exploring security analytic techniques for DDoS anomaly detection using different machine learning techniques. In this paper, we are proposing a novel approach which deals with real traffic as input to the system. We study and compare the performance factor of our proposed framework on three different testbeds.
arXiv Detail & Related papers (2020-01-21T06:05:48Z)
