Performance Modeling of Data Storage Systems using Generative Models
- URL: http://arxiv.org/abs/2307.02073v1
- Date: Wed, 5 Jul 2023 07:30:53 GMT
- Title: Performance Modeling of Data Storage Systems using Generative Models
- Authors: Abdalaziz Rashid Al-Maeeni, Aziz Temirkhanov, Artem Ryzhikov, Mikhail Hushchyn
- Abstract summary: We have developed several models of a storage system using machine learning-based generative models.
The results of the experiments demonstrate errors of 4-10% for IOPS and 3-16% for latency predictions, depending on the component.
- Score: 0.5352699766206809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-precision modeling of systems is one of the main areas of industrial
data analysis. Models of systems, their digital twins, are used to predict
their behavior under various conditions. We have developed several models of a
storage system using machine learning-based generative models. The system
consists of several components: hard disk drive (HDD) and solid-state drive
(SSD) storage pools with different RAID schemes and cache. Each storage
component is represented by a probabilistic model that describes the
probability distribution of the component performance in terms of IOPS and
latency, depending on their configuration and external data load parameters.
The results of the experiments demonstrate errors of 4-10% for IOPS and
3-16% for latency predictions, depending on the component and model of the
system. The predictions show up to 0.99 Pearson correlation with Little's
law, which can be used for unsupervised reliability checks of the models. In
addition, we present novel data sets that can be used for benchmarking
regression algorithms, conditional generative models, and uncertainty
estimation methods in machine learning.
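As a concrete illustration of the Little's-law reliability check described in the abstract, the sketch below (an assumed procedure, not the authors' code) correlates the externally imposed queue depth with the queue depth implied by the model's predictions, using Little's law L = λW (outstanding requests = IOPS × latency):

```python
import numpy as np

def littles_law_check(queue_depth, iops_pred, latency_s_pred):
    """Pearson correlation between the configured queue depth and the
    queue depth implied by Little's law (L = lambda * W, i.e.
    IOPS * latency). A value near 1 suggests the IOPS and latency
    predictions are mutually consistent; no ground-truth labels needed."""
    implied_depth = np.asarray(iops_pred) * np.asarray(latency_s_pred)
    return np.corrcoef(queue_depth, implied_depth)[0, 1]
```

Because the check needs only the model's own outputs and the load parameters, it can flag inconsistent predictions without any measured performance data.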
Related papers
- Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective [0.7373617024876725]
Kernel density estimation (KDE)-based models are popular choices for density estimation, but they fail to adapt to data regions with varying densities.
An adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth.
A modified expectation-maximization algorithm is employed to accelerate the optimization speed reliably.
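A minimal one-dimensional sketch of such an adaptive KDE (per-kernel bandwidths rescaled by a fixed-bandwidth pilot estimate; an illustration of the general technique, not the paper's implementation):

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def adaptive_kde(x_train, x_eval, h0=None, alpha=0.5):
    """Adaptive KDE: each kernel gets an individual bandwidth,
    wider in low-density regions and narrower in high-density ones."""
    x_train = np.asarray(x_train, float)
    x_eval = np.asarray(x_eval, float)
    n = x_train.size
    if h0 is None:
        h0 = 1.06 * x_train.std() * n ** (-1 / 5)  # Silverman pilot bandwidth
    # Fixed-bandwidth pilot density at each training point
    pilot = gauss((x_train[:, None] - x_train[None, :]) / h0).mean(axis=1) / h0
    g = np.exp(np.mean(np.log(pilot)))             # geometric mean of pilot
    h_i = h0 * (pilot / g) ** (-alpha)             # per-kernel bandwidths
    # Evaluate: each kernel contributes with its own bandwidth
    u = (x_eval[:, None] - x_train[None, :]) / h_i[None, :]
    return (gauss(u) / h_i[None, :]).mean(axis=1)
```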
arXiv Detail & Related papers (2023-10-05T14:08:42Z)
- Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study [71.84852429039881]
Investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise.
We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
arXiv Detail & Related papers (2023-06-13T12:43:59Z)
- A prediction and behavioural analysis of machine learning methods for modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z)
- Device Modeling Bias in ReRAM-based Neural Network Simulations [1.5490932775843136]
Data-driven modeling approaches such as jump tables are promising to model memory devices for neural network simulations.
We study how various jump table device models impact the attained network performance estimates.
Results on a multi-layer perceptron trained on MNIST show that device models based on binning can behave unpredictably.
arXiv Detail & Related papers (2022-11-29T04:45:06Z)
- HigeNet: A Highly Efficient Modeling for Long Sequence Time Series Prediction in AIOps [30.963758935255075]
In this paper, we propose a highly efficient model named HigeNet to predict long-sequence time series.
We show that the training time, resource usage, and accuracy of the model are significantly better than those of five state-of-the-art competing models.
arXiv Detail & Related papers (2022-11-13T13:48:43Z)
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
- A Framework for Machine Learning of Model Error in Dynamical Systems [7.384376731453594]
We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from data.
We cast the problem in both continuous- and discrete-time, for problems in which the model error is memoryless and in which it has significant memory.
We find that hybrid methods substantially outperform solely data-driven approaches in terms of data hunger, demands for model complexity, and overall predictive performance.
arXiv Detail & Related papers (2021-07-14T12:47:48Z)
- Using Data Assimilation to Train a Hybrid Forecast System that Combines Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data consist of noisy partial measurements.
We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
arXiv Detail & Related papers (2021-02-15T19:56:48Z)
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
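The per-time-stamp mean/variance parameterization can be sketched via the Gaussian negative log-likelihood it implies (in the actual model, `mu` and `log_var` would be neural-network outputs; here they are taken as given):

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Negative log-likelihood of observations y under independent
    Gaussians N(mu_t, exp(log_var_t)), summed over time-stamps.
    Predicting log-variance keeps the variance strictly positive."""
    y, mu, log_var = (np.asarray(a, float) for a in (y, mu, log_var))
    return 0.5 * np.sum(log_var + (y - mu) ** 2 / np.exp(log_var)
                        + np.log(2.0 * np.pi))
```

Minimizing this loss fits both the mean trajectory and a time-varying noise level, which is what makes the estimated series robust to anomalous points.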
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
- Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction [58.98112070128482]
We propose a lightweight solution for time-series prediction based on historic observations.
It consists of a heterogeneous ensemble method composed of two models - a neural network and a mean predictor.
It achieves an overall $R^2$ score of 0.10 on the available FedCSIS 2020 challenge dataset.
arXiv Detail & Related papers (2020-07-07T15:44:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.