DRAM Failure Prediction in AIOps: Empirical Evaluation, Challenges and
Opportunities
- URL: http://arxiv.org/abs/2104.15052v2
- Date: Tue, 4 May 2021 02:59:45 GMT
- Title: DRAM Failure Prediction in AIOps: Empirical Evaluation, Challenges and
Opportunities
- Authors: Zhiyue Wu, Hongzuo Xu, Guansong Pang, Fengyuan Yu, Yijie Wang, Songlei
Jian, Yongjun Wang
- Abstract summary: This paper presents a comprehensive empirical evaluation of diverse machine learning techniques for DRAM failure prediction.
We first formulate the problem as a multi-class classification task and exhaustively evaluate seven popular/state-of-the-art classifiers on both individual and multiple data sources.
We then formulate the problem as an unsupervised anomaly detection task and evaluate three state-of-the-art anomaly detectors.
- Score: 17.21846133804582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DRAM failure prediction is a vital task in AIOps and is crucial to
maintaining the reliability and sustainable service of large-scale data centers.
However, limited work has been done on DRAM failure prediction, mainly due to
the lack of publicly available datasets. This paper presents a comprehensive
empirical evaluation of diverse machine learning techniques for DRAM failure
prediction using a large-scale multi-source dataset, including more than three
million records of kernel, address, and mcelog data, provided by Alibaba
Cloud through the PAKDD 2021 competition. In particular, we first formulate the
problem as a multi-class classification task and exhaustively evaluate seven
popular/state-of-the-art classifiers on both individual and multiple data
sources. We then formulate the problem as an unsupervised anomaly detection
task and evaluate three state-of-the-art anomaly detectors. Further, based on
the empirical results and our experience of participating in this competition, we
discuss major challenges and present future research opportunities in this
task.
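The two problem formulations named in the abstract can be contrasted with a minimal sketch. It is only an illustration under stated assumptions, not the paper's pipeline: the feature tuples, the three class labels, the nearest-centroid classifier, and the median/MAD anomaly score are all hypothetical stand-ins chosen to keep the example dependency-free. None of them is among the classifiers or detectors the paper evaluates.

```python
from statistics import mean, median

# Hypothetical per-server features: (corrected-error count, distinct faulty rows).
# Hypothetical labels: 0 = healthy, 1 = fails soon, 2 = fails later.
TRAIN = [
    ((1.0, 1.0), 0), ((2.0, 1.0), 0), ((1.0, 2.0), 0), ((3.0, 2.0), 0), ((2.0, 2.0), 0),
    ((40.0, 9.0), 1), ((45.0, 11.0), 1),
    ((15.0, 4.0), 2), ((18.0, 5.0), 2),
]


def fit_centroids(samples):
    """Supervised view: average the feature vectors of each labelled class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=dist)


def anomaly_scores(values):
    """Unsupervised view: robust z-score from the median and MAD; no labels used."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # guard against a zero MAD
    return [abs(v - med) / mad for v in values]


# Multi-class classification: labels are required at training time.
centroids = fit_centroids(TRAIN)
predicted = predict(centroids, (42.0, 10.0))

# Anomaly detection: only the raw error counts are used, labels are ignored.
error_counts = [features[0] for features, _ in TRAIN]
scores = anomaly_scores(error_counts)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

The classifier needs a label for every training record, whereas the anomaly score ranks servers from the raw counts alone. That difference is why the unsupervised formulation is attractive when failure labels are scarce.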
Related papers
- Early Detection of At-Risk Students Using Machine Learning [0.0]
We aim to tackle the persistent challenges of higher education retention and student dropout rates by screening for at-risk students.
This work considers several machine learning models, including Support Vector Machines (SVM), Naive Bayes, K-nearest neighbors (KNN), Decision Trees, Logistic Regression, and Random Forest.
Our analysis indicates that all algorithms generate an acceptable outcome for at-risk student predictions, while Naive Bayes performs best overall.
arXiv Detail & Related papers (2024-12-12T17:33:06Z)
- See it, Think it, Sorted: Large Multimodal Models are Few-shot Time Series Anomaly Analyzers [23.701716999879636]
Time series anomaly detection (TSAD) is becoming increasingly vital due to the rapid growth of time series data.
We introduce a pioneering framework called the Time Series Anomaly Multimodal Analyzer (TAMA) to enhance both the detection and interpretation of anomalies.
arXiv Detail & Related papers (2024-11-04T10:28:41Z)
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
- Unraveling the "Anomaly" in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution [89.16750999704969]
Anomaly labels hinder traditional supervised models in time series anomaly detection.
Various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue.
We propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD).
arXiv Detail & Related papers (2023-11-19T05:37:18Z)
- Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results [73.98594459933008]
Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems.
This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets.
We introduce the Wild Face Anti-Spoofing dataset, a large-scale, diverse FAS dataset collected in unconstrained settings.
arXiv Detail & Related papers (2023-04-12T10:29:42Z)
- A Comprehensive Review of Trends, Applications and Challenges In Out-of-Distribution Detection [0.76146285961466]
A field of study has emerged that focuses on detecting out-of-distribution data subsets and enabling more comprehensive generalization.
As many deep learning based models have achieved near-perfect results on benchmark datasets, the need to evaluate these models' reliability and trustworthiness is felt more strongly than ever.
This paper presents a survey that, in addition to reviewing more than 70 papers in this field, presents challenges and directions for future works and offers a unifying look into various types of data shifts and solutions for better generalization.
arXiv Detail & Related papers (2022-09-26T18:13:14Z)
- Data-Centric Epidemic Forecasting: A Survey [56.99209141838794]
This survey delves into various data-driven methodological and practical advancements.
We enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting.
We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems.
arXiv Detail & Related papers (2022-07-19T16:15:11Z)
- Predicting Seriousness of Injury in a Traffic Accident: A New Imbalanced Dataset and Benchmark [62.997667081978825]
The paper introduces a new dataset to assess the performance of machine learning algorithms in the prediction of the seriousness of injury in a traffic accident.
The dataset is created by aggregating publicly available datasets from the UK Department for Transport.
arXiv Detail & Related papers (2022-05-20T21:15:26Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Exathlon: A Benchmark for Explainable Anomaly Detection over Time Series [6.085662888748731]
We present Exathlon, the first benchmark for explainable anomaly detection over high-dimensional time series data.
Exathlon has been constructed based on real data traces from repeated executions of large-scale stream processing jobs on an Apache Spark cluster.
For each of the anomaly instances, ground truth labels for the root cause interval as well as those for the extended effect interval are provided.
arXiv Detail & Related papers (2020-10-10T19:31:22Z)
- Event Prediction in the Big Data Era: A Systematic Survey [7.3810864598379755]
Event prediction is becoming a viable option in the big data era.
This paper aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction.
arXiv Detail & Related papers (2020-07-19T23:24:52Z)
- Multi-label Prediction in Time Series Data using Deep Neural Networks [19.950094635430048]
This paper addresses a multi-label predictive fault classification problem for multidimensional time-series data.
The proposed algorithm is tested on two public benchmark datasets.
arXiv Detail & Related papers (2020-01-27T21:35:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.