Exploring Error Bits for Memory Failure Prediction: An In-Depth Correlative Study
- URL: http://arxiv.org/abs/2312.02855v2
- Date: Mon, 18 Dec 2023 15:30:26 GMT
- Title: Exploring Error Bits for Memory Failure Prediction: An In-Depth Correlative Study
- Authors: Qiao Yu, Wengui Zhang, Jorge Cardoso and Odej Kao
- Abstract summary: We present a comprehensive study on the correlation between CEs and UEs.
Our analysis reveals a strong correlation between spatio-temporal error bits and UE occurrence.
Our approach effectively reduces the number of virtual machine interruptions caused by UEs by approximately 59%.
- Score: 5.292618442300404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In large-scale datacenters, memory failure is a common cause of server
crashes, with Uncorrectable Errors (UEs) being a major indicator of Dual Inline
Memory Module (DIMM) defects. Existing approaches primarily focus on predicting
UEs using Correctable Errors (CEs), without fully considering the information
provided by error bits. However, error bit patterns have a strong correlation
with the occurrence of UEs. In this paper, we present a comprehensive study on
the correlation between CEs and UEs, specifically emphasizing the importance of
spatio-temporal error bit information. Our analysis reveals a strong
correlation between spatio-temporal error bits and UE occurrence. Through
evaluations using real-world datasets, we demonstrate that our approach
significantly improves prediction performance by 15% in F1-score compared to
the state-of-the-art algorithms. Overall, our approach effectively reduces the
number of virtual machine interruptions caused by UEs by approximately 59%.
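The abstract does not say how error bits are represented. As a minimal sketch, assuming each CE record lists which DQ lines and burst beats of an x4 DDR4 device were flipped, spatial error-bit features (how many DQs and beats are hit, and how far apart they lie) can be computed per CE and then aggregated over a time window to add the temporal dimension. The record layout, the 4 x 8 burst geometry, the window, and every feature name below are hypothetical illustrations, not the authors' feature set or code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical geometry (assumption): an x4 DDR4 chip drives 4 DQ lines over an 8-beat burst.
NUM_DQ = 4
NUM_BEATS = 8

Bit = Tuple[int, int]  # (dq, beat) position of one flipped bit


@dataclass(frozen=True)
class CorrectableError:
    """Hypothetical CE log record: a timestamp in seconds and the error bits it reported."""
    timestamp: float
    error_bits: FrozenSet[Bit]

    def __post_init__(self) -> None:
        # Reject bit positions that fall outside the assumed DQ x beat grid.
        for dq, beat in self.error_bits:
            if not (0 <= dq < NUM_DQ and 0 <= beat < NUM_BEATS):
                raise ValueError(f"error bit {(dq, beat)} outside the {NUM_DQ}x{NUM_BEATS} burst")


def spatial_features(bits: Iterable[Bit]) -> Dict[str, int]:
    """Summarize where in the DQ x beat grid a set of error bits falls."""
    bit_set: Set[Bit] = set(bits)
    if not bit_set:
        return {"bit_count": 0, "dq_count": 0, "beat_count": 0,
                "dq_interval": 0, "beat_interval": 0}
    dqs = sorted({dq for dq, _ in bit_set})
    beats = sorted({beat for _, beat in bit_set})
    return {
        "bit_count": len(bit_set),
        "dq_count": len(dqs),                   # distinct DQ lines touched
        "beat_count": len(beats),               # distinct beats touched
        "dq_interval": dqs[-1] - dqs[0],        # distance between outermost DQs
        "beat_interval": beats[-1] - beats[0],  # distance between outermost beats
    }


def spatio_temporal_features(ces: List[CorrectableError], now: float,
                             window: float) -> Dict[str, int]:
    """Aggregate error-bit features over all CEs with timestamp in (now - window, now]."""
    recent = [ce for ce in ces if now - window < ce.timestamp <= now]
    union: Set[Bit] = set()
    max_dq_per_ce = 0
    for ce in recent:
        union |= ce.error_bits  # bits seen anywhere in the window
        max_dq_per_ce = max(max_dq_per_ce, spatial_features(ce.error_bits)["dq_count"])
    features = {f"window_{name}": value for name, value in spatial_features(union).items()}
    features["ce_count"] = len(recent)
    features["max_dq_count_per_ce"] = max_dq_per_ce
    return features


ces = [
    CorrectableError(0.0, frozenset({(0, 1)})),
    CorrectableError(30.0, frozenset({(0, 1), (2, 5)})),
    CorrectableError(500.0, frozenset({(3, 7)})),  # falls outside the window below
]
feats = spatio_temporal_features(ces, now=60.0, window=120.0)
print(feats)
```

In this sketch the per-CE maximum keeps the spatial shape of the worst single error, while the windowed union captures bits that accumulate over time; a UE predictor would consume such a feature vector per DIMM, but which model and window the paper actually uses is not stated in this summary.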
Related papers
- Detrimental non-Markovian errors for surface code memory [0.5490714603843316]
We study the structure of non-Markovian correlated errors and their impact on surface code memory performance.
Our analysis shows that while not all temporally correlated structures are detrimental, certain structures, particularly multi-time "streaky" correlations, can severely degrade logical error rate scaling.
(2024-10-31T09:52:21Z)
- Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks [16.064233621959538]
We propose a query-efficient and computation-efficient MIA that directly re-leverages the original membership scores to mitigate the errors in difficulty calibration.
(2024-08-31T11:59:42Z)
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
(2024-08-02T14:34:27Z)
- Investigating Memory Failure Prediction Across CPU Architectures [8.477622236186695]
We investigate the correlation between Correctable Errors (CEs) and Uncorrectable Errors (UEs) across different CPU architectures.
Our analysis identifies unique patterns of memory failure associated with each processor platform.
We conduct memory failure prediction on different processor platforms, achieving up to a 15% improvement in F1-score over the existing algorithm.
(2024-06-08T05:10:23Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, such as weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
(2022-08-10T03:41:48Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
(2021-11-28T19:03:39Z)
- Tightening the Approximation Error of Adversarial Risk with Auto Loss Function Search [12.263913626161155]
A common type of evaluation is to approximate the adversarial risk of a model as a robustness indicator.
We propose AutoLoss-AR, the first method for searching loss functions for tightening the error.
The results demonstrate the effectiveness of the proposed methods.
(2021-11-09T11:47:43Z)
- Discriminative-Generative Dual Memory Video Anomaly Detection [81.09977516403411]
Recent work has tried to train video anomaly detection (VAD) models with a few anomalies rather than with normal data only.
We propose a DiscRiminative-gEnerative duAl Memory (DREAM) anomaly detection model to take advantage of a few anomalies and solve data imbalance.
(2021-04-29T15:49:01Z)
- Collaborative Boundary-aware Context Encoding Networks for Error Map Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks, called AEP-Net, for the error prediction task.
Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions.
AEP-Net achieves average DSCs of 0.8358 and 0.8164 on the error prediction task and a high Pearson correlation coefficient of 0.9873.
(2020-06-25T12:42:01Z)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
(2020-05-09T01:59:13Z)