Using Data Redundancy Techniques to Detect and Correct Errors in Logical Data
- URL: http://arxiv.org/abs/2503.15881v1
- Date: Thu, 20 Mar 2025 06:07:13 GMT
- Title: Using Data Redundancy Techniques to Detect and Correct Errors in Logical Data
- Authors: Ahmed Sharuvan, Ahmed Naufal Abdul Hadee
- Abstract summary: We study the RAID scheme used with disk arrays and adapt it for use with logical data. We demonstrate robust performance in recovering arbitrary faults in large archive files using only a small fraction of redundant data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data redundancy techniques have been tested in several different applications to provide fault tolerance and performance gains. The use of these techniques is mostly seen at the hardware, device driver, or file system level. In practice, the use of data integrity techniques with logical data has largely been limited to verifying the integrity of transferred files using cryptographic hashes. In this paper, we study the RAID scheme used with disk arrays and adapt it for use with logical data. Such a system is devised in theory and implemented in software, with specifications provided for the procedures and file formats used. Rigorous experimentation is conducted to test the effectiveness of the developed system for multiple use cases. With computer-generated benchmarks and simulated experiments, the system demonstrates robust performance in recovering arbitrary faults in large archive files using only a small fraction of redundant data. This is achieved by leveraging computing power for the process of data recovery.
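The core idea of adapting RAID to logical data can be illustrated with a minimal sketch: split a file's bytes into fixed-size blocks, keep one XOR parity block (as in RAID-5), and reconstruct any single lost block from the survivors. This is an illustrative toy only; the function names (`split_blocks`, `xor_parity`, `recover`) and the block size are assumptions, and the paper's actual file formats and procedures are more elaborate.

```python
# RAID-5-style single-block recovery over logical data (toy sketch).
BLOCK_SIZE = 4

def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Zero-pad to a multiple of block_size and split into blocks."""
    data += b"\x00" * ((-len(data)) % block_size)
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def xor_parity(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of all blocks, as in RAID-5 parity."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks_with_gap: list, parity: bytes) -> bytes:
    """Rebuild the single missing block (marked None) from the others."""
    present = [b for b in blocks_with_gap if b is not None]
    return xor_parity(present + [parity])

blocks = split_blocks(b"archival data!")
parity = xor_parity(blocks)
damaged = blocks.copy()
damaged[1] = None  # simulate one corrupted/lost block
assert recover(damaged, parity) == blocks[1]
```

Because XOR is its own inverse, XORing the surviving blocks with the parity block yields exactly the missing block; this is what lets a small fraction of redundant data repair an arbitrary single-block fault.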
Related papers
- A Comprehensive Quantification of Inconsistencies in Memory Dumps [13.796554685139855]
We develop a system to track all write operations performed by the OS kernel during a memory acquisition process.
We quantify how different acquisition modes, file systems, and hardware targets influence the frequency of kernel writes during the dump.
arXiv Detail & Related papers (2025-03-19T10:02:54Z) - Accelerated Methods with Compressed Communications for Distributed Optimization Problems under Data Similarity [55.03958223190181]
We propose the first theoretically grounded accelerated algorithms utilizing unbiased and biased compression under data similarity. Our results set new records and are confirmed by experiments on different average losses and datasets.
arXiv Detail & Related papers (2024-12-21T00:40:58Z) - Ensemble Method for System Failure Detection Using Large-Scale Telemetry Data [0.0]
This research paper presents an in-depth analysis of extensive system telemetry data, proposing an ensemble methodology for detecting system failures.
The proposed ensemble technique integrates a diverse set of algorithms, including Long Short-Term Memory (LSTM) networks, isolation forests, one-class support vector machines (OCSVM), and local outlier factors (LOF).
Experimental evaluations demonstrate the remarkable efficacy of our models, achieving a notable detection rate in identifying system failures.
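The voting scheme behind such an ensemble can be sketched compactly. The paper combines LSTMs, isolation forests, OCSVM, and LOF; in this toy sketch two much simpler stand-in detectors (z-score and IQR) are substituted so the majority-vote mechanism itself is visible. All names and thresholds here are illustrative assumptions.

```python
# Majority-vote ensemble for point anomaly detection (toy detectors).
import statistics

def zscore_outliers(xs, k=2.5):
    """Flag indices more than k sample standard deviations from the mean."""
    mu, sigma = statistics.mean(xs), statistics.stdev(xs)
    return {i for i, x in enumerate(xs) if abs(x - mu) > k * sigma}

def iqr_outliers(xs, k=1.5):
    """Flag indices outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return {i for i, x in enumerate(xs)
            if x < q1 - k * iqr or x > q3 + k * iqr}

def ensemble_outliers(xs, detectors, min_votes=2):
    """Flag an index only when at least min_votes detectors agree."""
    votes = {}
    for detect in detectors:
        for i in detect(xs):
            votes[i] = votes.get(i, 0) + 1
    return sorted(i for i, v in votes.items() if v >= min_votes)

telemetry = [10, 11, 9, 10, 12, 10, 11, 90, 10, 9]
flagged = ensemble_outliers(telemetry, [zscore_outliers, iqr_outliers])
```

Requiring agreement between detectors with different failure modes is the design point: it trades a little recall for a lower false-positive rate, which matters when each flagged event triggers an expensive follow-up.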
arXiv Detail & Related papers (2024-06-07T06:35:17Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - Cooperative Hardware-Prompt Learning for Snapshot Compressive Imaging [51.65127848056702]
We propose a Federated Hardware-Prompt learning (FedHP) framework to cooperatively optimize snapshot compressive imaging systems. FedHP learns a hardware-conditioned prompter to align inconsistent data distributions across clients, serving as an indicator of the data inconsistency among different hardware. Experiments demonstrate that the proposed FedHP coordinates the pre-trained model to multiple hardware configurations, outperforming prevalent FL frameworks by 0.35 dB.
arXiv Detail & Related papers (2023-06-01T22:21:28Z) - Large-scale End-of-Life Prediction of Hard Disks in Distributed Datacenters [0.0]
Large-scale predictive analyses are performed using severely skewed health statistics data.
We present an encoder-decoder LSTM model where the context gained from understanding health statistics sequences aid in predicting an output sequence of the number of days remaining before a disk potentially fails.
arXiv Detail & Related papers (2023-03-15T21:55:07Z) - Block size estimation for data partitioning in HPC applications using machine learning techniques [38.063905789566746]
This paper describes a methodology, namely BLEST-ML (BLock size ESTimation through Machine Learning), for block size estimation.
The proposed methodology was evaluated by designing an implementation tailored to dislib, a distributed computing library.
The results we obtained show the ability of BLEST-ML to efficiently determine a suitable way to split a given dataset.
arXiv Detail & Related papers (2022-11-19T23:04:14Z) - Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z) - Online detection of failures generated by storage simulator [2.3859858429583665]
We create a Go-based (golang) package for simulating the behavior of modern storage infrastructure.
The package's flexible structure allows us to create a model of a real-world storage system with a number of components.
To discover failures in the time series distribution generated by the simulator, we modified a change point detection algorithm that works in online mode.
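Online change point detection of this kind can be sketched with a classic two-sided CUSUM statistic, which accumulates evidence of a shift away from a target mean and raises an alarm when it crosses a threshold. This is a generic textbook sketch, not the authors' modified algorithm; the class name and parameter values are assumptions.

```python
# Two-sided CUSUM change point detector operating in online mode.
class OnlineCusum:
    def __init__(self, target_mean, threshold=5.0, drift=0.5):
        self.mean = target_mean        # expected level under normal operation
        self.threshold = threshold     # alarm level for the CUSUM statistic
        self.drift = drift             # slack that ignores small fluctuations
        self.pos = 0.0                 # evidence of an upward shift
        self.neg = 0.0                 # evidence of a downward shift

    def update(self, x):
        """Feed one observation; return True when a change is flagged."""
        self.pos = max(0.0, self.pos + (x - self.mean) - self.drift)
        self.neg = max(0.0, self.neg - (x - self.mean) - self.drift)
        if self.pos > self.threshold or self.neg > self.threshold:
            self.pos = self.neg = 0.0  # reset after an alarm
            return True
        return False

detector = OnlineCusum(target_mean=0.0)
stream = [0.1, -0.2, 0.0, 0.1] + [3.0] * 5  # mean shifts upward at index 4
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
```

Each observation is processed in constant time and constant memory, which is what makes the approach suitable for monitoring a running simulator rather than analyzing a recorded trace after the fact.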
arXiv Detail & Related papers (2021-01-18T14:56:53Z) - Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z) - Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation has begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.