DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep Surrogate Model
- URL: http://arxiv.org/abs/2212.01302v1
- Date: Fri, 2 Dec 2022 16:51:58 GMT
- Title: DeepFT: Fault-Tolerant Edge Computing using a Self-Supervised Deep Surrogate Model
- Authors: Shreshth Tuli and Giuliano Casale and Ludmila Cherkasova and Nicholas R. Jennings
- Abstract summary: We propose DeepFT to proactively avoid system overloads and their adverse effects.
DeepFT uses a deep surrogate model to accurately predict and diagnose faults in the system.
It offers a highly scalable solution, as the model size grows by only 3 and 1 percent per unit increase in the number of active tasks and hosts, respectively.
- Score: 12.335763358698564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of latency-critical AI applications has been supported by the
evolution of the edge computing paradigm. However, edge solutions are typically
resource-constrained, posing reliability challenges due to heightened
contention for compute and communication capacities and faulty application
behavior in the presence of overload conditions. Although a large amount of
generated log data can be mined for fault prediction, labeling this data for
training is a manual process and thus a limiting factor for automation. Due to
this, many companies resort to unsupervised fault-tolerance models. Yet,
failure models of this kind can incur a loss of accuracy when they need to
adapt to non-stationary workloads and diverse host characteristics. To cope
with this, we propose a novel modeling approach, called DeepFT, to proactively
avoid system overloads and their adverse effects by optimizing the task
scheduling and migration decisions. DeepFT uses a deep surrogate model to
accurately predict and diagnose faults in the system and co-simulation-based
self-supervised learning to dynamically adapt the model in volatile settings.
It offers a highly scalable solution, as the model size grows by only 3 and 1
percent per unit increase in the number of active tasks and hosts, respectively.
Extensive experimentation on a Raspberry Pi-based edge cluster with DeFog benchmarks
shows that DeepFT can outperform state-of-the-art baseline methods in
fault detection and QoS metrics. Specifically, DeepFT gives the highest F1
scores for fault detection, reducing service deadline violations by up to 37%
while also improving response time by up to 9%.
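The abstract describes using a surrogate model's fault predictions to steer task scheduling decisions. As a toy illustration of that idea only, the sketch below assumes a hand-written logistic "surrogate" over per-host CPU utilization and greedily places a new task on the host that minimizes the total predicted overload score. The names `overload_probability`, `system_fault_score`, and `place_task`, the `threshold`/`sharpness` parameters, and the assumption that task demand adds linearly to host load are all hypothetical. This is not DeepFT's method, which uses a learned deep surrogate adapted through co-simulation-based self-supervised learning.

```python
import math


def overload_probability(utilization: float, threshold: float = 0.8,
                         sharpness: float = 20.0) -> float:
    """Hypothetical surrogate: logistic estimate that a host overloads.

    Returns 0.5 at `threshold` and approaches 1.0 as utilization exceeds it.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (utilization - threshold)))


def system_fault_score(host_loads: list[float]) -> float:
    """Sum of per-host overload probabilities (lower means fewer predicted faults)."""
    return sum(overload_probability(u) for u in host_loads)


def place_task(host_loads: list[float], task_demand: float) -> tuple[int, float]:
    """Greedy placement: try every host and keep the one with the lowest predicted score."""
    best_host, best_score = -1, math.inf
    for i in range(len(host_loads)):
        candidate = list(host_loads)  # copy so the current state is not mutated
        candidate[i] += task_demand   # hypothetical: demand adds linearly to CPU load
        score = system_fault_score(candidate)
        if score < best_score:
            best_host, best_score = i, score
    return best_host, best_score


if __name__ == "__main__":
    host, score = place_task([0.75, 0.40, 0.55], task_demand=0.2)
    print(f"place task on host {host} (predicted fault score {score:.3f})")
```

With host loads of 0.75, 0.40, and 0.55 and a task demand of 0.2, the sketch picks host 1: placing the task on host 0 would push it to 0.95 utilization, where the assumed surrogate predicts a near-certain overload. A learned surrogate would replace the fixed logistic curve, and a real scheduler would also weigh migration costs.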
Related papers
- Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomaly and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contain private user information.
(2024-11-06T15:38:31Z)
- Three-Stage Adjusted Regression Forecasting (TSARF) for Software Defect Prediction [5.826476252191368]
Nonhomogeneous Poisson process (NHPP) software reliability growth models (SRGMs) are the most commonly employed.
Increased model complexity presents a challenge in identifying robust and computationally efficient algorithms.
(2024-01-31T02:19:35Z)
- EdgeFD: An Edge-Friendly Drift-Aware Fault Diagnosis System for Industrial IoT [0.0]
We propose Drift-Aware Weight Consolidation (DAWC) to mitigate the challenges posed by frequent data drift in the industrial Internet of Things (IIoT).
DAWC efficiently manages multiple data drift scenarios, minimizing the need for constant model fine-tuning on edge devices.
We have also developed a comprehensive diagnosis and visualization platform.
(2023-10-07T06:48:07Z)
- Learning Sample Difficulty from Pre-trained Models for Reliable Prediction [55.77136037458667]
We propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization.
We simultaneously improve accuracy and uncertainty calibration across challenging benchmarks.
(2023-04-20T07:29:23Z)
- Deep Convolutional Architectures for Extrapolative Forecast in Time-dependent Flow Problems [0.0]
Deep learning techniques are employed to model the system dynamics for advection-dominated problems.
These models take as input a sequence of high-fidelity vector solutions for consecutive time-steps obtained from the PDEs.
Non-intrusive reduced-order modelling techniques such as deep auto-encoder networks are utilized to compress the high-fidelity snapshots.
(2022-09-18T03:45:56Z)
- Fast and Accurate Error Simulation for CNNs against Soft Errors [64.54260986994163]
We present a framework for the reliability analysis of Convolutional Neural Networks (CNNs) via an error simulation engine.
These error models are defined based on the corruption patterns of the output of the CNN operators induced by faults.
We show that our methodology achieves about 99% accuracy of the fault effects w.r.t. SASSIFI, and a speedup ranging from 44x up to 63x w.r.t.FI, that only implements a limited set of error models.
(2022-06-04T19:45:02Z)
- On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications [0.0]
Predictive uncertainty supplements model predictions and enables improved functionality of downstream tasks.
We tackle this problem by building upon Monte Carlo Dropout (MCDO) models using the Axolotl framework.
We conduct experiments on (1) a multi-class classification task using the CIFAR10 dataset, and (2) a more complex human body segmentation task.
(2021-11-11T22:24:15Z)
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
(2021-08-09T08:45:47Z)
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in wireless networks.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
(2020-10-08T15:27:50Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models.
(2020-03-04T12:27:36Z)