Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator
- URL: http://arxiv.org/abs/2404.09317v1
- Date: Sun, 14 Apr 2024 18:16:16 GMT
- Title: Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator
- Authors: Abhishek Tyagi, Reiley Jeyapaul, Chuteng Zhu, Paul Whatmough, Yuhao Zhu
- Abstract summary: We present a reliability study of Arm's Ethos-U55, an important industrial-scale NPU being utilised in embedded and IoT applications.
We perform large-scale RTL-level fault injections to characterize the Ethos-U55 against the Automotive Safety Integrity Level D (ASIL-D) resiliency standard.
- Score: 3.8736624857062267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Neural Processing Units (NPUs) or accelerators are increasingly deployed in a variety of applications, including safety-critical ones such as autonomous vehicles and medical imaging, it is critical to understand the fault-tolerance characteristics of these NPUs. We present a reliability study of Arm's Ethos-U55, an important industrial-scale NPU being utilised in embedded and IoT applications. We perform large-scale RTL-level fault injections to characterize the Ethos-U55 against the Automotive Safety Integrity Level D (ASIL-D) resiliency standard commonly used for safety-critical applications such as autonomous vehicles. We show that, under soft errors, all four configurations of the NPU fall short of the required level of resiliency for a variety of neural networks running on the NPU. We show that it is possible to meet the ASIL-D level of resiliency without resorting to conventional strategies like Dual Core Lock Step (DCLS), which has an area overhead of 100%. We achieve this through selective protection, where hardware structures are selectively protected (e.g., duplicated, hardened) based on their sensitivity to soft errors and their silicon area. In identifying the optimal configuration that minimizes the area overhead while meeting the ASIL-D standard, the main challenge is the large search space associated with the time-consuming RTL simulation. To address this challenge, we present a statistical analysis tool that is validated against Arm silicon and that allows us to quickly navigate hundreds of billions of fault sites without exhaustive RTL fault injections. We show that carefully duplicating a small fraction of the functional blocks and hardening the flip-flops in the other blocks meets the ASIL-D safety standard while introducing an area overhead of only 38%.
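The selective-protection search described in the abstract lends itself to a simple illustration: rank functional blocks by the soft-error failure rate they remove per unit of added silicon, and protect blocks greedily until a residual failure-rate (FIT) budget is met. The sketch below is exactly that and nothing more; it is not the authors' silicon-validated statistical tool, and the block names, FIT values, reduction factors, overheads, and budget are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    area: float       # fraction of total NPU area
    fit: float        # hypothetical soft-error FIT contribution
    reduction: float  # fraction of the block's FIT removed if protected
    overhead: float   # added area as a fraction of the block's own area
                      # (~1.0 for duplication, less for flop hardening)

def select_protection(blocks, fit_budget):
    """Greedily protect blocks with the best FIT-reduction-per-area
    until the residual FIT falls under the target budget."""
    residual = sum(b.fit for b in blocks)
    chosen, extra_area = [], 0.0
    ranked = sorted(blocks,
                    key=lambda b: (b.fit * b.reduction) / (b.area * b.overhead),
                    reverse=True)
    for b in ranked:
        if residual <= fit_budget:
            break                      # budget met: leave the rest unprotected
        residual -= b.fit * b.reduction
        extra_area += b.area * b.overhead
        chosen.append(b.name)
    return chosen, residual, extra_area

blocks = [  # hypothetical functional blocks, not Ethos-U55 data
    Block("mac_array",      area=0.40, fit=60.0, reduction=0.99, overhead=1.0),
    Block("weight_decoder", area=0.15, fit=20.0, reduction=0.99, overhead=0.2),
    Block("dma",            area=0.10, fit=10.0, reduction=0.99, overhead=0.2),
    Block("control",        area=0.05, fit=5.0,  reduction=0.99, overhead=1.0),
]
print(select_protection(blocks, fit_budget=10.0))
```

On these made-up numbers the search protects three of the four blocks and leaves the least cost-effective one alone, which is the intuition that lets selective protection undercut DCLS's 100% area overhead.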
Related papers
- Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW [52.280742520586756]
Miniaturized cyber-physical systems (CPSes) powered by tiny machine learning (TinyML), such as nano-drones, are becoming an increasingly attractive technology.
Simple electronics make these CPSes inexpensive, but strongly limit the computational, memory, and sensing resources available on board.
We present a novel on-device fine-tuning approach that relies only on the limited ultra-low power resources available aboard nano-drones.
arXiv Detail & Related papers (2024-08-06T13:11:36Z)
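As a rough sketch of the resource-frugal on-device learning the nano-drone paper above targets, the snippet below freezes a small backbone and fine-tunes only the classification head with plain SGD, a standard TinyML trick for cutting RAM and compute. The model, shapes, and data are placeholders, not the paper's actual method; requires PyTorch.

```python
import torch
import torch.nn as nn

# Tiny stand-in model: frozen feature extractor plus a trainable head.
backbone = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(), nn.Flatten())
head = nn.Linear(8 * 15 * 15, 4)        # 32x32 input -> 15x15 feature map

for p in backbone.parameters():         # freeze the backbone: no gradients
    p.requires_grad = False             # and no optimizer state to store

opt = torch.optim.SGD(head.parameters(), lr=1e-2)  # SGD keeps no extra buffers
x = torch.randn(8, 1, 32, 32)           # stand-in for freshly collected frames
y = torch.randint(0, 4, (8,))           # stand-in for self-generated labels
for _ in range(10):                     # a few fine-tuning steps on-device
    loss = nn.functional.cross_entropy(head(backbone(x)), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```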
- Real-time Threat Detection Strategies for Resource-constrained Devices [1.4815508281465273]
We present an end-to-end process designed to effectively address DNS-tunneling attacks in a router.
We demonstrate that utilizing stateless features for training the ML model, along with features chosen to be independent of the network configuration, leads to highly accurate results.
The deployment of this carefully crafted model, optimized for embedded devices across diverse environments, resulted in high DNS-tunneling attack detection with minimal latency.
arXiv Detail & Related papers (2024-03-22T10:02:54Z)
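The "stateless features" idea above can be made concrete: each feature is computed from a single DNS query string, with no per-flow state and no dependence on network configuration. The features below (length, label count, subdomain entropy, digit ratio) are common choices in this literature and are assumptions here, not the paper's exact feature set.

```python
import math
from collections import Counter

def dns_features(qname: str):
    """Stateless, configuration-independent features of one DNS query."""
    labels = qname.rstrip(".").split(".")
    sub = ".".join(labels[:-2])          # strip registered domain + TLD
    counts = Counter(sub)
    n = max(len(sub), 1)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values()) if sub else 0.0
    return {
        "length": len(qname),
        "num_labels": len(labels),
        "subdomain_entropy": entropy,    # tunnels encode data -> high entropy
        "digit_ratio": sum(ch.isdigit() for ch in sub) / n,
    }

print(dns_features("dGhpcyBpcyBleGZpbA.3aa9.tunnel.example.com"))
print(dns_features("www.example.com"))
```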
- Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model for Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experimental results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z)
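The summary above does not spell out FLEKD's aggregation, so the sketch below shows generic ensemble knowledge distillation, one plausible reading: the server averages the clients' softened predictions on a shared public set and distills them into the global model instead of averaging weights. All models, shapes, and hyperparameters are toy placeholders; requires PyTorch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_aggregate(global_model, client_models, public_x, temp=2.0, steps=100):
    """Fuse clients by distilling their averaged soft labels on public data."""
    with torch.no_grad():
        logits = torch.stack([m(public_x) for m in client_models])
        soft = F.softmax(logits.mean(dim=0) / temp, dim=-1)  # ensemble target
    opt = torch.optim.Adam(global_model.parameters(), lr=1e-3)
    for _ in range(steps):                                   # KL distillation
        log_q = F.log_softmax(global_model(public_x) / temp, dim=-1)
        loss = F.kl_div(log_q, soft, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    return global_model, float(loss)

torch.manual_seed(0)
clients = [nn.Linear(16, 5) for _ in range(4)]  # stand-ins for client IDS models
_, final_loss = distill_aggregate(nn.Linear(16, 5), clients, torch.randn(64, 16))
print(final_loss)
```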
- Scaling #DNN-Verification Tools with Efficient Bound Propagation and Parallel Computing [57.49021927832259]
Deep Neural Networks (DNNs) are powerful tools that have shown extraordinary results in many scenarios.
However, their intricate designs and lack of transparency raise safety concerns when applied in real-world applications.
Formal Verification (FV) of DNNs has emerged as a valuable solution to provide provable guarantees on the safety aspect.
arXiv Detail & Related papers (2023-12-10T13:51:25Z)
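A minimal sketch of the bound-propagation primitive such verification tools scale up: push an input box [lo, hi] through a linear layer with interval arithmetic, then through the (monotone) ReLU, to get sound output bounds. Real tools add tighter relaxations, branch-and-bound, and parallelism on top; the network here is random and purely illustrative.

```python
import numpy as np

def linear_bounds(lo, hi, W, b):
    """Sound output interval of x -> Wx + b for x in [lo, hi].
    Split W by sign so each bound uses the right end of the input interval."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
x = rng.normal(size=4)
lo, hi = linear_bounds(x - 0.1, x + 0.1, W, b)   # eps-ball around input x
lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)    # ReLU is monotone
assert np.all(lo <= hi)                          # bounds stay ordered
print(lo, hi)
```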
- Fed-LSAE: Thwarting Poisoning Attacks against Federated Cyber Threat Detection System via Autoencoder-based Latent Space Inspection [0.0]
In cybersecurity, sensitive data together with contextual information and high-quality labeling play an essential role.
In this paper, we investigate a novel robust aggregation method for federated learning, namely Fed-LSAE, which takes advantage of latent space representation.
The experimental results on the CIC-ToN-IoT and N-BaIoT datasets confirm the feasibility of our defensive mechanism against cutting-edge poisoning attacks.
arXiv Detail & Related papers (2023-09-20T04:14:48Z)
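A hedged sketch of the Fed-LSAE idea as summarized above: score each client update by an autoencoder's reconstruction error and aggregate only the low-error (in-distribution) ones. The untrained autoencoder and the median-based cutoff below are toy stand-ins; the paper works on latent representations of real updates with a properly trained model.

```python
import statistics
import torch
import torch.nn as nn

class AE(nn.Module):
    """Tiny autoencoder used only as a stand-in anomaly scorer."""
    def __init__(self, d, z=8):
        super().__init__()
        self.enc, self.dec = nn.Linear(d, z), nn.Linear(z, d)
    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))

def robust_aggregate(updates, ae):
    with torch.no_grad():  # reconstruction error as a poisoning score
        errs = [(ae(u) - u).pow(2).mean().item() for u in updates]
    thresh = 2 * statistics.median(errs)          # simple adaptive cutoff (toy)
    kept = [u for u, e in zip(updates, errs) if e < thresh]
    return torch.stack(kept).mean(0)              # FedAvg over survivors

torch.manual_seed(0)
d = 32
ae = AE(d)
updates = [torch.randn(d) * 0.1 for _ in range(9)] + [torch.randn(d) * 10.0]
print(robust_aggregate(updates, ae).norm())       # the poisoned update is dropped
```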
- Special Session: Approximation and Fault Resiliency of DNN Accelerators [0.9126382223122612]
This paper explores the approximation and fault resiliency of Deep Neural Network accelerators.
We propose to use approximate (AxC) arithmetic circuits to emulate errors in hardware without performing fault injection on the DNN.
We also propose a fine-grained analysis of fault resiliency by examining fault propagation and masking in networks.
arXiv Detail & Related papers (2023-05-31T19:27:45Z)
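Software-level fault injection, closely related to both this paper and the Ethos-U55 study above, can be sketched by flipping one random bit in a quantized weight tensor and comparing against a golden result. RTL-level injection models far more machine state, but the flip mechanics are the same idea; the sizes and data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
w = rng.integers(-128, 128, size=(16, 16), dtype=np.int8)  # int8 weight tile
x = rng.integers(-128, 128, size=16, dtype=np.int8)        # int8 activations

def inject_bitflip(w, rng):
    """Flip one uniformly chosen bit somewhere in the weight tensor."""
    faulty = w.copy()
    flat = faulty.view(np.uint8).reshape(-1)     # reinterpret the raw bytes
    k = int(rng.integers(flat.size))
    flat[k] ^= np.uint8(1 << int(rng.integers(8)))
    return faulty

clean = w.astype(np.int32) @ x.astype(np.int32)             # golden MAC result
faulty = inject_bitflip(w, rng).astype(np.int32) @ x.astype(np.int32)
corrupted = int((clean != faulty).sum())
# Runs whose outputs match the golden result are "masked" faults;
# uncorrected mismatches are silent data corruptions (SDC).
print(f"{corrupted} of {clean.size} outputs corrupted")
```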
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
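The "safety projection" half of a USL-style layer can be pictured as gradient steps that push a proposed action back onto the set where a state-wise cost constraint c(s, a) <= 0 holds. The cost function below is a made-up placeholder, and this illustrates the general idea rather than the paper's exact procedure; requires PyTorch.

```python
import torch

def project_action(a, cost_fn, steps=50, lr=0.1):
    """Move an action toward the safe set {a : cost_fn(a) <= 0}."""
    a = a.clone().requires_grad_(True)
    for _ in range(steps):
        violation = torch.relu(cost_fn(a))   # zero once the action is safe
        if violation.item() == 0.0:
            break
        (grad,) = torch.autograd.grad(violation, a)
        with torch.no_grad():
            a -= lr * grad                   # descend on the violation
    return a.detach()

cost = lambda a: a.pow(2).sum() - 1.0        # toy constraint: ||a|| <= 1
unsafe = torch.tensor([1.2, 0.9])
print(project_action(unsafe, cost))          # ends up near the unit circle
```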
- Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
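The contextual-bandit formulation above, choosing which detection model (arm) to run from cheap context features, can be illustrated with a tabular epsilon-greedy linear bandit. The paper instead learns a reinforcement-learning policy network, and the contexts and rewards below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d = 3, 4                            # 3 models of increasing complexity
true_w = rng.normal(size=(n_arms, d))       # hidden reward model per arm (toy)
A = [np.eye(d) for _ in range(n_arms)]      # per-arm ridge-regression statistics
b = [np.zeros(d) for _ in range(n_arms)]

for t in range(2000):
    ctx = rng.normal(size=d)                # context: cheap input features
    if rng.random() < 0.1:                  # explore
        arm = int(rng.integers(n_arms))
    else:                                   # exploit: best predicted reward
        est = [np.linalg.solve(A[k], b[k]) @ ctx for k in range(n_arms)]
        arm = int(np.argmax(est))
    reward = true_w[arm] @ ctx + 0.1 * rng.normal()  # e.g. accuracy minus cost
    A[arm] += np.outer(ctx, ctx)            # update the chosen arm's statistics
    b[arm] += reward * ctx

learned = np.array([np.linalg.solve(A[k], b[k]) for k in range(n_arms)])
print(np.abs(learned - true_w).max())       # estimates approach true_w
```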
- Safe RAN control: A Symbolic Reinforcement Learning Approach [62.997667081978825]
We present a Symbolic Reinforcement Learning (SRL) based architecture for safety control of Radio Access Network (RAN) applications.
We provide a purely automated procedure in which a user can specify high-level logical safety specifications for a given cellular network topology.
We introduce a user interface (UI) developed to help a user set intent specifications for the system and inspect differences in the agent's proposed actions.
arXiv Detail & Related papers (2021-06-03T16:45:40Z)
- Scalable Synthesis of Verified Controllers in Deep Reinforcement Learning [0.0]
We propose an automated verification pipeline capable of synthesizing high-quality safety shields.
Our key insight involves separating safety verification from the neural controller, using pre-computed verified safety shields to constrain neural controller training.
Experimental results over a range of realistic high-dimensional deep RL benchmarks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-04-20T19:30:29Z)
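The precomputed-shield idea above reads naturally as a lookup: a verifier builds, offline, a table from abstract states to verified-safe action sets, and at runtime the shield passes through any policy action inside the set and overrides the rest. The states, actions, and table below are invented for illustration.

```python
import random

SAFE = {                       # precomputed offline by a verifier (toy table)
    "near_edge": {"brake", "steer_in"},
    "clear":     {"brake", "steer_in", "accelerate"},
}

def shielded_action(policy_action, abstract_state):
    """Let the neural policy act freely inside the verified-safe set."""
    safe_set = SAFE[abstract_state]
    if policy_action in safe_set:
        return policy_action                 # policy action is already safe
    return random.choice(sorted(safe_set))  # fall back to a verified action

print(shielded_action("accelerate", "near_edge"))  # overridden by the shield
print(shielded_action("accelerate", "clear"))      # passes through untouched
```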