Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator
- URL: http://arxiv.org/abs/2404.09317v1
- Date: Sun, 14 Apr 2024 18:16:16 GMT
- Title: Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator
- Authors: Abhishek Tyagi, Reiley Jeyapaul, Chuteng Zhu, Paul Whatmough, Yuhao Zhu,
- Abstract summary: We present a reliability study of Arm's Ethos-U55, an important industrial-scale NPU being utilised in embedded and IoT applications.
We perform large scale RTL-level fault injections to characterize Ethos-U55 against the Automotive Safety Integrity Level D (ASIL-D) resiliency standard.
- Score: 3.8736624857062267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As Neural Processing Units (NPU) or accelerators are increasingly deployed in a variety of applications including safety critical applications such as autonomous vehicle, and medical imaging, it is critical to understand the fault-tolerance nature of the NPUs. We present a reliability study of Arm's Ethos-U55, an important industrial-scale NPU being utilised in embedded and IoT applications. We perform large scale RTL-level fault injections to characterize Ethos-U55 against the Automotive Safety Integrity Level D (ASIL-D) resiliency standard commonly used for safety-critical applications such as autonomous vehicles. We show that, under soft errors, all four configurations of the NPU fall short of the required level of resiliency for a variety of neural networks running on the NPU. We show that it is possible to meet the ASIL-D level resiliency without resorting to conventional strategies like Dual Core Lock Step (DCLS) that has an area overhead of 100%. We achieve so through selective protection, where hardware structures are selectively protected (e.g., duplicated, hardened) based on their sensitivity to soft errors and their silicon areas. To identify the optimal configuration that minimizes the area overhead while meeting the ASIL-D standard, the main challenge is the large search space associated with the time-consuming RTL simulation. To address this challenge, we present a statistical analysis tool that is validated against Arm silicon and that allows us to quickly navigate hundreds of billions of fault sites without exhaustive RTL fault injections. We show that by carefully duplicating a small fraction of the functional blocks and hardening the Flops in other blocks meets the ASIL-D safety standard while introducing an area overhead of only 38%.
Related papers
- Fine-Tuning Federated Learning-Based Intrusion Detection Systems for Transportation IoT [0.3333209898517398]
Federated Learning (FL) has emerged as a promising method for enabling the decentralized training of IDS models on distributed edge devices.
We propose a hybrid server-edge FL framework that offloads pre-training to a central server while enabling lightweight fine-tuning on edge devices.
This approach reduces memory usage by up to 42%, decreases training times by up to 75%, and achieves competitive IDS accuracy of up to 99.2%.
arXiv Detail & Related papers (2025-02-10T02:12:05Z) - Evaluating Single Event Upsets in Deep Neural Networks for Semantic Segmentation: an embedded system perspective [1.474723404975345]
This paper delves into the robustness assessment in embedded Deep Neural Networks (DNNs)
By scrutinizing the layer-by-layer and bit-by-bit sensitivity of various encoder-decoder models to soft errors, this study thoroughly investigates the vulnerability of segmentation DNNs to SEUs.
We propose a set of practical lightweight error mitigation techniques with no memory or computational cost suitable for resource-constrained deployments.
arXiv Detail & Related papers (2024-12-04T18:28:38Z) - Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW [52.280742520586756]
Miniaturized cyber-physical systems (CPSes) powered by tiny machine learning (TinyML), such as nano-drones, are becoming an increasingly attractive technology.
Simple electronics make these CPSes inexpensive, but strongly limit the computational, memory, and sensing resources available on board.
We present a novel on-device fine-tuning approach that relies only on the limited ultra-low power resources available aboard nano-drones.
arXiv Detail & Related papers (2024-08-06T13:11:36Z) - Real-time Threat Detection Strategies for Resource-constrained Devices [1.4815508281465273]
We present an end-to-end process designed to effectively address DNS-tunneling attacks in a router.
We demonstrate that utilizing stateless features for training the ML model, along with features chosen to be independent of the network configuration, leads to highly accurate results.
The deployment of this carefully crafted model, optimized for embedded devices across diverse environments, resulted in high DNS-tunneling attack detection with minimal latency.
arXiv Detail & Related papers (2024-03-22T10:02:54Z) - Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model of Intrusion Detection Systems (IDS)
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z) - Scaling #DNN-Verification Tools with Efficient Bound Propagation and
Parallel Computing [57.49021927832259]
Deep Neural Networks (DNNs) are powerful tools that have shown extraordinary results in many scenarios.
However, their intricate designs and lack of transparency raise safety concerns when applied in real-world applications.
Formal Verification (FV) of DNNs has emerged as a valuable solution to provide provable guarantees on the safety aspect.
arXiv Detail & Related papers (2023-12-10T13:51:25Z) - Fed-LSAE: Thwarting Poisoning Attacks against Federated Cyber Threat Detection System via Autoencoder-based Latent Space Inspection [0.0]
In cybersecurity, the sensitive data along with the contextual information and high-quality labeling play an essential role.
In this paper, we investigate a novel robust aggregation method for federated learning, namely Fed-LSAE, which takes advantage of latent space representation.
The experimental results on the CIC-ToN-IoT and N-BaIoT datasets confirm the feasibility of our defensive mechanism against cutting-edge poisoning attacks.
arXiv Detail & Related papers (2023-09-20T04:14:48Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Safe RAN control: A Symbolic Reinforcement Learning Approach [62.997667081978825]
We present a Symbolic Reinforcement Learning (SRL) based architecture for safety control of Radio Access Network (RAN) applications.
We provide a purely automated procedure in which a user can specify high-level logical safety specifications for a given cellular network topology.
We introduce a user interface (UI) developed to help a user set intent specifications to the system, and inspect the difference in agent proposed actions.
arXiv Detail & Related papers (2021-06-03T16:45:40Z) - Lightweight Collaborative Anomaly Detection for the IoT using Blockchain [40.52854197326305]
Internet of things (IoT) devices tend to have many vulnerabilities which can be exploited by an attacker.
Unsupervised techniques, such as anomaly detection, can be used to secure these devices in a plug-and-protect manner.
We present a distributed IoT simulation platform, which consists of 48 Raspberry Pis.
arXiv Detail & Related papers (2020-06-18T14:50:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.