FORTALESA: Fault-Tolerant Reconfigurable Systolic Array for DNN Inference
- URL: http://arxiv.org/abs/2503.04426v1
- Date: Thu, 06 Mar 2025 13:35:59 GMT
- Title: FORTALESA: Fault-Tolerant Reconfigurable Systolic Array for DNN Inference
- Authors: Natalia Cherezova, Artur Jutman, Maksim Jenihhin
- Abstract summary: Deep Neural Networks (DNNs) in mission- and safety-critical applications bring their reliability to the forefront. This work presents a run-time reconfigurable systolic array architecture with three execution modes and four implementation options. The proposed architecture efficiently protects registers and MAC units of systolic array PEs from transient and permanent faults.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of Deep Neural Networks (DNNs) in mission- and safety-critical applications brings their reliability to the forefront. The high performance demands of DNNs require the use of specialized hardware accelerators. The systolic array architecture is widely used in DNN accelerators due to its parallelism and regular structure. This work presents a run-time reconfigurable systolic array architecture with three execution modes and four implementation options. All four implementations are evaluated in terms of resource utilization, throughput, and fault tolerance improvement. The proposed architecture is used to enhance the reliability of DNN inference on a systolic array through heterogeneous mapping of different network layers to different execution modes. The approach is supported by a novel reliability assessment method based on fault propagation analysis, which is used to explore the appropriate execution mode-layer mapping for DNN inference. The proposed architecture efficiently protects the registers and MAC units of systolic array PEs from transient and permanent faults. The reconfigurability feature enables a speedup of up to $3\times$, depending on layer vulnerability. Furthermore, it requires $6\times$ fewer resources compared to static redundancy and $2.5\times$ fewer resources compared to the previously proposed solution for transient faults.
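As a rough illustration of the heterogeneous layer-to-mode mapping described above, the sketch below assigns each network layer an execution mode based on an estimated per-layer vulnerability. It is a minimal, assumption-laden example rather than the paper's implementation: the mode names, relative speeds, thresholds, and vulnerability metric are hypothetical placeholders.

```python
# Minimal sketch (not the FORTALESA implementation): pick an execution mode
# per DNN layer from an estimated per-layer vulnerability.  Mode names,
# relative speeds, thresholds, and the vulnerability metric are assumptions.
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    vulnerability: float  # assumed metric: fraction of faults that corrupt the layer output


# Hypothetical execution modes: more redundancy means more protection, less throughput.
MODES = {
    "high_throughput": {"redundancy": 1, "relative_speed": 3.0},
    "duplication":     {"redundancy": 2, "relative_speed": 1.5},
    "triplication":    {"redundancy": 3, "relative_speed": 1.0},
}


def select_mode(layer: Layer, low: float = 0.01, high: float = 0.05) -> str:
    """Map a layer to an execution mode according to its vulnerability estimate."""
    if layer.vulnerability < low:
        return "high_throughput"
    if layer.vulnerability < high:
        return "duplication"
    return "triplication"


if __name__ == "__main__":
    layers = [Layer("conv1", 0.002), Layer("conv2", 0.03), Layer("fc", 0.12)]
    for layer in layers:
        mode = select_mode(layer)
        print(f"{layer.name}: {mode} ({MODES[mode]['relative_speed']}x relative speed)")
```

In the paper itself, the mode-layer mapping is explored with the proposed fault-propagation-based reliability assessment rather than fixed thresholds like the ones above.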
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - HYDRA: Hybrid Data Multiplexing and Run-time Layer Configurable DNN Accelerator [0.0]
The article proposes a layer-multiplexed approach that reuses a single activation function within the execution of a single layer, together with improved Fused-Multiply-Accumulate (FMA) units.
The proposed architectures achieve over 90% reduction in power consumption and improved resource utilization, delivering 35.21 TOPS/W.
arXiv Detail & Related papers (2024-09-08T05:10:02Z) - SAFFIRA: a Framework for Assessing the Reliability of
Systolic-Array-Based DNN Accelerators [0.4391603054571586]
This paper introduces a novel hierarchical software-based hardware-aware fault injection strategy tailored for systolic array-based Deep Neural Network (DNN) accelerators.
arXiv Detail & Related papers (2024-03-05T13:17:09Z) - REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints [2.9209462960232235]
State-of-the-art machine learning pipelines generate resource-agnostic models that cannot adapt at runtime.
We introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources.
We provide theoretical results and empirical evidence of REDS' outstanding performance in terms of the submodels' test set accuracy.
arXiv Detail & Related papers (2023-11-22T12:34:51Z) - Detecting train driveshaft damages using accelerometer signals and
Differential Convolutional Neural Networks [67.60224656603823]
This paper proposes the development of a railway axle condition monitoring system based on advanced 2D-Convolutional Neural Network (CNN) architectures.
The resultant system converts the railway axle vibration signals into time-frequency domain representations, i.e., spectrograms, and then trains a two-dimensional CNN to classify them according to the axle's crack condition.
arXiv Detail & Related papers (2022-11-15T15:04:06Z) - enpheeph: A Fault Injection Framework for Spiking and Compressed Deep
Neural Networks [10.757663798809144]
We present enpheeph, a Fault Injection Framework for Spiking and Compressed Deep Neural Networks (DNNs).
By injecting a random and increasing number of faults, we show that DNNs can suffer an accuracy drop of more than 40% at a fault rate as low as $7 \times 10^{-7}$ faults per parameter (a minimal fault-injection sketch in this spirit is given after this list).
arXiv Detail & Related papers (2022-07-31T00:30:59Z) - Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time
Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z) - Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z) - BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted
Regularization Method [69.49386965992464]
We propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method.
Our framework is universal: it can be applied to both CNNs and RNNs, implying complete support for the two major kinds of intensive computation layers.
It is the first time that a weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
arXiv Detail & Related papers (2020-01-23T03:30:56Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
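Several entries above (SAFFIRA, enpheeph) revolve around software-level fault injection into DNN parameters. The sketch below is a minimal, framework-agnostic illustration, not the enpheeph or SAFFIRA API: it flips random single bits in a float32 weight tensor at a configurable faults-per-parameter rate, the metric quoted in the enpheeph summary.

```python
# Minimal fault-injection sketch (not the enpheeph or SAFFIRA API):
# flip random single bits in a float32 weight tensor at a given
# faults-per-parameter rate, then count how many values changed.
import numpy as np


def inject_bit_flips(weights: np.ndarray, fault_rate: float, rng=None) -> np.ndarray:
    """Return a copy of `weights` with random single-bit flips.

    `fault_rate` is the expected number of faults per parameter,
    e.g. 7e-7 as in the enpheeph experiment summarized above.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = weights.astype(np.float32).ravel().copy()
    n_faults = rng.binomial(flat.size, min(fault_rate, 1.0))
    if n_faults:
        idx = rng.choice(flat.size, size=n_faults, replace=False)
        bits = flat.view(np.uint32)           # reinterpret floats as raw bit patterns
        bit_pos = rng.integers(0, 32, size=n_faults).astype(np.uint32)
        bits[idx] ^= np.uint32(1) << bit_pos  # flip one random bit per selected parameter
    return flat.reshape(weights.shape)


if __name__ == "__main__":
    w = np.random.randn(4096, 4096).astype(np.float32)
    faulty = inject_bit_flips(w, fault_rate=7e-7)
    print("parameters changed:", int(np.count_nonzero(w != faulty)))
```

A real campaign would reload the faulty weights into the model and re-run inference to measure the resulting accuracy drop; the hierarchical and hardware-aware aspects of SAFFIRA are beyond this sketch.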