Lightweight Fault Detection Architecture for NTT on FPGA
- URL: http://arxiv.org/abs/2508.03062v1
- Date: Tue, 05 Aug 2025 04:23:50 GMT
- Title: Lightweight Fault Detection Architecture for NTT on FPGA
- Authors: Rourab Paul, Paresh Baidya, Krishnendu Guha
- Abstract summary: Post-Quantum Cryptographic (PQC) algorithms are mathematically secure and resistant to quantum attacks. They can still leak sensitive information in hardware implementations due to natural faults or intentional fault injections. This research proposes a lightweight, efficient, recomputation-based fault detection module.
- Score: 0.8793721044482612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Post-Quantum Cryptographic (PQC) algorithms are mathematically secure and resistant to quantum attacks but can still leak sensitive information in hardware implementations due to natural faults or intentional fault injections. Intentional fault injection in side-channel attacks reduces the reliability of crypto implementations in future-generation network security processors. In this regard, this research proposes a lightweight, efficient, recomputation-based fault detection module implemented on a Field Programmable Gate Array (FPGA) for the Number Theoretic Transform (NTT). The NTT is primarily composed of memory units and the Cooley-Tukey Butterfly Unit (CT-BU), a critical and computationally intensive hardware component essential for polynomial multiplication. NTT and polynomial multiplication are fundamental building blocks in many PQC algorithms, including Kyber, NTRU, Ring-LWE, and others. In this paper, we present a fault detection method called Recomputation with a Modular Offset (REMO) for the logic blocks of the CT-BU using Montgomery Reduction, and another method called Memory Rule Checkers for the memory components used within the NTT. The proposed fault detection framework sets a new benchmark by achieving high efficiency at a significantly low implementation cost. It occupies only 16 slices and a single DSP block, with a power consumption of just 3 mW on an Artix-7 FPGA. The REMO-based detection mechanism achieves a fault coverage of 87.2% to 100%, adaptable across various word sizes, fault bit counts, and fault injection modes. Similarly, the Memory Rule Checkers demonstrate robust performance, achieving 50.7% to 100% fault detection depending on the nature of the injected faults.
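The abstract does not spell out how REMO is built, so the following is only a minimal Python sketch of the general idea behind recomputation with a modular offset, not the authors' architecture. It relies on one fact: adding a multiple of the modulus q to an operand leaves the butterfly result modulo q unchanged, while the intermediate bit patterns differ, so a fault that disturbs one pass shows up as a mismatch. The modulus 3329 (the Kyber modulus), the Montgomery radix 2^16, the `fault_mask` model of a transient bit flip, and all function names are assumptions made for illustration.

```python
Q = 3329                            # assumed modulus (Kyber); illustrative only
R_BITS = 16                         # assumed Montgomery radix R = 2^16
R = 1 << R_BITS
Q_NEG_INV = (-pow(Q, -1, R)) % R    # -Q^{-1} mod R, needed by Montgomery reduction


def montgomery_reduce(t):
    """Return t * R^{-1} mod Q for 0 <= t < Q * R."""
    m = ((t & (R - 1)) * Q_NEG_INV) & (R - 1)
    u = (t + m * Q) >> R_BITS       # t + m*Q is divisible by R by construction
    return u - Q if u >= Q else u


def ct_butterfly(a, b, w_mont, fault_mask=0):
    """Cooley-Tukey butterfly: (a + w*b, a - w*b) mod Q.

    w_mont is the twiddle factor in Montgomery form (w * R mod Q).
    fault_mask is a hypothetical fault model: it flips bits of the
    multiplier output before reduction.
    """
    product = (b * w_mont) ^ fault_mask
    t = montgomery_reduce(product)
    return (a + t) % Q, (a - t) % Q


def checked_butterfly(a, b, w, fault_mask=0):
    """Run the butterfly twice and compare the two results.

    The second pass adds the offset Q to both operands. The fault is
    modelled as transient, so it affects the first pass only.
    """
    w_mont = (w * R) % Q
    first = ct_butterfly(a, b, w_mont, fault_mask)
    second = ct_butterfly(a + Q, b + Q, w_mont)
    return first, first == second


if __name__ == "__main__":
    print(checked_butterfly(5, 7, 17))                     # fault-free pass
    print(checked_butterfly(5, 7, 17, fault_mask=1 << 3))  # one flipped bit
```

In this model a single-bit transient flip in one pass is always caught, because the error it introduces is a nonzero value modulo the odd prime q. A permanent stuck-at fault that hits both passes can produce the same error twice and escape detection, which is one plausible reason a scheme of this kind would report coverage below 100% for some fault modes.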
Related papers
- Algorithmic Strategies for Sustainable Reuse of Neural Network Accelerators with Permanent Faults [9.89051364546275]
We propose novel approaches that quantify permanent hardware faults in neural network (NN) accelerators by uniquely integrating the behavior of the faulty component instead of bypassing it. We propose several algorithmic mitigation techniques for a subset of stuck-at faults, such as Invertible Scaling or Shifting of activations and weights, or fine tuning with the faulty behavior. Notably, the proposed techniques do not require any hardware modification, instead relying on existing components of widely used systolic array based accelerators.
arXiv Detail & Related papers (2024-12-17T18:56:09Z) - Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z) - Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask [74.64216073678617]
AMD performs parallel NAR inference within contiguous blocks of output labels concealed using attention masks.
A beam search algorithm is designed to leverage a dynamic fusion of CTC, AR Decoder, and AMD probabilities.
Experiments on the LibriSpeech-100hr corpus suggest the tripartite Decoder incorporating the AMD module produces a maximum decoding speed-up ratio of 1.73x.
arXiv Detail & Related papers (2024-06-14T13:42:38Z) - Efficient and Scalable Architectures for Multi-Level Superconducting Qubit Readout [0.8999666725996978]
Many processor modalities are inherently multi-level systems. This leads to occasional leakage into energy levels outside the computational subspace. We propose a scalable, high-fidelity three-level readout that reduces FPGA resource usage by $60\times$ compared to the baseline. Our design supports efficient, real-time implementation on off-the-shelf FPGAs, delivering a 6.6% improvement in readout accuracy over the baseline.
arXiv Detail & Related papers (2024-05-14T22:32:51Z) - Efficient Algorithm Level Error Detection for Number-Theoretic Transform Assessed on FPGAs [2.156170153103442]
This paper introduces algorithm level fault detection schemes in NTT multiplication.
We evaluate this through the simulation of a fault model, ensuring that the conducted assessments accurately mirror the obtained results.
We achieve a comparable throughput with just a 9% increase in area and 13% increase in latency compared to the original hardware implementations.
arXiv Detail & Related papers (2024-03-02T14:05:56Z) - Efficient Fault Detection Architectures for Modular Exponentiation Targeting Cryptographic Applications Benchmarked on FPGAs [2.156170153103442]
We propose a lightweight fault detection architecture tailored for modular exponentiation.
Our approach achieves an error detection rate close to 100%, all while introducing a modest computational overhead of approximately 7%.
arXiv Detail & Related papers (2024-02-28T04:02:41Z) - Check-Agnosia based Post-Processor for Message-Passing Decoding of Quantum LDPC Codes [3.4602940992970908]
We introduce a new post-processing algorithm with a hardware-friendly orientation, providing error correction performance competitive to the state-of-the-art techniques.
We show that latency values close to one microsecond can be obtained on the FPGA board, and provide evidence that much lower latency values can be obtained for ASIC implementations.
arXiv Detail & Related papers (2023-10-23T14:51:22Z) - Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z) - Deep Quantum Error Correction [73.54643419792453]
Quantum error correction codes (QECC) are a key component for realizing the potential of quantum computing.
In this work, we efficiently train novel end-to-end deep quantum error decoders.
The proposed method demonstrates the power of neural decoders for QECC by achieving state-of-the-art accuracy.
arXiv Detail & Related papers (2023-01-27T08:16:26Z) - Logical blocks for fault-tolerant topological quantum computation [55.41644538483948]
We present a framework for universal fault-tolerant logic motivated by the need for platform-independent logical gate definitions.
We explore novel schemes for universal logic that improve resource overheads.
Motivated by the favorable logical error rates for boundaryless computation, we introduce a novel computational scheme.
arXiv Detail & Related papers (2021-12-22T19:00:03Z) - Fault-tolerant parity readout on a shuttling-based trapped-ion quantum computer [64.47265213752996]
We experimentally demonstrate a fault-tolerant weight-4 parity check measurement scheme.
We achieve a flag-conditioned parity measurement single-shot fidelity of 93.2(2)%.
The scheme is an essential building block in a broad class of stabilizer quantum error correction protocols.
arXiv Detail & Related papers (2021-07-13T20:08:04Z) - AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.