Related papers: GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM

GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM

URL: http://arxiv.org/abs/2505.08162v1
Date: Tue, 13 May 2025 01:53:07 GMT
Title: GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM
Authors: Hengyu Ding, Houran Ji, Jia Li, Jinhang Chen, Chin-Wing Sham, Yao Wang,
Abstract summary: This paper proposes an area-efficient highly parallel NTT accelerator with glitch-driven near-memory computing (GDNTT)<n>The design integrates a 10T for data storage, enabling flexible row/column data access and streamlining circuit mapping strategies.<n> Evaluation results show that the proposed NTT accelerator achieves a 1.528* improvement in throughput-per-area compared to the state-of-the-art.
Score: 14.319119105134309
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid advancement of quantum computing technology, post-quantum cryptography (PQC) has emerged as a pivotal direction for next-generation encryption standards. Among these, lattice-based cryptographic schemes rely heavily on the fast Number Theoretic Transform (NTT) over polynomial rings, whose performance directly determines encryption/decryption throughput and energy efficiency. However, existing software-based NTT implementations struggle to meet the real-time performance and low-power requirements of IoT and edge devices. To address this challenge, this paper proposes an area-efficient highly parallel NTT accelerator with glitch-driven near-memory computing (GDNTT). The design integrates a 10T SRAM for data storage, enabling flexible row/column data access and streamlining circuit mapping strategies. Furthermore, a glitch generator is incorporated into the near-memory computing unit, significantly reducing the latency of butterfly operations. Evaluation results show that the proposed NTT accelerator achieves a 1.5~28* improvement in throughput-per-area compared to the state-of-the-art.

Related papers

Dynamic Tsetlin Machine Accelerators for On-Chip Training at the Edge using FPGAs [0.3440236962613469]
This paper presents a Dynamic Tsetlin Machine (DTM) training accelerator as an alternative to Deep Neural Networks (DNNs)<n>DTM trains with fewer multiply-accumulates, devoid of derivative computation.<n>The proposed accelerator offers 2.54x more Giga operations per second per Watt (GOP/s per W) and uses 6x less power than the next-best comparable design.
arXiv Detail & Related papers (2025-04-28T13:38:53Z)
A Unified Hardware Accelerator for Fast Fourier Transform and Number Theoretic Transform [0.0]
Number Theoretic Transform (NTT) is indispensable tool for computing efficient multiplications in post-quantum lattice-based cryptography.<n>We demonstrate a unified hardware accelerator supporting both 512-point complex FFT and 256-point NTT.<n>Our implementation achieves performance comparable to state-of-the-art ML-KEM / ML-DSA NTT accelerators on FPGA.
arXiv Detail & Related papers (2025-04-15T12:13:05Z)
COMPASS: A Compiler Framework for Resource-Constrained Crossbar-Array Based In-Memory Deep Learning Accelerators [6.172271429579593]
We propose a compiler framework for resource-constrained crossbar-based processing-in-memory (PIM) deep neural network (DNN) accelerators.<n>We propose an algorithm to determine the optimal partitioning that divides the layers so that each partition can be accelerated on chip.
arXiv Detail & Related papers (2025-01-12T11:31:25Z)
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs. At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads. At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z)
Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge [49.42371633618761]
Spiker+ is a framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge. Spiker+ is tested on two benchmark datasets, the MNIST and the Spiking Heidelberg Digits (SHD)
arXiv Detail & Related papers (2024-01-02T10:42:42Z)
LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Graph Neural Network (LL-GNN) designs for particle detectors. The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z)
Decomposition of Matrix Product States into Shallow Quantum Circuits [62.5210028594015]
tensor network (TN) algorithms can be mapped to parametrized quantum circuits (PQCs) We propose a new protocol for approximating TN states using realistic quantum circuits. Our results reveal one particular protocol, involving sequential growth and optimization of the quantum circuit, to outperform all other methods.
arXiv Detail & Related papers (2022-09-01T17:08:41Z)
Deep Learning-Based Synchronization for Uplink NB-IoT [72.86843435313048]
We propose a neural network (NN)-based algorithm for device detection and time of arrival (ToA) estimation for the narrowband physical random-access channel (NPRACH) of narrowband internet of things (NB-IoT) The introduced NN architecture leverages residual convolutional networks as well as knowledge of the preamble structure of the 5G New Radio (5G NR) specifications.
arXiv Detail & Related papers (2022-05-22T12:16:43Z)
FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks. Current networks often occupy large number of parameters and require heavy computation costs. Our proposed FastFlowNet works in the well-known coarse-to-fine manner with following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification [3.3711251611130337]
A hardware-friendly pruning algorithm for reducing energy consumption and improving the speed of Long Short-Term Memory (LSTM) neural network accelerators is presented. Results show that the proposed accelerator could provide up to 272% higher effective GOPS/W and the perplexity error is reduced by up to 1.4% for the PTB dataset.
arXiv Detail & Related papers (2021-01-07T18:23:48Z)
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. We present EdgeBERT, an in-depth algorithm- hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.