Low latency FPGA implementation of twisted Edward curve cryptography hardware accelerator over prime field
- URL: http://arxiv.org/abs/2504.21342v1
- Date: Wed, 30 Apr 2025 06:03:36 GMT
- Title: Low latency FPGA implementation of twisted Edward curve cryptography hardware accelerator over prime field
- Authors: Md Rownak Hossain, Md Sazedur Rahman, Kh Shahriya Zaman, Walid El Fezzani, Mohammad Arif Sobhan Bhuiyan, Chia Chao Kang, Teh Jia Yew, Mahdi H. Miraz
- Abstract summary: This article presents a hardware implementation of field-programmable gate array (FPGA) based modular arithmetic, group operation, and point multiplication unit. The proposed point multiplication module takes 1.4 ms, operating at a maximal clock frequency of 117.8 MHz. This architecture will be a good candidate for rapid data encryption in high-speed wireless communication networks.
- Score: 0.5420492913071214
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The performance of any elliptic curve cryptography hardware accelerator significantly relies on the efficiency of the underlying point multiplication (PM) architecture. This article presents a hardware implementation of field-programmable gate array (FPGA) based modular arithmetic, group operation, and point multiplication unit on the twisted Edwards curve (Edwards25519) over the 256-bit prime field. An original hardware architecture of a unified point operation module in projective coordinates that executes point addition and point doubling within a single module has been developed, taking only 646 clock cycles and ensuring a better security level than conventional approaches. The proposed point multiplication module consumes 1.4 ms time, operating at a maximal clock frequency of 117.8 MHz utilising 164,730 clock cycles having 183.38 kbps throughput on the Xilinx Virtex-5 FPGA platform for 256-bit length of key. The comparative assessment of latency and throughput across various related recent works indicates the effectiveness of our proposed PM architecture. Finally, this high throughput and low latency PM architecture will be a good candidate for rapid data encryption in high-speed wireless communication networks.
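The unified point operation described in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes the standard Edwards25519 parameters (p = 2^255 - 19, a = -1, d = -121665/121666) and the well-known unified projective addition formulas for twisted Edwards curves, and all function names are hypothetical. The same `unified_add` routine serves for both point addition and point doubling, which is the property the abstract highlights. The plain double-and-add loop is an assumption made for illustration and is not constant-time; the paper's actual scheduling is not given in the abstract.

```python
# Standard Edwards25519 parameters (assumed; not quoted from the abstract).
P = 2**255 - 19          # field prime
A = -1                   # twisted Edwards coefficient a
IDENTITY = (0, 1, 1)     # neutral element in projective coordinates (X:Y:Z)


def inv(x):
    """Modular inverse via Fermat's little theorem."""
    return pow(x, P - 2, P)


D = (-121665 * inv(121666)) % P  # curve coefficient d


def unified_add(p1, p2):
    """Unified projective addition: also valid when p1 == p2 (doubling)."""
    x1, y1, z1 = p1
    x2, y2, z2 = p2
    a_ = z1 * z2 % P
    b = a_ * a_ % P
    c = x1 * x2 % P
    d_ = y1 * y2 % P
    e = D * c * d_ % P
    f = (b - e) % P
    g = (b + e) % P
    x3 = a_ * f * ((x1 + y1) * (x2 + y2) - c - d_) % P
    y3 = a_ * g * (d_ - A * c) % P
    z3 = f * g % P
    return (x3, y3, z3)


def scalar_mult(k, pt):
    """Left-to-right double-and-add using only unified_add (illustrative)."""
    acc = IDENTITY
    for bit in bin(k)[2:]:
        acc = unified_add(acc, acc)      # doubling through the same routine
        if bit == "1":
            acc = unified_add(acc, pt)   # addition through the same routine
    return acc


def to_affine(pt):
    """Convert (X:Y:Z) to affine (x, y)."""
    x, y, z = pt
    zi = inv(z)
    return (x * zi % P, y * zi % P)


def on_curve(pt):
    """Check a*x^2 + y^2 == 1 + d*x^2*y^2 (mod p)."""
    x, y = to_affine(pt)
    return (A * x * x + y * y - 1 - D * x * x * y * y) % P == 0


def base_point():
    """Derive the usual base point: y = 4/5, x recovered and chosen even."""
    y = 4 * inv(5) % P
    xx = (y * y - 1) * inv(D * y * y + 1) % P
    x = pow(xx, (P + 3) // 8, P)
    if (x * x - xx) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P  # multiply by sqrt(-1)
    if x % 2 != 0:
        x = P - x
    return (x, y, 1)


if __name__ == "__main__":
    B = base_point()
    Q = scalar_mult(2**200 + 12345, B)
    print("result on curve:", on_curve(Q))
```

As a sanity check on the figures quoted in the abstract, 164,730 clock cycles equals 255 × 646, and 164,730 cycles at 117.8 MHz comes to roughly 1.40 ms, in line with the stated latency. How the hardware schedules those 255 unified operations is not stated in the abstract.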
Related papers
- A Constant-Time Hardware Architecture for the CSIDH Key-Exchange Protocol [0.6597195879147555]
This paper presents the first comprehensive hardware study of CSIDH on both FPGA and ASIC platforms. The constant-time CSIDH-512 design requires $1.03 \times 10^8$ clock cycles per key generation. For ASIC implementation in a 180nm process, the design requires $1.065 \times 10^8$ clock cycles and achieves a ~180 MHz frequency, resulting in a key generation latency of 591 ms.
arXiv Detail & Related papers (2025-08-14T21:37:29Z) - GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM [14.319119105134309]
This paper proposes an area-efficient highly parallel NTT accelerator with glitch-driven near-memory computing (GDNTT). The design integrates a 10T SRAM for data storage, enabling flexible row/column data access and streamlining circuit mapping strategies. Evaluation results show that the proposed NTT accelerator achieves a 1.528× improvement in throughput-per-area compared to the state-of-the-art.
arXiv Detail & Related papers (2025-05-13T01:53:07Z) - FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs [0.0]
Transformer neural networks (TNNs) are being applied across a widening range of application domains, including natural language processing (NLP), machine translation, and computer vision (CV).
This paper proposes FAMOUS, a flexible hardware accelerator for dense multi-head attention computation of TNNs on field-programmable gate arrays (FPGAs).
It is optimized for high utilization of processing elements and on-chip memories to improve parallelism and reduce latency.
arXiv Detail & Related papers (2024-09-21T05:25:46Z) - HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices [44.99833362998488]
The present work proposes a generic hardware architecture ready to be implemented on FPGA devices.
The inference speed of the design is evaluated over different resource constrained FPGA devices.
We demonstrate that our hardware-aware pruning algorithm achieves a 45% improvement in inference time compared to a network pruned using the standard algorithm.
arXiv Detail & Related papers (2024-08-26T07:27:12Z) - Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z) - A Heterogeneous RISC-V based SoC for Secure Nano-UAV Navigation [40.8381466360025]
Nano-UAVs face significant power and payload constraints while requiring advanced computing capabilities.
We present Shaheen, a 9 mm², 200 mW system-on-a-chip (SoC).
It integrates a Linux-capable RV64 core, compliant with the v1.0 ratified Hypervisor extension, along with a low-cost and low-power memory controller.
At the same time, it integrates a fully programmable energy- and area-efficient multi-core cluster of RV32 cores optimized for general-purpose DSP.
arXiv Detail & Related papers (2024-01-07T16:03:47Z) - Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge [49.42371633618761]
Spiker+ is a framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge.
Spiker+ is tested on two benchmark datasets, the MNIST and the Spiking Heidelberg Digits (SHD).
arXiv Detail & Related papers (2024-01-02T10:42:42Z) - FPGA-QHAR: Throughput-Optimized for Quantized Human Action Recognition on The Edge [0.6254873489691849]
This paper proposed an integrated end-to-end HAR scalable HW/SW accelerator co-design based on an enhanced 8-bit quantized Two-Stream SimpleNet-PyTorch CNN architecture.
Our development uses a partially streaming dataflow architecture to achieve higher throughput versus network design and resource utilization trade-off.
Our proposed methodology achieved nearly 81% prediction accuracy with an approximately 24 FPS real-time inference throughput at 187MHz on ZCU104.
arXiv Detail & Related papers (2023-11-04T10:38:21Z) - A real-time, scalable, fast and highly resource efficient decoder for a quantum computer [1.9014261239550778]
We introduce the Collision Clustering decoder and implement it on FPGA and ASIC hardware.
We simulate logical memory experiments using the leading quantum error correction scheme, the surface code.
We demonstrate MHz decoding speed - matching the requirements of fast-operating modalities such as superconducting qubits.
arXiv Detail & Related papers (2023-09-11T15:46:27Z) - RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems [68.8204255655161]
We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP.
RAMP supports large-scale distributed and parallel computing systems (12.8Tbps per node for up to 65,536 nodes).
arXiv Detail & Related papers (2022-11-28T11:24:51Z) - LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors.
The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z) - Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z) - A fully pipelined FPGA accelerator for scale invariant feature transform keypoint descriptor matching [0.0]
We design a novel fully pipelined hardware accelerator architecture for SIFT keypoint descriptor matching.
The proposed hardware architecture is able to properly handle the memory bandwidth necessary for a fully-pipelined implementation.
Our hardware implementation is 15.7 times faster than the comparable software approach.
arXiv Detail & Related papers (2020-12-17T15:29:41Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)