Related papers: A Constant-Time Hardware Architecture for the CSIDH Key-Exchange Protocol

URL: http://arxiv.org/abs/2508.11082v1
Date: Thu, 14 Aug 2025 21:37:29 GMT
Title: A Constant-Time Hardware Architecture for the CSIDH Key-Exchange Protocol
Authors: Sina Bagheri, Masoud Kaveh, Francisco Hernando-Gallego, Diego Martín, Nuria Serrano
Abstract summary: This paper presents the first comprehensive hardware study of CSIDH on both FPGA and ASIC platforms. The constant-time CSIDH-512 design requires $1.03\times10^{8}$ clock cycles per key generation. For ASIC implementation in a 180nm process, the design requires $1.065\times10^{8}$ clock cycles and achieves a ~180 MHz frequency, resulting in a key generation latency of 591 ms.
Score: 0.6597195879147555
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The commutative supersingular isogeny Diffie-Hellman (CSIDH) algorithm is a promising post-quantum key exchange protocol, notable for its exceptionally small key sizes, but hindered by computationally intensive key generation. Furthermore, practical implementations must operate in constant time to mitigate side-channel vulnerabilities, which presents an additional performance challenge. This paper presents, to our knowledge, the first comprehensive hardware study of CSIDH, establishing a performance baseline with a unified architecture on both field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) platforms. The architecture features a top-level finite state machine (FSM) that orchestrates a deeply pipelined arithmetic logic unit (ALU) to accelerate the underlying 512-bit finite field operations. The ALU employs a parallelized schoolbook multiplier, completing a 512$\times$512-bit multiplication in 22 clock cycles and enabling a full Montgomery modular multiplication in 87 cycles. The constant-time CSIDH-512 design requires $1.03\times10^{8}$ clock cycles per key generation. When implemented on a Xilinx Zynq UltraScale+ FPGA, the architecture achieves a 200 MHz clock frequency, corresponding to a 515 ms latency. For ASIC implementation in a 180nm process, the design requires $1.065\times10^{8}$ clock cycles and achieves a ~180 MHz frequency, resulting in a key generation latency of 591 ms. By providing the first public hardware performance metrics for CSIDH on both FPGA and ASIC platforms, this work delivers a crucial benchmark for future isogeny-based post-quantum cryptography (PQC) accelerators.
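The latency figures in the abstract follow from dividing the cycle count by the clock frequency. The sketch below is only an illustrative check of that arithmetic: the function name `latency_ms` and the constants are hypothetical names introduced here, and the ASIC frequency is taken as a nominal 180 MHz because the abstract gives it only approximately.

```python
def latency_ms(cycles: float, freq_hz: float) -> float:
    """Return latency in milliseconds for a cycle count at a given clock frequency."""
    return cycles / freq_hz * 1e3


# Figures quoted in the abstract (names are illustrative, not from the paper).
FPGA_CYCLES = 1.03e8      # clock cycles per key generation (FPGA)
FPGA_FREQ_HZ = 200e6      # 200 MHz on the Zynq UltraScale+ FPGA
ASIC_CYCLES = 1.065e8     # clock cycles per key generation (ASIC)
ASIC_FREQ_HZ = 180e6      # assumption: nominal value for the quoted "~180 MHz"

fpga_ms = latency_ms(FPGA_CYCLES, FPGA_FREQ_HZ)
asic_ms = latency_ms(ASIC_CYCLES, ASIC_FREQ_HZ)

print(f"FPGA key generation latency: {fpga_ms:.1f} ms")
print(f"ASIC key generation latency: {asic_ms:.1f} ms")
```

With these inputs the FPGA figure reproduces the quoted 515 ms. The ASIC figure comes out slightly above the quoted 591 ms when exactly 180 MHz is assumed, which is consistent with the frequency being stated only approximately.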

Related papers

A High Performance and Efficient Post-Quantum Crypto-Processor for FrodoKEM [24.961829196441887]
FrodoKEM is a lattice-based post-quantum key encapsulation mechanism (KEM). It has been considered for standardization by the International Organization for Standardization (ISO). This paper presents a high-performance and efficient crypto-processor for FrodoKEM.
arXiv Detail & Related papers (2026-01-23T07:05:42Z)
SCE-NTT: A Hardware Accelerator for Number Theoretic Transform Using Superconductor Electronics [12.616265554244313]
This research explores the use of superconductor electronics (SCE) for accelerating fully homomorphic encryption (FHE). We present SCE-NTT, a dedicated hardware accelerator based on superconductive single flux quantum (SFQ) logic and memory. We show the NTT-128 unit achieves 531 million NTT/sec at 34 GHz, over 100x faster than state-of-the-art CMOS equivalents.
arXiv Detail & Related papers (2025-08-28T23:37:51Z)
Optimization and Synthesis of Quantum Circuits with Global Gates [44.99833362998488]
We use global interactions, such as the Global Molmer-Sorensen gate present in ion trap hardware, to optimize and synthesize quantum circuits. The algorithm is based on the ZX-calculus and uses a specialized circuit extraction routine that groups entangling gates into Global Molmer-Sorensen gates. We benchmark the algorithm in a variety of circuits, and show how it improves their performance under state-of-the-art hardware considerations.
arXiv Detail & Related papers (2025-07-28T10:25:31Z)
Cost-Effective Optimization and Implementation of the CRT-Paillier Decryption Algorithm for Enhanced Performance [0.0]
We propose an eCRT-Paillier decryption algorithm that shortens the decryption computation chain. These two improvements cut modular multiplications by 50% and judgment operations by 60% in the postprocessing of the CRT-Paillier decryption algorithm. A high-throughput and efficient Paillier accelerator named MESA was implemented on the Xilinx Virtex-7 FPGA for evaluation.
arXiv Detail & Related papers (2025-06-22T08:06:36Z)
Toward a Lightweight, Scalable, and Parallel Secure Encryption Engine [0.0]
SPiME is a lightweight, scalable, and FPGA-compatible Secure Processor-in-Memory Encryption architecture. It integrates the Advanced Encryption Standard (AES-128) directly into a Processing-in-Memory framework. It delivers over 25 Gbps in sustained encryption throughput with predictable, low-latency performance.
arXiv Detail & Related papers (2025-06-18T02:25:04Z)
Resource Analysis of Low-Overhead Transversal Architectures for Reconfigurable Atom Arrays [38.6948808036416]
We present a low-overhead architecture that supports the layout and resource estimation of large-scale fault-tolerant quantum algorithms. We find that a 2048-bit RSA factoring can be executed with 19 million qubits in 5.6 days, for 1 ms QEC cycle times.
arXiv Detail & Related papers (2025-05-21T18:00:18Z)
Low latency FPGA implementation of twisted Edward curve cryptography hardware accelerator over prime field [0.5420492913071214]
This article presents a hardware implementation of field-programmable gate array (FPGA) based modular arithmetic, group operation, and point multiplication unit. The proposed point multiplication module consumes 1.4 ms time, operating at a maximal clock frequency of 117.8 MHz. This architecture will be a good candidate for rapid data encryption in high-speed wireless communication networks.
arXiv Detail & Related papers (2025-04-30T06:03:36Z)
Design of an FPGA-Based Neutral Atom Rearrangement Accelerator for Quantum Computing [1.003635085077511]
Neutral atoms have emerged as a promising technology for implementing quantum computers. We propose a novel quadrant-based rearrangement algorithm that employs a divide-and-conquer strategy and also enables the simultaneous movement of multiple atoms. This is the first hardware acceleration work for atom rearrangement, and it significantly reduces the processing time.
arXiv Detail & Related papers (2024-11-19T10:38:21Z)
Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs [39.16152482491236]
Bifurcated attention is a method designed to enhance language model inference in shared-context batch decoding scenarios. Our approach addresses the challenge of redundant memory IO costs, a critical factor contributing to latency in high batch sizes and extended context lengths.
arXiv Detail & Related papers (2024-03-13T16:30:57Z)
A Heterogeneous RISC-V based SoC for Secure Nano-UAV Navigation [40.8381466360025]
Nano-UAVs face significant power and payload constraints while requiring advanced computing capabilities. We present Shaheen, a 9 mm², 200 mW system-on-a-chip (SoC). It integrates a Linux-capable RV64 core, compliant with the v1.0 ratified Hypervisor extension, along with a low-cost and low-power memory controller. At the same time, it integrates a fully programmable energy- and area-efficient multi-core cluster of RV32 cores optimized for general-purpose DSP.
arXiv Detail & Related papers (2024-01-07T16:03:47Z)
FPGA-QHAR: Throughput-Optimized for Quantized Human Action Recognition on The Edge [0.6254873489691849]
This paper proposed an integrated end-to-end HAR scalable HW/SW accelerator co-design based on an enhanced 8-bit quantized Two-Stream SimpleNet-PyTorch CNN architecture. Our development uses partially streaming dataflow architecture to achieve higher throughput versus network design and resource utilization trade-off. Our proposed methodology achieved nearly 81% prediction accuracy with an approximately 24 FPS real-time inference throughput at 187MHz on ZCU104.
arXiv Detail & Related papers (2023-11-04T10:38:21Z)
A High Performance Compiler for Very Large Scale Surface Code Computations [38.26470870650882]
We present the first high performance compiler for very large scale quantum error correction. It translates an arbitrary quantum circuit to surface code operations based on lattice surgery. The compiler can process millions of gates using a streaming pipeline at a speed geared towards real-time operation of a physical device.
arXiv Detail & Related papers (2023-02-05T19:06:49Z)
Universal qudit gate synthesis for transmons [44.22241766275732]
We design a superconducting qudit-based quantum processor. We propose a universal gate set featuring a two-qudit cross-resonance entangling gate. We numerically demonstrate the synthesis of $\mathrm{SU}(16)$ gates for noisy quantum hardware.
arXiv Detail & Related papers (2022-12-08T18:59:53Z)
LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors. The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.