A Unified Hardware Accelerator for Fast Fourier Transform and Number Theoretic Transform
- URL: http://arxiv.org/abs/2504.11124v1
- Date: Tue, 15 Apr 2025 12:13:05 GMT
- Title: A Unified Hardware Accelerator for Fast Fourier Transform and Number Theoretic Transform
- Authors: Rishabh Shrivastava, Chaitanya Prasad Ratnala, Durga Manasa Puli, Utsav Banerjee,
- Abstract summary: Number Theoretic Transform (NTT) is indispensable tool for computing efficient multiplications in post-quantum lattice-based cryptography.<n>We demonstrate a unified hardware accelerator supporting both 512-point complex FFT and 256-point NTT.<n>Our implementation achieves performance comparable to state-of-the-art ML-KEM / ML-DSA NTT accelerators on FPGA.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Number Theoretic Transform (NTT) is an indispensable tool for computing efficient polynomial multiplications in post-quantum lattice-based cryptography. It has strong resemblance with the Fast Fourier Transform (FFT), which is the most widely used algorithm in digital signal processing. In this work, we demonstrate a unified hardware accelerator supporting both 512-point complex FFT as well as 256-point NTT for the recently standardized NIST post-quantum key encapsulation and digital signature algorithms ML-KEM and ML-DSA respectively. Our proposed architecture effectively utilizes the arithmetic circuitry required for complex FFT, and the only additional circuits required are for modular reduction along with modifications in the control logic. Our implementation achieves performance comparable to state-of-the-art ML-KEM / ML-DSA NTT accelerators on FPGA, thus demonstrating how an FFT accelerator can be augmented to support NTT and the unified hardware can be used for both digital signal processing and post-quantum lattice-based cryptography applications.
Related papers
- Generalized tensor transforms and their applications in classical and quantum computing [0.0]
We introduce a novel framework for Generalized Transforms (GTTs), constructed through an $n$-fold tensor product of an arbitrary $b times b$ unitary matrix $W$.<n>For quantum applications, our GTT-based algorithm achieves both gate complexity and circuit depth of $O(log_b N)$, where $N = bn$ denotes the length of the vector input.<n>We explore diverse applications of GTTs in quantum computing, including quantum state compression and transmission, function encoding and quantum digital signal processing.
arXiv Detail & Related papers (2025-07-03T08:28:00Z) - GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM [14.319119105134309]
This paper proposes an area-efficient highly parallel NTT accelerator with glitch-driven near-memory computing (GDNTT)<n>The design integrates a 10T for data storage, enabling flexible row/column data access and streamlining circuit mapping strategies.<n> Evaluation results show that the proposed NTT accelerator achieves a 1.528* improvement in throughput-per-area compared to the state-of-the-art.
arXiv Detail & Related papers (2025-05-13T01:53:07Z) - A Quantum Range-Doppler Algorithm for Synthetic Aperture Radar Image Formation [48.123217909844946]
We show how in general reference functions, a key element in many SAR focusing algorithms, can be mapped to quantum gates.
We find that the core of the quantum range-Doppler algorithm has a computational complexity $O(N)$, less than its classical counterpart.
arXiv Detail & Related papers (2025-04-29T14:24:23Z) - Construction of Boolean Logic Gates Using QFT-Based Adder Architecture [0.0]
We construct the quantum reversible counterparts of the logical AND, OR, XOR, NOR, and NAND gates.
We utilize a quantum Fourier transform (QFT)-based adder circuit that replicates the functionality of a digital half-adder.
arXiv Detail & Related papers (2025-04-23T20:47:36Z) - Variable-size Symmetry-based Graph Fourier Transforms for image compression [65.7352685872625]
We propose a new family of Symmetry-based Graph Fourier Transforms of variable sizes into a coding framework.
Our proposed algorithm generates symmetric graphs on the grid by adding specific symmetrical connections between nodes.
Experiments show that SBGFTs outperform the primary transforms integrated in the explicit Multiple Transform Selection.
arXiv Detail & Related papers (2024-11-24T13:00:44Z) - Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z) - HF-NTT: Hazard-Free Dataflow Accelerator for Number Theoretic Transform [2.4578723416255754]
Polynomial multiplication is one of the fundamental operations in many applications, such as fully homomorphic encryption (FHE)
The Numberoretic Transform (NTT) has proven an effective tool in enhancing adaptable multiplication, but a fast method for generating NTT accelerators is lacking.
In this paper, we introduce HF-NTT, a novel NTT accelerator, and introduce a data movement strategy that eliminates the need for bit-reversal operations.
arXiv Detail & Related papers (2024-10-07T07:31:38Z) - AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z) - Low-latency machine learning FPGA accelerator for multi-qubit-state discrimination [1.6773398825542363]
Measuring a qubit state is a fundamental yet error-prone operation in quantum computing.
Here, we utilize an integrated approach to deploy neural networks onto field-programmable gate arrays (FPGA)
We demonstrate that implementing a fully connected neural network accelerator for multi-qubit readout is advantageous.
arXiv Detail & Related papers (2024-07-04T11:34:43Z) - Decomposition of Matrix Product States into Shallow Quantum Circuits [62.5210028594015]
tensor network (TN) algorithms can be mapped to parametrized quantum circuits (PQCs)
We propose a new protocol for approximating TN states using realistic quantum circuits.
Our results reveal one particular protocol, involving sequential growth and optimization of the quantum circuit, to outperform all other methods.
arXiv Detail & Related papers (2022-09-01T17:08:41Z) - RF-Photonic Deep Learning Processor with Shannon-Limited Data Movement [0.0]
Optical neural networks (ONNs) are promising accelerators with ultra-low latency and energy consumption.
We introduce our multiplicative analog frequency transform ONN (MAFT-ONN) that encodes the data in the frequency domain.
We experimentally demonstrate the first hardware accelerator that computes fully-analog deep learning on raw RF signals.
arXiv Detail & Related papers (2022-07-08T16:37:13Z) - Batch Processing and Data Streaming Fourier-based Convolutional Neural
Network Accelerator [4.7257913147626995]
Decision-making by artificial neural networks with minimal latency is paramount for numerous applications such as navigation, tracking, and real-time machine action systems.
This requires the machine learning hardware to handle multidimensional data with a high throughput.
We demonstrate a non-van Neuman-based machine learning acceleration with a Fourier Convolutional Neural Network (FCNN) accelerator.
arXiv Detail & Related papers (2021-12-23T01:06:17Z) - Variational Quantum Optimization with Multi-Basis Encodings [62.72309460291971]
We introduce a new variational quantum algorithm that benefits from two innovations: multi-basis graph complexity and nonlinear activation functions.
Our results in increased optimization performance, two increase in effective landscapes and a reduction in measurement progress.
arXiv Detail & Related papers (2021-06-24T20:16:02Z) - Interleaving: Modular architectures for fault-tolerant photonic quantum
computing [50.591267188664666]
Photonic fusion-based quantum computing (FBQC) uses low-loss photonic delays.
We present a modular architecture for FBQC in which these components are combined to form "interleaving modules"
Exploiting the multiplicative power of delays, each module can add thousands of physical qubits to the computational Hilbert space.
arXiv Detail & Related papers (2021-03-15T18:00:06Z) - Low-complexity Image and Video Coding Based on an Approximate Discrete Tchebichef Transform [0.0]
We introduce a new low-complexity approximation for the discrete Tchebichef transform (DTT)
The fast algorithm of the proposed transform is multiplication-free and requires a reduced number of additions and bit-shifting operations.
Image and video compression simulations in popular standards shows good performance of the proposed transform.
arXiv Detail & Related papers (2016-09-24T14:49:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.