A High-Speed Hardware Algorithm for Modulus Operation and its Application in Prime Number Calculation
- URL: http://arxiv.org/abs/2407.12541v1
- Date: Wed, 17 Jul 2024 13:24:52 GMT
- Title: A High-Speed Hardware Algorithm for Modulus Operation and its Application in Prime Number Calculation
- Authors: W. A. Susantha Wijesinghe
- Abstract summary: The proposed algorithm uses only addition, subtraction, logical, and bit shift operations.
It addresses scalability challenges in cryptographic applications.
The application of this algorithm in prime number calculation up to 500,000 shows its practical utility and performance advantages.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a novel high-speed hardware algorithm for the modulus operation for FPGA implementation. The proposed algorithm uses only addition, subtraction, logical, and bit shift operations, avoiding the complexities and hardware costs associated with multiplication and division. It demonstrates consistent performance across operand sizes ranging from 32-bit to 2048-bit, addressing scalability challenges in cryptographic applications. Implemented in Verilog HDL and tested on a Xilinx Zynq-7000 family FPGA, the algorithm shows a predictable linear scaling of cycle count with bit length difference (BLD), described by the equation $y=2x+2$, where $y$ represents the cycle count and $x$ represents the BLD. The application of this algorithm in prime number calculation up to 500,000 shows its practical utility and performance advantages. Comprehensive evaluations reveal efficient resource utilization, robust timing performance, and effective power management, making it suitable for high-performance and resource-constrained platforms. The results indicate that the proposed algorithm significantly improves the efficiency of modular arithmetic operations, with potential implications for cryptographic protocols and secure computing.
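The abstract describes a modulus computed with only addition, subtraction, logical, and shift operations, with cycle count scaling as $y=2x+2$ in the bit length difference. The Python sketch below illustrates that general idea with a textbook shift-and-subtract reduction; it is an assumption-laden illustration, not the paper's Verilog design. The names `shift_subtract_mod`, `modelled_cycles`, and `is_prime` are hypothetical, and `modelled_cycles` simply evaluates the formula quoted in the abstract rather than measuring anything.

```python
def shift_subtract_mod(a: int, n: int) -> int:
    """Return a mod n (a >= 0, n > 0) using only shifts, compares, and subtraction.

    Generic shift-and-subtract reduction, assumed here for illustration;
    the paper's actual hardware algorithm may differ.
    """
    if a < 0 or n <= 0:
        raise ValueError("expected a >= 0 and n > 0")
    # Bit length difference (BLD) between dividend and modulus.
    bld = a.bit_length() - n.bit_length()
    if bld < 0:
        return a  # a is already smaller than n
    d = n << bld  # align the modulus with the top bit of a
    for _ in range(bld + 1):
        if a >= d:
            a -= d  # conditional subtraction, no multiply or divide
        d >>= 1
    return a


def modelled_cycles(bld: int) -> int:
    """Cycle count from the abstract's linear model y = 2x + 2 (x = BLD)."""
    return 2 * bld + 2


def is_prime(k: int) -> bool:
    """Trial division built on shift_subtract_mod.

    The loop bound d * d uses a software multiplication for brevity only.
    """
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if shift_subtract_mod(k, d) == 0:
            return False
        d += 1
    return True
```

For example, `shift_subtract_mod(100, 7)` aligns 7 to 112, then tries 112, 56, 28, 14, and 7 in turn and returns 2. The loop runs BLD + 1 times, which is one plausible reading of why a hardware cycle count would grow linearly with the BLD.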
Related papers
- Optimized Inference for 1.58-bit LLMs: A Time and Memory-Efficient Algorithm for Binary and Ternary Matrix Multiplication [8.779871128906787]
Large Language Models (LLMs) suffer from inference inefficiency while relying on advanced computational infrastructure.
We propose algorithms to improve the inference time and memory efficiency of 1.58-bit LLMs with ternary weight matrices.
Our results confirm the superiority of the approach with respect to both time and memory: we observed a reduction in inference time of up to 29x and in memory usage of up to 6x.
arXiv Detail & Related papers (2024-11-10T04:56:14Z) - Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA [10.630802853096462]
Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations.
This paper proposes a high-throughput, scalable, and energy-efficient non-element-wise matrix multiplication unit on FPGAs.
Our AMU achieves up to 9x higher throughput and 112x higher energy efficiency than state-of-the-art solutions for FPGA-based Quantised Neural Network (QNN) accelerators.
arXiv Detail & Related papers (2024-07-02T15:28:10Z) - Many-body computing on Field Programmable Gate Arrays [5.612626580467746]
We leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations.
This has resulted in a remarkable tenfold speedup compared to CPU-based computation.
arXiv Detail & Related papers (2024-02-09T14:01:02Z) - AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming [2.898055875927704]
We propose a data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs.
Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data.
Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV.
arXiv Detail & Related papers (2023-09-23T18:23:54Z) - Efficient Additions and Montgomery Reductions of Large Integers for SIMD [2.362288417229025]
This paper presents efficient algorithms for performing Montgomery reductions and additions on integers larger than 512 bits.
The new addition algorithm simulates the addition of large integers using a smaller addition, quickly producing the same set of carries.
For Montgomery reductions, serial multiplications are replaced with precomputations that can be effectively calculated using SIMD extensions.
arXiv Detail & Related papers (2023-08-31T03:44:49Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Quantum Goemans-Williamson Algorithm with the Hadamard Test and Approximate Amplitude Constraints [62.72309460291971]
We introduce a variational quantum algorithm for the Goemans-Williamson algorithm that uses only $n+1$ qubits.
Efficient optimization is achieved by encoding the objective matrix as a properly parameterized unitary conditioned on an auxiliary qubit.
We demonstrate the effectiveness of our protocol by devising an efficient quantum implementation of the Goemans-Williamson algorithm for various NP-hard problems.
arXiv Detail & Related papers (2022-06-30T03:15:23Z) - Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air learning via the method of federated learning.
Compared to the state-of-the-art scheme, the proposed scheme imposes drastically lower complexity on the system.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Automatic Differentiation in ROOT [62.997667081978825]
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program.
This paper presents AD techniques available in ROOT, supported by Cling, to produce derivatives of arbitrary C/C++ functions.
arXiv Detail & Related papers (2020-04-09T09:18:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.