Efficient Hardware Implementation of Modular Multiplier over GF (2m) on FPGA
- URL: http://arxiv.org/abs/2506.09464v3
- Date: Tue, 24 Jun 2025 05:32:36 GMT
- Title: Efficient Hardware Implementation of Modular Multiplier over GF (2m) on FPGA
- Authors: Ruby Kumari, Gaurav Purohit, Abhijit Karmakar,
- Abstract summary: Elliptic curve cryptography (ECC) has emerged as the dominant public-key protocol. This work presents a hardware implementation of a hybrid multiplication technique for modular multiplication over the binary field GF(2^m). The design optimizes the combination of conventional multiplication (CM) and Karatsuba multiplication (KM) to enhance elliptic curve point multiplication (ECPM). Results show the hybrid technique significantly improves speed, hardware efficiency, and resource utilization for ECC cryptographic systems.
- Score: 0.10241134756773226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Elliptic curve cryptography (ECC) has emerged as the dominant public-key protocol, with NIST standardizing parameters for binary-field GF(2^m) ECC systems. This work presents a hardware implementation of a hybrid multiplication technique for modular multiplication over the binary field GF(2^m), targeting the NIST B-163, B-233, B-283, and B-571 parameters. The design optimizes the combination of conventional multiplication (CM) and Karatsuba multiplication (KM) to enhance elliptic curve point multiplication (ECPM). The key innovation is using CM for smaller operands (up to 41 bits for m=163) and KM for larger ones, reducing computational complexity and enhancing efficiency. The design is evaluated in three areas. Resource utilization: for m=163, the hybrid design uses 6,812 LUTs, a 39.82% reduction compared to conventional methods; for m=233, LUT usage drops by 45.53% and 70.70% relative to overlap-free and bit-parallel implementations, respectively. Delay: for m=163, the design achieves a 13.31 ns delay, a 37.60% improvement over bit-parallel implementations; for m=233, it maintains a 13.39 ns delay. Area-delay product (ADP): for m=163, the design achieves an ADP of 90,860, compared with 75,337 for bit-parallel and 43,179 for digit-serial implementations; for m=233, it demonstrates a 16.86% improvement over overlap-free and a 96.10% improvement over bit-parallel designs. Results show the hybrid technique significantly improves speed, hardware efficiency, and resource utilization for ECC cryptographic systems.
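The hybrid scheme described above is easy to illustrate in software. The following is a minimal Python sketch, not the paper's hardware design: it multiplies GF(2)[x] polynomials (encoded as integers) with schoolbook multiplication below a cutoff and Karatsuba above it, then reduces modulo the field polynomial. The 41-bit cutoff is the value the abstract quotes for m=163, and the B-163 reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is the NIST standard; everything else (function names, recursion strategy) is illustrative.

```python
# Minimal sketch of hybrid GF(2)[x] multiplication: "conventional" schoolbook
# multiplication below a threshold, Karatsuba above it. Polynomials over GF(2)
# are encoded as Python ints (bit i = coefficient of x^i). The 41-bit cutoff is
# the value the abstract quotes for m = 163; the recursion is generic Karatsuba,
# not the paper's hardware datapath.
import random

CM_THRESHOLD_BITS = 41  # switch point between conventional and Karatsuba

def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less (GF(2)) schoolbook multiply: XOR together shifted copies of a."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def clmul_hybrid(a: int, b: int) -> int:
    """Hybrid multiply: schoolbook for small operands, Karatsuba otherwise."""
    n = max(a.bit_length(), b.bit_length())
    if n <= CM_THRESHOLD_BITS:
        return clmul_schoolbook(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_hybrid(a_lo, b_lo)
    hi = clmul_hybrid(a_hi, b_hi)
    # Karatsuba middle term; over GF(2) both + and - are XOR.
    mid = clmul_hybrid(a_lo ^ a_hi, b_lo ^ b_hi) ^ lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))

def gf2m_modmul(a: int, b: int, f: int) -> int:
    """Modular multiplication in GF(2^m): multiply, then reduce mod f(x)."""
    m = f.bit_length() - 1          # degree of the irreducible polynomial
    p = clmul_hybrid(a, b)
    for i in range(p.bit_length() - 1, m - 1, -1):
        if (p >> i) & 1:
            p ^= f << (i - m)       # cancel bit i
    return p

# Example in the NIST B-163 field, f(x) = x^163 + x^7 + x^6 + x^3 + 1.
F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
random.seed(1)
a, b = random.getrandbits(163), random.getrandbits(163)
assert clmul_hybrid(a, b) == clmul_schoolbook(a, b)   # sanity check
print(hex(gf2m_modmul(a, b, F163)))
```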
Related papers
- Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding [144.70522923640095]
Large language models (LLMs) face low hardware efficiency during decoding. This paper introduces Step-3, a hardware-aware model-system co-design optimized for minimizing decoding costs. Step-3 significantly reduces theoretical decoding costs compared with models like DeepSeek-V3 and Qwen3 MoE 235B.
arXiv Detail & Related papers (2025-07-25T16:53:13Z)
- UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs [1.9626657740463982]
UnIT (Unstructured Inference-Time pruning) is a lightweight method that dynamically identifies and skips unnecessary multiply-accumulate (MAC) operations during inference. It transforms pruning decisions into lightweight comparisons, replacing multiplications with threshold checks and approximated divisions. UnIT achieves 11.02% to 82.03% MAC reduction, 27.30% to 84.19% faster inference, and 27.33% to 84.38% lower energy consumption compared to training-time pruned models. (A sketch of the threshold-check idea follows this entry.)
arXiv Detail & Related papers (2025-07-10T16:12:06Z)
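One plausible reading of the abstract's "threshold checks and approximated divisions" can be sketched as follows: precompute per-weight cutoffs tau/|w| once per layer (the approximated division), so each skip decision at inference time is a single comparison against |x[j]| rather than a multiply. Names, the cutoff rule, and the dense loops are illustrative assumptions, not the paper's implementation.

```python
# Readable (not fast) sketch of inference-time MAC skipping via thresholds.
import numpy as np

def unit_like_matvec(x: np.ndarray, W: np.ndarray, tau: float) -> np.ndarray:
    """Skip each MAC whose product magnitude would fall below tau."""
    cutoff = tau / (np.abs(W) + 1e-12)       # precomputed once per layer
    y = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            if abs(x[j]) >= cutoff[i, j]:    # cheap threshold check, no multiply
                y[i] += x[j] * W[i, j]       # MAC executed only when it matters
    return y

rng = np.random.default_rng(0)
W, x = rng.standard_normal((16, 32)), rng.standard_normal(32)
print(np.abs(unit_like_matvec(x, W, tau=0.02) - W @ x).max())  # small error
```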
- Cost-Effective Optimization and Implementation of the CRT-Paillier Decryption Algorithm for Enhanced Performance [0.0]
We propose an eCRT-Paillier decryption algorithm that shortens the decryption computation chain. Together, these improvements reduce modular multiplications by 50% and judgment operations by 60% in the postprocessing of the CRT-Paillier decryption algorithm. A high-throughput and efficient Paillier accelerator named MESA was implemented on the Xilinx Virtex-7 FPGA for evaluation. (A sketch of baseline CRT-Paillier decryption follows this entry.)
arXiv Detail & Related papers (2025-06-22T08:06:36Z)
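For context, below is a hedged Python sketch of standard CRT-based Paillier decryption, the baseline this paper optimizes, with the common choice g = n + 1 and toy primes. The eCRT improvements themselves are not reproduced here.

```python
# Standard CRT-Paillier decryption: two half-size exponentiations mod p^2 and
# q^2 replace one full exponentiation mod n^2. Toy primes for illustration
# only; real deployments use primes of roughly 1536 bits each.
from math import gcd

p, q = 1_000_003, 1_000_033
n = p * q
g = n + 1                                  # the common choice of generator

def L(u: int, d: int) -> int:              # Paillier's L function: (u - 1) / d
    return (u - 1) // d

# Per-key precomputations.
hp = pow(L(pow(g, p - 1, p * p), p), -1, p)
hq = pow(L(pow(g, q - 1, q * q), q), -1, q)
q_inv_p = pow(q, -1, p)                    # for the CRT recombination

def encrypt(m: int, r: int) -> int:
    assert 0 <= m < n and gcd(r, n) == 1
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt_crt(c: int) -> int:
    mp = (L(pow(c, p - 1, p * p), p) * hp) % p
    mq = (L(pow(c, q - 1, q * q), q) * hq) % q
    return mq + q * (((mp - mq) * q_inv_p) % p)   # Garner-style recombination

print(decrypt_crt(encrypt(123456789, 987654321)))  # -> 123456789
```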
- The Cambrian Explosion of Mixed-Precision Matrix Multiplication for Quantized Deep Learning Inference [0.9954176833299684]
Deep learning (DL) has led to a shift from traditional 64-bit floating point (FP64) computations toward reduced-precision formats. This paper revisits traditional high-performance gemm and describes strategies for adapting it to mixed-precision integer arithmetic.
arXiv Detail & Related papers (2025-06-13T12:40:16Z)
- Learning Adaptive Parallel Reasoning with Language Models [70.1745752819628]
We propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations. A key innovation is our end-to-end reinforcement learning strategy, optimizing both parent and child inference threads to enhance task success rate without requiring predefined reasoning structures.
arXiv Detail & Related papers (2025-04-21T22:29:02Z)
- Speedy MASt3R [68.47052557089631]
MASt3R redefines image matching as a 3D task by leveraging DUSt3R and introducing a fast reciprocal matching scheme. Fast MASt3R achieves a 54% reduction in inference time (198 ms to 91 ms per image pair) without sacrificing accuracy. This advancement enables real-time 3D understanding, benefiting applications like mixed reality navigation and large-scale 3D scene reconstruction.
arXiv Detail & Related papers (2025-03-13T03:56:22Z)
- Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale [68.6602625868888]
We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations. Operators in hybrid models can be tailored to token manipulation tasks such as in-context recall, multi-token recall, and compression. We train end-to-end 1.2 to 2.9 times faster than optimized Transformers, and 1.1 to 1.4 times faster than previous-generation hybrids.
arXiv Detail & Related papers (2025-02-25T19:47:20Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses existing parallelism schemes. Our results demonstrate up to a 52.4% improvement in prefill throughput compared to existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Yet another Improvement of Plantard Arithmetic for Faster Kyber on Low-end 32-bit IoT Devices [14.32828779824487]
We show that the input range of the Plantard multiplication by a constant is at least 2.14 times larger than the original design in TCHES2022.
We propose various optimization strategies for NTT/INTT.
Our NTT/INTT implementation shows considerable speedups compared to the state-of-the-art work. (A sketch of the underlying Plantard multiplication follows this entry.)
arXiv Detail & Related papers (2023-09-01T13:16:13Z)
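For context, here is a hedged Python sketch of the underlying Plantard word-size modular multiplication: it returns a*b*(-2^(-2l)) mod q using only a low-half product, a shift-and-increment, and one more multiply. The parameters follow the Kyber setting (l = 16, q = 3329); the enlarged input-range analysis this paper contributes is not reproduced.

```python
# Plantard word-size modular multiplication (Plantard 2021 / TCHES 2022 flavor).

L_BITS = 16                                  # word size l (32-bit MCU target)
Q = 3329                                     # Kyber's modulus, odd, q < 2^(l-1)
QPRIME = pow(Q, -1, 1 << (2 * L_BITS))       # q^(-1) mod 2^(2l), precomputed
MASK = (1 << (2 * L_BITS)) - 1

def plantard_mul(a: int, b: int) -> int:
    """Return a*b*(-2^(-2l)) mod q for operands within Plantard's input range."""
    t = (a * b * QPRIME) & MASK              # keep only the low 2l bits
    return (((t >> L_BITS) + 1) * Q) >> L_BITS   # result lies in [0, q]

# Sanity check: fold the constant factor (-2^(-2l)) mod q back out.
const = (-pow(1 << (2 * L_BITS), -1, Q)) % Q
a, b = 1234, 2345
assert plantard_mul(a, b) % Q == (a * b * const) % Q
```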
- Efficient Additions and Montgomery Reductions of Large Integers for SIMD [2.362288417229025]
This paper presents efficient algorithms for performing Montgomery reductions and additions on integers larger than 512 bits.
The new addition algorithm simulates the addition of large integers using a smaller addition, quickly producing the same set of carries.
For Montgomery reductions, serial multiplications are replaced with precomputations that can be effectively calculated using SIMD extensions. (A sketch of scalar Montgomery reduction follows this entry.)
arXiv Detail & Related papers (2023-08-31T03:44:49Z)
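For context, the following is a scalar Python sketch of textbook Montgomery reduction (REDC), the operation the paper vectorizes; the SIMD carry-simulation tricks themselves are not shown.

```python
# Textbook Montgomery reduction: t * R^(-1) mod n without a division by n.

def montgomery_setup(n: int, r_bits: int):
    """Precompute n' = -n^(-1) mod R for R = 2^r_bits; n must be odd."""
    R = 1 << r_bits
    n_prime = (-pow(n, -1, R)) % R
    return R, n_prime

def redc(t: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^(-1) mod n, for 0 <= t < R*n."""
    R = 1 << r_bits
    m = ((t & (R - 1)) * n_prime) & (R - 1)   # multiple of n clearing low bits
    u = (t + m * n) >> r_bits                 # exact division by R
    return u - n if u >= n else u

# Usage: one modular multiplication a*b mod n via the Montgomery domain.
n = (1 << 32) - 5                             # an odd example modulus
R, n_prime = montgomery_setup(n, 32)
a, b = 123456789, 987654321
aR, bR = (a * R) % n, (b * R) % n             # enter Montgomery form
abR = redc(aR * bR, n, 32, n_prime)           # equals a*b*R mod n
assert redc(abR, n, 32, n_prime) == (a * b) % n
```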
- Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers [15.37318446043671]
This paper introduces a novel reduced precision optimization technique for On-Device Learning (ODL) primitives on MCU-class devices.
Our approach is more than two orders of magnitude faster than existing ODL software frameworks for single-core MCUs.
arXiv Detail & Related papers (2023-05-30T16:14:16Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs. (A sketch of the butterfly structure follows this entry.)
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
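The butterfly sparsity pattern itself can be sketched briefly: an n-point transform factored into log2(n) stages, each mixing pairs of elements at a fixed stride (the FFT access pattern), so a dense n^2 matmul becomes O(n log n). The sketch below shows only that structure, with a shared 2x2 block per stage for brevity; the paper's attention/FFN approximation and hardware mapping are not reproduced.

```python
# Structure-only sketch of a butterfly-sparse linear transform.
import numpy as np

def butterfly_apply(x: np.ndarray, twiddles) -> np.ndarray:
    """Apply log2(n) butterfly stages; twiddles[s] is a 2x2 block (a, b, c, d)."""
    n = x.size                       # n must be a power of two
    y = x.astype(np.float64).copy()
    stage, stride = 0, 1
    while stride < n:
        a, b, c, d = twiddles[stage]
        for i in range(0, n, 2 * stride):
            for j in range(i, i + stride):
                u, v = y[j], y[j + stride]
                y[j], y[j + stride] = a * u + b * v, c * u + d * v
        stride *= 2                  # each stage costs O(n) multiply-adds
        stage += 1
    return y

x = np.arange(8, dtype=float)
tw = [(1.0, 1.0, 1.0, -1.0)] * 3     # Hadamard-like blocks, for example
print(butterfly_apply(x, tw))        # a Walsh-Hadamard-style transform of x
```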
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [80.86029795281922]
We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers.
A 175B parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. (A sketch of the quantization idea follows this entry.)
arXiv Detail & Related papers (2022-08-15T17:08:50Z)
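The core recipe, vector-wise absmax quantization plus a higher-precision path for outlier feature dimensions, can be sketched in numpy as follows. The threshold value, names, and the fp32 (rather than fp16) outlier path are assumptions for illustration, not the paper's CUDA implementation.

```python
# Hedged sketch of int8 matmul with outlier decomposition.
import numpy as np

def int8_matmul_with_outliers(X, W, outlier_thresh: float = 6.0):
    """Compute X @ W, routing outlier feature columns of X through fp32."""
    # 1. Split input features into outlier and regular sets.
    outlier_cols = np.where(np.abs(X).max(axis=0) >= outlier_thresh)[0]
    regular_cols = np.setdiff1d(np.arange(X.shape[1]), outlier_cols)

    # 2. Vector-wise absmax quantization: per-row scale for X, per-column for W.
    Xr, Wr = X[:, regular_cols], W[regular_cols, :]
    sx = np.abs(Xr).max(axis=1, keepdims=True) / 127.0 + 1e-12
    sw = np.abs(Wr).max(axis=0, keepdims=True) / 127.0 + 1e-12
    Xq = np.round(Xr / sx).astype(np.int8)
    Wq = np.round(Wr / sw).astype(np.int8)

    # 3. Int8 matmul accumulated in int32, dequantized by the scale outer product.
    regular_part = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * (sx * sw)

    # 4. Outlier columns stay in higher precision and are added back.
    outlier_part = X[:, outlier_cols] @ W[outlier_cols, :]
    return regular_part + outlier_part

X = np.random.randn(4, 64).astype(np.float32)
W = np.random.randn(64, 8).astype(np.float32)
print(np.abs(int8_matmul_with_outliers(X, W) - X @ W).max())  # small quantization error
```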
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method for solving the large-scale multimedia retrieval problem.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
- PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices [35.90103072918056]
Deep neural networks (DNNs) have emerged as the most important and popular artificial intelligence (AI) technique.
The growth of model size poses a key energy efficiency challenge for the underlying computing platform.
This paper proposes PermDNN, a novel approach to generate and execute hardware-friendly structured sparse DNN models. (A sketch of the permuted-diagonal structure follows this entry.)
arXiv Detail & Related papers (2020-04-23T02:26:40Z)
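As a reading of the abstract, a permuted diagonal block has exactly one nonzero per row, at a position fixed by a rotation offset, so a block-vector product reduces to a vector rotation plus an elementwise multiply, with no dense matmul. The sketch below illustrates that structure; layout and names are illustrative assumptions, not the paper's architecture.

```python
# Matrix-vector product for a matrix built from permuted-diagonal blocks: each
# p x p block has its nonzeros at positions (k, (k + offset) mod p).
import numpy as np

def permdiag_block_matvec(diags, offsets, x, p):
    """y = W x where W consists of p x p permuted-diagonal blocks.

    diags[i][j]  : the p nonzero values of block (i, j)
    offsets[i][j]: rotation offset of block (i, j)'s diagonal
    """
    rows, cols = len(diags), len(diags[0])
    y = np.zeros(rows * p)
    for i in range(rows):
        for j in range(cols):
            xj = x[j * p:(j + 1) * p]
            # Block (i, j) acts as y_i[k] += d[k] * x_j[(k + offset) mod p].
            y[i * p:(i + 1) * p] += diags[i][j] * np.roll(xj, -offsets[i][j])
    return y

p = 4
diags = [[np.arange(1.0, 5.0), np.ones(4)]]   # one block-row, two block-cols
offsets = [[1, 0]]
print(permdiag_block_matvec(diags, offsets, np.arange(8.0), p))  # [5. 9. 15. 7.]
```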
- Fast Implementation of Morphological Filtering Using ARM NEON Extension [0.9135092203041721]
We consider the speedup potential of morphological image filtering on ARM processors.
We propose a fast implementation of erosion and dilation using the ARM SIMD extension NEON.
Experiments showed a 3x efficiency increase for the final implementation of erosion and dilation. (A sketch of the underlying min/max filtering follows this entry.)
arXiv Detail & Related papers (2020-02-19T12:55:34Z)
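For reference, grayscale erosion and dilation are simply min- and max-filters over a structuring element. The numpy sketch below shows the 3x3 case that a NEON implementation would vectorize; it illustrates the operation only, not the NEON code.

```python
# Grayscale erosion/dilation over a 3x3 window via shifted-slice min/max.
import numpy as np

def erode3x3(img: np.ndarray) -> np.ndarray:
    """Erosion: each output pixel is the minimum over its 3x3 neighborhood."""
    p = np.pad(img, 1, mode="edge")
    shifts = [p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
              for dy in range(3) for dx in range(3)]
    return np.minimum.reduce(shifts)

def dilate3x3(img: np.ndarray) -> np.ndarray:
    """Dilation: each output pixel is the maximum over its 3x3 neighborhood."""
    p = np.pad(img, 1, mode="edge")
    shifts = [p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
              for dy in range(3) for dx in range(3)]
    return np.maximum.reduce(shifts)

img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
assert (erode3x3(img) <= img).all() and (dilate3x3(img) >= img).all()
```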
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.