Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU
- URL: http://arxiv.org/abs/2501.07535v1
- Date: Mon, 13 Jan 2025 18:15:44 GMT
- Title: Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU
- Authors: Naifeng Zhang, Franz Franchetti,
- Abstract summary: Homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments.
This paper presents a formalization of multi-word modular arithmetic (MoMA), which breaks down large bit-width integer arithmetic into operations on machine words.
- Score: 0.5831737970661138
- License:
- Abstract: Fully homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments. However, the widespread adoption of these encryption techniques is hindered by their significant computational overhead, primarily resulting from core cryptographic operations that involve large integer arithmetic. This paper presents a formalization of multi-word modular arithmetic (MoMA), which breaks down large bit-width integer arithmetic into operations on machine words. We further develop a rewrite system that implements MoMA through recursive rewriting of data types, designed for compatibility with compiler infrastructures and code generators. We evaluate MoMA by generating cryptographic kernels, including basic linear algebra subprogram (BLAS) operations and the number theoretic transform (NTT), targeting various GPUs. Our MoMA-based BLAS operations outperform state-of-the-art multi-precision libraries by orders of magnitude, and MoMA-based NTTs achieve near-ASIC performance on commodity GPUs.
Related papers
- gECC: A GPU-based high-throughput framework for Elliptic Curve Cryptography [15.39096542261856]
Elliptic Curve Cryptography (ECC) is an encryption method that provides security comparable to traditional techniques like Rivest-Shamir-Adleman (RSA)
ECC is still hindered by the significant performance overhead associated with elliptic curve (EC) operations.
This paper presents gECC, a versatile framework for ECC optimized for GPU architectures.
arXiv Detail & Related papers (2024-12-22T01:50:50Z) - Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices [88.33936714942996]
We present a unifying framework that enables searching among all linear operators expressible via an Einstein summation.
We show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables.
We find that Mixture-of-Experts (MoE) learns an MoE in every single linear layer of the model, including the projection in the attention blocks.
arXiv Detail & Related papers (2024-10-03T00:44:50Z) - A Method for Efficient Heterogeneous Parallel Compilation: A Cryptography Case Study [8.06660833012594]
This paper introduces a novel MLIR-based dialect, named hyper, designed to optimize data management and parallel computation across diverse hardware architectures.
We present HETOCompiler, a cryptography-focused compiler prototype that implements multiple hash algorithms and enables their execution on heterogeneous systems.
arXiv Detail & Related papers (2024-07-12T15:12:51Z) - BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-wise Comparisons [14.399750086329345]
Fully Homomorphic Encryption (FHE) allows for the execution of computations on encrypted data without the need to decrypt it first.
In this paper, we introduce BoostCom, a scheme designed to speed up word-wise comparison operations.
We achieve an end-to-end performance improvement of more than an order of magnitude (11.1x faster) compared to the state-of-the-art CPU-based uFHE systems.
arXiv Detail & Related papers (2024-07-10T02:09:10Z) - On efficient normal bases over binary fields [0.0]
Binary field extensions are fundamental to cryptography, code-based cryptography, and error-correcting codes.
In this paper, we explore other forms of bases $mathbbF_2n$ over $mathbbF$ to demonstrate efficient implementation of operations within different ranges.
arXiv Detail & Related papers (2024-02-18T11:06:20Z) - Tackling the Matrix Multiplication Micro-kernel Generation with Exo [0.5517652814152908]
We present a step-by-step procedure for generating a dedicated micro-kernel for each new hardware.
Our solution also improves the portability of the generated code, since a hardware target is fully specified by a concise library-based description of its instructions.
arXiv Detail & Related papers (2023-10-26T14:09:57Z) - CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules [51.82044734879657]
We propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions.
We find that CodeChain can significantly boost both modularity as well as correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests.
arXiv Detail & Related papers (2023-10-13T10:17:48Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs.
We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system.
We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
arXiv Detail & Related papers (2021-05-09T14:22:43Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL
primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.