Related papers: if-ZKP: Intel FPGA-Based Acceleration of Zero Knowledge Proofs

if-ZKP: Intel FPGA-Based Acceleration of Zero Knowledge Proofs

URL: http://arxiv.org/abs/2412.12481v1
Date: Tue, 17 Dec 2024 02:35:32 GMT
Title: if-ZKP: Intel FPGA-Based Acceleration of Zero Knowledge Proofs
Authors: Shahzad Ahmad Butt, Benjamin Reynolds, Veeraraghavan Ramamurthy, Xiao Xiao, Pohrong Chu, Setareh Sharifian, Sergey Gribok, Bogdan Pasca,
Abstract summary: We present a novel scalable architecture that is suitable for accelerating the zk-SNARK prover compute on FPGAs.<n>We focus on the multi-scalar multiplication (MSM) that accounts for the majority of time spent in zk-SNARK systems.<n>Our implementation runs 110x-150x faster compared to reference software library.
Score: 3.0009885036586725
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Zero-Knowledge Proofs (ZKPs) have emerged as an important cryptographic technique allowing one party (prover) to prove the correctness of a statement to some other party (verifier) and nothing else. ZKPs give rise to user's privacy in many applications such as blockchains, digital voting, and machine learning. Traditionally, ZKPs suffered from poor scalability but recently, a sub-class of ZKPs known as Zero-knowledge Succinct Non-interactive ARgument of Knowledges (zk-SNARKs) have addressed this challenge. They are getting significant attention and are being implemented by many public libraries. In this paper, we present a novel scalable architecture that is suitable for accelerating the zk-SNARK prover compute on FPGAs. We focus on the multi-scalar multiplication (MSM) that accounts for the majority of computation time spent in zk-SNARK systems. The MSM calculations extensive rely on modular arithmetic so highly optimized Intel IP Libraries for modular arithmetic are used. The proposed architecture exploits the parallelism inherent to MSM and is implemented using the Intel OneAPI framework for FPGAs. Our implementation runs 110x-150x faster compared to reference software library, uses a generic curve form in Jacobian coordinates and is the first to report FPGA hardware acceleration results for BLS12-381 and BN128 family of elliptic curves.

Related papers

Extractors: QLDPC Architectures for Efficient Pauli-Based Computation [42.95092131256421]
We propose a new primitive that can augment any QLDPC memory into a computational block well-suited for Pauli-based computation. In particular, any logical Pauli operator supported on the memory can be fault-tolerantly measured in one logical cycle. Our architecture can implement universal quantum circuits via parallel logical measurements.
arXiv Detail & Related papers (2025-03-13T14:07:40Z)
KGCompiler: Deep Learning Compilation Optimization for Knowledge Graph Complex Logical Query Answering [30.39578108108025]
We introduce KGCompiler, the first compiler specifically designed for CLQA tasks. We show that KGCompiler accelerates CLQA algorithms by factors ranging from 1.04x to 8.26x, with an average speedup of 3.71x.
arXiv Detail & Related papers (2025-03-04T01:24:32Z)
HW/SW Implementation of MiRitH on Embedded Platforms [2.3099144596725574]
We present to the best of our knowledge the first design space exploration of MiRitH, a promising MPCitH algorithm, for embedded devices. We develop a library of mixed HW/SW blocks on the Xilinx ZYNQ 7000, and, based on this library, we explore optimal solutions under runtime or FPGA resource constraints. Our results show that MiRitH is a viable algorithm for embedded devices in terms of runtime and FPGA resource requirements.
arXiv Detail & Related papers (2024-11-19T08:30:08Z)
MagicPIG: LSH Sampling for Efficient LLM Generation [41.75038064509643]
We show that TopK attention itself suffers from quality degradation in certain downstream tasks because attention is not always as sparse as expected.<n>We propose MagicPIG, a heterogeneous system based on Locality Sensitive Hashing (LSH)<n>MagicPIG significantly reduces the workload of attention while preserving high accuracy for diverse tasks.
arXiv Detail & Related papers (2024-10-21T16:44:51Z)
Fast Algorithms and Implementations for Computing the Minimum Distance of Quantum Codes [43.96687298077534]
The distance of a stabilizer quantum code determines the number of errors that can be detected and corrected. We present three new fast algorithms and implementations for computing the symplectic distance of the associated classical code.
arXiv Detail & Related papers (2024-08-20T11:24:30Z)
SZKP: A Scalable Accelerator Architecture for Zero-Knowledge Proofs [10.603449308259496]
ZKPs are an emergent paradigm in verifiable computing. Two key primitives in proof generation are the Number Theoretic Transform (NTT) and Multi-scalar multiplication (MSM) We present SZKP, a scalable accelerator framework that is the first ASIC to accelerate an entire proof on-chip.
arXiv Detail & Related papers (2024-08-12T01:53:58Z)
Benchmarking Predictive Coding Networks -- Made Simple [48.652114040426625]
We first propose a library called PCX, whose focus lies on performance and simplicity. We use PCX to implement a large set of benchmarks for the community to use for their experiments.
arXiv Detail & Related papers (2024-07-01T10:33:44Z)
Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs. At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads. At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z)
Highly Versatile FPGA-Implemented Cyber Coherent Ising Machine [0.7950056272504447]
We have developed an FPGA implemented cyber coherent Ising machine (cyber CIM) that is much more versatile than previous implementations using FPGAs. Our architecture is versatile since it can be applied to the open-loop CIM, which was proposed when CIM research began, to the closed-loop CIM. The cyber CIM enables applications such as CDMA multi-user detector and L0 compressed sensing which were not possible with earlier FPGA systems.
arXiv Detail & Related papers (2024-06-08T07:09:27Z)
Efficient Fault Detection Architectures for Modular Exponentiation Targeting Cryptographic Applications Benchmarked on FPGAs [2.156170153103442]
We propose a lightweight fault detection architecture tailored for modular exponentiation. Our approach achieves an error detection rate close to 100%, all while introducing a modest computational overhead of approximately 7%.
arXiv Detail & Related papers (2024-02-28T04:02:41Z)
Many-body computing on Field Programmable Gate Arrays [5.3808713424582395]
We leverage the capabilities of Field Programmable Gate Arrays (FPGAs) for conducting quantum many-body calculations. This has resulted in a tenfold speedup compared to CPU-based computation for a Monte Carlo algorithm. For the first time, the utilization of FPGA to accelerate a typical tensor network algorithm for many-body ground state calculations.
arXiv Detail & Related papers (2024-02-09T14:01:02Z)
Extreme Compression of Large Language Models via Additive Quantization [59.3122859349777]
Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach for information retrieval. We provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed.
arXiv Detail & Related papers (2024-01-11T18:54:44Z)
An FPGA-based Solution for Convolution Operation Acceleration [0.0]
This paper proposes an FPGA-based architecture to accelerate the convolution operation. The project's purpose is to produce an FPGA IP core that can process a convolutional layer at a time.
arXiv Detail & Related papers (2022-06-09T14:12:30Z)
Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms. We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents. Our models perform equivalently to backprop on challenging machine learning benchmarks. Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.