SZKP: A Scalable Accelerator Architecture for Zero-Knowledge Proofs
- URL: http://arxiv.org/abs/2408.05890v1
- Date: Mon, 12 Aug 2024 01:53:58 GMT
- Title: SZKP: A Scalable Accelerator Architecture for Zero-Knowledge Proofs
- Authors: Alhad Daftardar, Brandon Reagen, Siddharth Garg,
- Abstract summary: ZKPs are an emergent paradigm in verifiable computing.
Two key primitives in proof generation are the Number Theoretic Transform (NTT) and Multi-scalar multiplication (MSM)
We present SZKP, a scalable accelerator framework that is the first ASIC to accelerate an entire proof on-chip.
- Score: 10.603449308259496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-Knowledge Proofs (ZKPs) are an emergent paradigm in verifiable computing. In the context of applications like cloud computing, ZKPs can be used by a client (called the verifier) to verify the service provider (called the prover) is in fact performing the correct computation based on a public input. A recently prominent variant of ZKPs is zkSNARKs, generating succinct proofs that can be rapidly verified by the end user. However, proof generation itself is very time consuming per transaction. Two key primitives in proof generation are the Number Theoretic Transform (NTT) and Multi-scalar Multiplication (MSM). These primitives are prime candidates for hardware acceleration, and prior works have looked at GPU implementations and custom RTL. However, both algorithms involve complex dataflow patterns -- standard NTTs have irregular memory accesses for butterfly computations from stage to stage, and MSMs using Pippenger's algorithm have data-dependent memory accesses for partial sum calculations. We present SZKP, a scalable accelerator framework that is the first ASIC to accelerate an entire proof on-chip by leveraging structured dataflows for both NTTs and MSMs. SZKP achieves conservative full-proof speedups of over 400$\times$, 3$\times$, and 12$\times$ over CPU, ASIC, and GPU implementations.
Related papers
- if-ZKP: Intel FPGA-Based Acceleration of Zero Knowledge Proofs [3.0009885036586725]
We present a novel scalable architecture that is suitable for accelerating the zk-SNARK prover compute on FPGAs.
We focus on the multi-scalar multiplication (MSM) that accounts for the majority of time spent in zk-SNARK systems.
Our implementation runs 110x-150x faster compared to reference software library.
arXiv Detail & Related papers (2024-12-17T02:35:32Z) - Attribute Fusion-based Evidential Classifier on Quantum Circuits [22.096543893284995]
Dempster-Shafer Theory (DST) as an effective and robust framework for handling uncertain information is applied in decision-making and pattern classification.
People attempt to address the issue by taking advantage of its consistency with quantum computing to implement DST operations on quantum circuits and realize speedup.
In this paper, we find that Boolean algebra as an essential mathematical tool bridges the definition of DST and quantum computing.
arXiv Detail & Related papers (2024-01-02T15:01:20Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal
Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O (1/sqrtk)$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - ReLU and Addition-based Gated RNN [1.484528358552186]
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation.
This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost.
arXiv Detail & Related papers (2023-08-10T15:18:16Z) - Quick Adaptive Ternary Segmentation: An Efficient Decoding Procedure For
Hidden Markov Models [70.26374282390401]
Decoding the original signal (i.e., hidden chain) from the noisy observations is one of the main goals in nearly all HMM based data analyses.
We present Quick Adaptive Ternary (QATS), a divide-and-conquer procedure which decodes the hidden sequence in polylogarithmic computational complexity.
arXiv Detail & Related papers (2023-05-29T19:37:48Z) - DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures
using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z) - Validation tests of GBS quantum computers give evidence for quantum
advantage with a decoherent target [62.997667081978825]
We use positive-P phase-space simulations of grouped count probabilities as a fingerprint for verifying multi-mode data.
We show how one can disprove faked data, and apply this to a classical count algorithm.
arXiv Detail & Related papers (2022-11-07T12:00:45Z) - Integer-arithmetic-only Certified Robustness for Quantized Neural
Networks [14.737638416823772]
A line of work on tackling adversarial examples is certified robustness via randomized smoothing.
Such a mechanism usually uses floating-point arithmetic for calculations in inference.
We show our approach can obtain a comparable accuracy and 4x5x speedup over floating-point arithmetic certified robust methods.
arXiv Detail & Related papers (2021-08-21T01:15:19Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - Understanding Cache Boundness of ML Operators on ARM Processors [0.0]
This is the first in-detail analysis of dense and convolution operators, generated with TVM, that compares to the fundamental hardware limits of embedded ARM processors.
One can see that single-precision general matrix multiply (GEMM) and convolutions are bound by L1-cache-read bandwidth.
Explorations of 8-bit and bit-serial quantized operators show that quantization can be used to achieve relevant speedups.
arXiv Detail & Related papers (2021-02-01T16:05:50Z) - Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) is receiving more and more attention recently for its capability to do computations over the encrypted field.
We propose a novel general distributed HE-based data mining framework towards one step of solving the scaling problem.
We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.