Related papers: BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-wise Comparisons

URL: http://arxiv.org/abs/2407.07308v1
Date: Wed, 10 Jul 2024 02:09:10 GMT
Title: BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-wise Comparisons
Authors: Ardhi Wiratama Baskara Yudha, Jiaqi Xue, Qian Lou, Huiyang Zhou, Yan Solihin
Abstract summary: Fully Homomorphic Encryption (FHE) allows for the execution of computations on encrypted data without the need to decrypt it first. In this paper, we introduce BoostCom, a scheme designed to speed up word-wise comparison operations. We achieve an end-to-end performance improvement of more than an order of magnitude (11.1x faster) compared to the state-of-the-art CPU-based uFHE systems.
Score: 14.399750086329345
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fully Homomorphic Encryption (FHE) allows for the execution of computations on encrypted data without the need to decrypt it first, offering significant potential for privacy-preserving computational operations. Emerging arithmetic-based FHE schemes (ar-FHE), like BGV, demonstrate even better performance in word-wise comparison operations than non-arithmetic FHE (na-FHE) schemes, such as TFHE, especially for basic tasks like comparing values and finding maximums and minimums. This shows the universality of ar-FHE in effectively handling both arithmetic and non-arithmetic operations without the expensive conversion between arithmetic and non-arithmetic FHEs. We refer to universal arithmetic Fully Homomorphic Encryption as uFHE. The arithmetic operations in uFHE remain consistent with those in the original arithmetic FHE, which have seen significant acceleration. However, its non-arithmetic comparison operations differ, are slow, and have not been as thoroughly studied or accelerated. In this paper, we introduce BoostCom, a scheme designed to speed up word-wise comparison operations, enhancing the efficiency of uFHE systems. BoostCom involves multi-pronged optimizations, including infrastructure acceleration (multi-level heterogeneous parallelization and GPU-related improvements) and algorithm-aware optimizations (slot compaction and non-blocking comparison semantics). Together, these optimizations let BoostCom achieve an end-to-end performance improvement of more than an order of magnitude (11.1x faster) compared to the state-of-the-art CPU-based uFHE systems, across various FHE parameters and tasks.

Related papers

CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing [8.114331115730021]
Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data. Many cloud computing applications (e.g., DNA read mapping, biometric matching, web search) use exact string matching as a key operation. Prior string matching algorithms that use homomorphic encryption are limited by high computational latency.
arXiv Detail & Related papers (2025-03-12T00:25:58Z)
Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU [0.5831737970661138]
Fully homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments. This paper presents a formalization of multi-word modular arithmetic (MoMA), which breaks down large bit-width integer arithmetic into operations on machine words.
arXiv Detail & Related papers (2025-01-13T18:15:44Z)
gECC: A GPU-based high-throughput framework for Elliptic Curve Cryptography [15.39096542261856]
Elliptic Curve Cryptography (ECC) is an encryption method that provides security comparable to traditional techniques like Rivest-Shamir-Adleman (RSA). ECC is still hindered by the significant performance overhead associated with elliptic curve (EC) operations. This paper presents gECC, a versatile framework for ECC optimized for GPU architectures.
arXiv Detail & Related papers (2024-12-22T01:50:50Z)
A High-Speed Hardware Algorithm for Modulus Operation and its Application in Prime Number Calculation [0.0]
The proposed algorithm uses only addition, subtraction, logical, and bit shift operations. It addresses scalability challenges in cryptographic applications. The application of this algorithm in prime number calculation up to 500,000 shows its practical utility and performance advantages.
arXiv Detail & Related papers (2024-07-17T13:24:52Z)
A Method for Efficient Heterogeneous Parallel Compilation: A Cryptography Case Study [8.06660833012594]
This paper introduces a novel MLIR-based dialect, named hyper, designed to optimize data management and parallel computation across diverse hardware architectures. We present HETOCompiler, a cryptography-focused compiler prototype that implements multiple hash algorithms and enables their execution on heterogeneous systems.
arXiv Detail & Related papers (2024-07-12T15:12:51Z)
A high-level comparison of state-of-the-art quantum algorithms for breaking asymmetric cryptography [0.0]
We compare Regev's quantum algorithm with Ekerå-Gärtner's extensions on the one hand, and existing state-of-the-art quantum algorithms for factoring and computing discrete logarithms on the other. Our conclusion is that Regev's algorithm without the space-saving optimizations may achieve a per-run advantage, but not an overall advantage, if non-computational quantum memory is cheap.
arXiv Detail & Related papers (2024-05-23T09:59:00Z)
Efficient Additions and Montgomery Reductions of Large Integers for SIMD [2.362288417229025]
This paper presents efficient algorithms for performing Montgomery reductions and additions on integers larger than 512 bits. The new addition algorithm simulates the addition of large integers using a smaller addition, quickly producing the same set of carries. For Montgomery reductions, serial multiplications are replaced with precomputations that can be effectively calculated using SIMD extensions.
arXiv Detail & Related papers (2023-08-31T03:44:49Z)
Sliding Window Sum Algorithms for Deep Neural Networks [0.0]
Sliding window sums are widely used for string indexing, hashing and time series analysis. We have developed a family of generic vectorized sliding sum algorithms that provide speedup of $O(P/w)$ for window size $w$ and number of processors $P$. We show that the sliding sum convolution kernels are more efficient than the commonly used GEMM kernels on the CPU, and could even outperform their GPU counterparts.
arXiv Detail & Related papers (2023-05-25T22:37:40Z)
Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
AsySQN: Faster Vertical Federated Learning Algorithms with Better Computation Resource Utilization [159.75564904944707]
We propose an asynchronous quasi-Newton (AsySQN) framework for vertical federated learning (VFL). The proposed algorithms make descent steps scaled by approximate Hessian information, without calculating the inverse Hessian matrix explicitly. We show that the adopted asynchronous computation can make better use of the computation resource.
arXiv Detail & Related papers (2021-09-26T07:56:10Z)
Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications. We propose two new algorithms for bilevel optimization. We show that both algorithms achieve the complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z)
EfficientFCN: Holistically-guided Decoding for Semantic Segmentation [49.27021844132522]
State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN). We propose the EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution. Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost.
arXiv Detail & Related papers (2020-08-24T14:48:23Z)
Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference. Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
Coded Distributed Computing with Partial Recovery [56.08535873173518]
We introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR). CCPR reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation. We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery.
arXiv Detail & Related papers (2020-07-04T21:34:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.