Related papers: Taiyi: A high-performance CKKS accelerator for Practical Fully Homomorphic Encryption

URL: http://arxiv.org/abs/2403.10188v1
Date: Fri, 15 Mar 2024 10:51:07 GMT
Title: Taiyi: A high-performance CKKS accelerator for Practical Fully Homomorphic Encryption
Authors: Shengyu Fan, Xianglong Deng, Zhuoyu Tian, Zhicheng Hu, Liang Chang, Rui Hou, Dan Meng, Mingzhe Zhang
Abstract summary: A novel cryptographic theory, Fully Homomorphic Encryption, offers significant security benefits but is hampered by substantial performance overhead. A series of accelerator designs have significantly enhanced the performance of FHE applications, bringing them closer to real-world applicability. These accelerators face challenges related to large on-chip memory and area. A comparative evaluation of Taiyi against previous state-of-the-art designs shows an average performance improvement of 1.5x and a 15.7% reduction in area overhead.
Score: 9.21642556888646
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fully Homomorphic Encryption (FHE), a novel cryptographic theory enabling computation directly on ciphertext data, offers significant security benefits but is hampered by substantial performance overhead. In recent years, a series of accelerator designs have significantly enhanced the performance of FHE applications, bringing them closer to real-world applicability. However, these accelerators face challenges related to large on-chip memory and area. Additionally, FHE algorithms undergo rapid development, rendering the previous accelerator designs less well adapted to the evolving landscape of optimized FHE applications. In this paper, we conducted a detailed analysis of existing applications with the new FHE method, making two key observations: 1) the bottleneck of FHE applications shifts from NTT to the inner-product operation, and 2) the optimal $\alpha$ of KeySwitch changes with the decrease in multiplicative level. Based on these observations, we designed an accelerator named Taiyi, which includes specific hardware for the inner-product operation and optimizes the NTT and BConv operations through algorithmic derivation. A comparative evaluation of Taiyi against previous state-of-the-art designs shows an average performance improvement of 1.5x and a 15.7% reduction in area overhead.

Related papers

Presto: Hardware Acceleration of Ciphers for Hybrid Homomorphic Encryption [0.8982938200941091]
Hybrid Homomorphic Encryption (HHE) combines symmetric key and homomorphic encryption to reduce cipher expansion crucial in client-server deployments of HE. We develop and evaluate hardware accelerators for the two known CKKS-targeting HHE ciphers, HERA and Rubato.
arXiv Detail & Related papers (2025-07-01T01:48:28Z)
Cost-Effective Optimization and Implementation of the CRT-Paillier Decryption Algorithm for Enhanced Performance [0.0]
We propose an eCRT-Paillier decryption algorithm that shortens the decryption computation chain. These two improvements reduce modular multiplications by 50% and judgment operations by 60% in the postprocessing of the CRT-Paillier decryption algorithm. A high-throughput and efficient Paillier accelerator named MESA was implemented on the Xilinx Virtex-7 FPGA for evaluation.
arXiv Detail & Related papers (2025-06-22T08:06:36Z)
Stochastic Primal-Dual Double Block-Coordinate for Two-way Partial AUC Maximization [56.805574957824135]
Two-way partial AUC (TPAUC) is a critical performance metric for binary classification with imbalanced data. Existing algorithms for TPAUC optimization remain under-explored. We introduce two innovative double block-coordinate algorithms for TPAUC optimization.
arXiv Detail & Related papers (2025-05-28T03:55:05Z)
PiT: Progressive Diffusion Transformer [50.46345527963736]
We propose a series of Pseudo Progressive Diffusion Transformer (PiT). Our proposed PiT-L achieves a 54% FID improvement over DiT-XL/2 while using less computation.
arXiv Detail & Related papers (2025-05-19T15:02:33Z)
CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing [8.114331115730021]
Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data. Many cloud computing applications (e.g., DNA read mapping, biometric matching, web search) use exact string matching as a key operation. Prior string matching algorithms that use homomorphic encryption are limited by high computational latency.
arXiv Detail & Related papers (2025-03-12T00:25:58Z)
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment. We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction [57.83978915843095]
This paper introduces DiSK, a novel framework designed to significantly enhance the performance of differentially private gradients. To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands.
arXiv Detail & Related papers (2024-10-04T19:30:39Z)
BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-wise Comparisons [14.399750086329345]
Fully Homomorphic Encryption (FHE) allows for the execution of computations on encrypted data without the need to decrypt it first. In this paper, we introduce BoostCom, a scheme designed to speed up word-wise comparison operations. We achieve an end-to-end performance improvement of more than an order of magnitude (11.1x faster) compared to the state-of-the-art CPU-based uFHE systems.
arXiv Detail & Related papers (2024-07-10T02:09:10Z)
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation. DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption. FHE is significantly slower than computation on plain data due to the increase in data size after encryption. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z)
REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption [4.713756093611972]
We present the first-of-its-kind multi-chiplet-based FHE accelerator, REED, for overcoming the limitations of prior monolithic designs. Results demonstrate that the REED 2.5D microprocessor consumes 96.7 mm$^2$ chip area and 49.4 W average power in 7nm technology.
arXiv Detail & Related papers (2023-08-05T14:04:39Z)
HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions. We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples. We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment. We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization [60.72410937614299]
We propose a new tractable bi-level optimization problem, and design and analyze a new set of algorithms termed Bi-level AT (FAST-BAT). FAST-BAT is capable of defending against sign-based projected gradient descent (PGD) attacks without calling any gradient sign method or explicit robust regularization.
arXiv Detail & Related papers (2021-12-23T06:25:36Z)
Easy and Efficient Transformer : Scalable Inference Solution For large NLP model [14.321889138798072]
This paper introduces a series of ultra-large-scale pre-training model optimization methods. An inference engine, Easy and Efficient Transformer (EET), is proposed. EET achieves a 1.5-15x speedup over the state of the art, varying with context length.
arXiv Detail & Related papers (2021-04-26T11:00:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.