Presto: Hardware Acceleration of Ciphers for Hybrid Homomorphic Encryption
- URL: http://arxiv.org/abs/2507.00367v1
- Date: Tue, 01 Jul 2025 01:48:28 GMT
- Title: Presto: Hardware Acceleration of Ciphers for Hybrid Homomorphic Encryption
- Authors: Yeonsoo Jeon, Mattan Erez, Michael Orshansky,
- Abstract summary: Hybrid Homomorphic Encryption (HHE) combines symmetric key and homomorphic encryption to reduce cipher expansion crucial in client-server deployments of HE.<n>We develop and evaluate hardware accelerators for the two known CKKS-targeting HHE ciphers, HERA and Rubato.
- Score: 0.8982938200941091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hybrid Homomorphic Encryption (HHE) combines symmetric key and homomorphic encryption to reduce ciphertext expansion crucial in client-server deployments of HE. Special symmetric ciphers, amenable to efficient HE evaluation, have been developed. Their client-side deployment calls for performant and energy-efficient implementation, and in this paper we develop and evaluate hardware accelerators for the two known CKKS-targeting HHE ciphers, HERA and Rubato. We design vectorized and overlapped functional modules. The design exploits transposition-invariance property of the MixColumns and MixRows function and alternates the order of intermediate state to eliminate bubbles in stream key generation, improving latency and throughput. We decouple the RNG and key computation phases to hide the latency of RNG and to reduce the critical path in FIFOs, achieving higher operating frequency. We implement the accelerator on an AMD Virtex UltraScale+ FPGA. Both Rubato and HERA achieve a 6x improvement in throughput compared to the software implementation. In terms of latency, Rubato achieves a 5x reduction, while HERA achieves a 3x reduction. Additionally, our hardware implementations reduce energy consumption by 75x for Rubato and 47x for HERA compared to their software implementation.
Related papers
- BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity [66.94629945519125]
We introduce a novel MoE architecture, BlockFFN, as well as its efficient training and deployment techniques.<n>Specifically, we use a router integrating ReLU activation and RMSNorm for differentiable and flexible routing.<n>Next, to promote both token-level sparsity (TLS) and chunk-level sparsity ( CLS), CLS-aware training objectives are designed, making BlockFFN more acceleration-friendly.
arXiv Detail & Related papers (2025-07-11T17:28:56Z) - Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation [129.45368843861917]
We introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers.<n>We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs to share memory readout states from a Samba-based self-decoder.
arXiv Detail & Related papers (2025-07-09T07:27:00Z) - Cost-Effective Optimization and Implementation of the CRT-Paillier Decryption Algorithm for Enhanced Performance [0.0]
We propose an eCRT-Paillier decryption algorithm that shortens the decryption computation chain.<n>These two improvements reduce 50% modular multiplications and 60% judgment operations in the postprocessing of the CRT-Paillier decryption algorithm.<n>A high- throughput and efficient Paillier accelerator named MESA was implemented on the Xilinx Virtex-7 FPGA for evaluation.
arXiv Detail & Related papers (2025-06-22T08:06:36Z) - ABC-FHE : A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption [0.8795040582681392]
Homomorphic encryption (FHE) enables continuous computation on encrypted data.<n>Recent advancements in FHE accelerators have successfully improved server-side performance, but client-side computations remain a bottleneck.<n>We propose ABC-FHE, an area- and power-efficient FHE accelerator that supports bootstrappable parameters on the client side.
arXiv Detail & Related papers (2025-06-10T05:37:31Z) - Compile-Time Fully Homomorphic Encryption of Vectors: Eliminating Online Encryption via Algebraic Basis Synthesis [1.3824176915623292]
ciphertexts are constructed from precomputed encrypted basis vectors combined with a runtime-scaled encryption of zero.<n>We formalize the method as a randomized $mathbbZ_t$- module morphism and prove that it satisfies IND-CPA security under standard assumptions.<n>Unlike prior designs that require a pool of random encryptions of zero, our construction achieves equivalent security using a single zero ciphertext multiplied by a fresh scalar at runtime.
arXiv Detail & Related papers (2025-05-19T00:05:18Z) - DDT: Decoupled Diffusion Transformer [51.84206763079382]
Diffusion transformers encode noisy inputs to extract semantic component and decode higher frequency with identical modules.<n>textbfcolorddtDecoupled textbfcolorddtTransformer(textbfcolorddtDDT)<n>textbfcolorddtTransformer(textbfcolorddtDDT)<n>textbfcolorddtTransformer(textbfcolorddtDDT)
arXiv Detail & Related papers (2025-04-08T07:17:45Z) - Hacking Cryptographic Protocols with Tensor Network Attacks [0.0]
We introduce the application of Networks (TN) to launch attacks on symmetric-key cryptography.<n>Our approaches make use of Matrix Product States (MPS) as well as our recently-introduced Flexible-PEPS Quantum Circuit Simulator (FQCS)<n>For small key size, MPS outperforms VQAA and FQCS in both time and average iterations required to recover the key.
arXiv Detail & Related papers (2024-09-06T08:51:31Z) - Taiyi: A high-performance CKKS accelerator for Practical Fully Homomorphic Encryption [9.21642556888646]
A novel cryptographic theory, Fully Homomorphic Encryption, offers significant security benefits but is hampered by substantial performance overhead.
A series of accelerator designs have significantly enhanced the performance of FHE applications, bringing them closer to real-world applicability.
These accelerators face challenges related to large on-chip memory and area.
A comparative evaluation of Taiyi against previous state-of-the-art designs reveals an average performance improvement of 1.5x and reduces the area overhead by 15.7%.
arXiv Detail & Related papers (2024-03-15T10:51:07Z) - SOCI^+: An Enhanced Toolkit for Secure OutsourcedComputation on Integers [50.608828039206365]
We propose SOCI+ which significantly improves the performance of SOCI.
SOCI+ employs a novel (2, 2)-threshold Paillier cryptosystem with fast encryption and decryption as its cryptographic primitive.
Compared with SOCI, our experimental evaluation shows that SOCI+ is up to 5.4 times more efficient in computation and 40% less in communication overhead.
arXiv Detail & Related papers (2023-09-27T05:19:32Z) - GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption [33.87964584665433]
Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it.
FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data.
We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture.
arXiv Detail & Related papers (2023-09-20T01:50:43Z) - Practical Conformer: Optimizing size, speed and flops of Conformer for
on-Device and cloud ASR [67.63332492134332]
We design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs.
Our proposed encoder can double as a strong standalone encoder in on device, and as the first part of a high-performance ASR pipeline.
arXiv Detail & Related papers (2023-03-31T23:30:48Z) - LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy
Physics [45.666822327616046]
This work presents a novel reconfigurable architecture for Low Graph Neural Network (LL-GNN) designs for particle detectors.
The LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
arXiv Detail & Related papers (2022-09-28T12:55:35Z) - Lossless Acceleration for Seq2seq Generation with Aggressive Decoding [74.12096349944497]
Aggressive Decoding is a novel decoding algorithm for seq2seq generation.
Our approach aims to yield identical (or better) generation compared with autoregressive decoding.
We test Aggressive Decoding on the most popular 6-layer Transformer model on GPU in multiple seq2seq tasks.
arXiv Detail & Related papers (2022-05-20T17:59:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.