Related papers: ABC-FHE : A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption

ABC-FHE : A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption

URL: http://arxiv.org/abs/2506.08461v1
Date: Tue, 10 Jun 2025 05:37:31 GMT
Title: ABC-FHE : A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption
Authors: Sungwoong Yune, Hyojeong Lee, Adiwena Putra, Hyunjun Cho, Cuong Duong Manh, Jaeho Jeon, Joo-Young Kim,
Abstract summary: Homomorphic encryption (FHE) enables continuous computation on encrypted data.<n>Recent advancements in FHE accelerators have successfully improved server-side performance, but client-side computations remain a bottleneck.<n>We propose ABC-FHE, an area- and power-efficient FHE accelerator that supports bootstrappable parameters on the client side.
Score: 0.8795040582681392
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As the demand for privacy-preserving computation continues to grow, fully homomorphic encryption (FHE)-which enables continuous computation on encrypted data-has become a critical solution. However, its adoption is hindered by significant computational overhead, requiring 10000-fold more computation compared to plaintext processing. Recent advancements in FHE accelerators have successfully improved server-side performance, but client-side computations remain a bottleneck, particularly under bootstrappable parameter configurations, which involve combinations of encoding, encrypt, decoding, and decrypt for large-sized parameters. To address this challenge, we propose ABC-FHE, an area- and power-efficient FHE accelerator that supports bootstrappable parameters on the client side. ABC-FHE employs a streaming architecture to maximize performance density, minimize area usage, and reduce off-chip memory access. Key innovations include a reconfigurable Fourier engine capable of switching between NTT and FFT modes. Additionally, an on-chip pseudo-random number generator and a unified on-the-fly twiddle factor generator significantly reduce memory demands, while optimized task scheduling enhances the CKKS client-side processing, achieving reduced latency. Overall, ABC-FHE occupies a die area of 28.638 mm2 and consumes 5.654 W of power in 28 nm technology. It delivers significant performance improvements, achieving a 1112x speed-up in encoding and encryption execution time compared to a CPU, and 214x over the state-of-the-art client-side accelerator. For decoding and decryption, it achieves a 963x speed-up over the CPU and 82x over the state-of-the-art accelerator.

Related papers

Cost-Effective Optimization and Implementation of the CRT-Paillier Decryption Algorithm for Enhanced Performance [0.0]
We propose an eCRT-Paillier decryption algorithm that shortens the decryption computation chain.<n>These two improvements reduce 50% modular multiplications and 60% judgment operations in the postprocessing of the CRT-Paillier decryption algorithm.<n>A high- throughput and efficient Paillier accelerator named MESA was implemented on the Xilinx Virtex-7 FPGA for evaluation.
arXiv Detail & Related papers (2025-06-22T08:06:36Z)
Toward a Lightweight, Scalable, and Parallel Secure Encryption Engine [0.0]
SPiME is a lightweight, scalable, and FPGA-compatible Secure Processor-in-Memory Encryption architecture.<n>It integrates the Advanced Encryption Standard (AES-128) directly into a Processing-in-Memory framework.<n>It delivers over 25Gbps in sustained encryption throughput with predictable, low-latency performance.
arXiv Detail & Related papers (2025-06-18T02:25:04Z)
GDNTT: an Area-Efficient Parallel NTT Accelerator Using Glitch-Driven Near-Memory Computing and Reconfigurable 10T SRAM [14.319119105134309]
This paper proposes an area-efficient highly parallel NTT accelerator with glitch-driven near-memory computing (GDNTT)<n>The design integrates a 10T for data storage, enabling flexible row/column data access and streamlining circuit mapping strategies.<n> Evaluation results show that the proposed NTT accelerator achieves a 1.528* improvement in throughput-per-area compared to the state-of-the-art.
arXiv Detail & Related papers (2025-05-13T01:53:07Z)
Real-Time Bit-Level Encryption of Full High-Definition Video Without Diffusion [1.0202696337641386]
Real-time encryption algorithms are inadequate to meet the demands of real-time encryption for high-resolution video.<n>This paper proposes a real-time video encryption protocol based on heterogeneous parallel computing.<n>Experiments show that our approach exhibits superior statistical properties and robust security.
arXiv Detail & Related papers (2025-05-12T00:36:45Z)
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference [49.77734021302196]
We propose a task-oriented feature compression (TOFC) method for multimodal understanding in a device-edge co-inference framework.<n>To enhance compression efficiency, multiple entropy models are adaptively selected based on the characteristics of the visual features.<n>Results show that TOFC achieves up to 60% reduction in data transmission overhead and 50% reduction in system latency.
arXiv Detail & Related papers (2025-03-17T08:37:22Z)
Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders. We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse [56.384390765357004]
We propose an integrated federated split learning and hyperdimensional computing framework for emerging foundation models. This novel approach reduces communication costs, computation load, and privacy risks, making it suitable for resource-constrained edge devices in the Metaverse.
arXiv Detail & Related papers (2024-08-26T17:03:14Z)
OFHE: An Electro-Optical Accelerator for Discretized TFHE [14.192223933448174]
textitOFHE is an electro-optical accelerator designed to process Discretized TFHE (DTFHE) operations. DTFHE is more efficient and versatile than other fully homomorphic encryption schemes. Existing TFHE accelerators are not easily upgradable to support DTFHE operations.
arXiv Detail & Related papers (2024-05-19T16:27:21Z)
Data-Oblivious ML Accelerators using Hardware Security Extensions [9.716425897388875]
Outsourced computation can put client data confidentiality at risk. We develop Dolma, which applies DIFT to the Gemmini matrix multiplication accelerator. We show that Dolma incurs low overheads for large configurations.
arXiv Detail & Related papers (2024-01-29T21:34:29Z)
FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption. FHE is significantly slower than computation on plain data due to the increase in data size after encryption. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z)
Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks. The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources. This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. We present EdgeBERT, an in-depth algorithm- hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.