GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption
- URL: http://arxiv.org/abs/2309.11001v1
- Date: Wed, 20 Sep 2023 01:50:43 GMT
- Title: GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption
- Authors: Kaustubh Shivdikar, Yuhui Bao, Rashmi Agrawal, Michael Shen, Gilbert Jonatan, Evelio Mora, Alexander Ingare, Neal Livesay, José L. Abellán, John Kim, Ajay Joshi, David Kaeli
- Abstract summary: Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it.
FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data.
We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture.
- Score: 33.87964584665433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. This overhead is presently a major barrier to the commercial adoption of FHE. In this work, we leverage GPUs to accelerate FHE, capitalizing on a well-established GPU ecosystem available in the cloud. We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture. First, GME integrates a lightweight on-chip compute unit (CU)-side hierarchical interconnect to retain ciphertext in cache across FHE kernels, thus eliminating redundant memory transactions. Second, to tackle compute bottlenecks, GME introduces special MOD-units that provide native custom hardware support for modular reduction operations, one of the most commonly executed sets of operations in FHE. Third, by integrating the MOD-unit with our novel pipelined $64$-bit integer arithmetic cores (WMAC-units), GME further accelerates FHE workloads by $19\%$. Finally, we propose a Locality-Aware Block Scheduler (LABS) that exploits the temporal locality available in FHE primitive blocks. Incorporating these microarchitectural features and compiler optimizations, we create a synergistic approach achieving average speedups of $796\times$, $14.2\times$, and $2.3\times$ over Intel Xeon CPU, NVIDIA V100 GPU, and Xilinx FPGA implementations, respectively.
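The MOD-units described in the abstract natively support modular reduction, the word-level operation at the core of FHE's NTT and ciphertext arithmetic. For orientation, the sketch below shows one standard software formulation of that operation, Barrett reduction; the paper does not state that its hardware uses this exact algorithm, and the type and function names here are illustrative only.

```cpp
#include <cstdint>
#include <cstdio>

// Barrett modular multiplication for a word-sized prime modulus -- a software
// sketch of the modular reduction operation that GME's MOD-units are described
// as supporting natively. The paper does not specify this algorithm; q is
// restricted to 31 bits here so every intermediate fits in standard types, and
// the names below are illustrative. Requires a compiler with unsigned __int128
// (GCC/Clang).
struct BarrettModulus {
    uint64_t q;   // prime modulus, q < 2^31
    uint64_t mu;  // floor(2^62 / q), precomputed once per modulus

    explicit BarrettModulus(uint64_t modulus)
        : q(modulus), mu((1ULL << 62) / modulus) {}

    // Returns (a * b) mod q using one wide multiply and a shift instead of a divide.
    uint64_t mulmod(uint64_t a, uint64_t b) const {
        uint64_t x = a * b;                                           // x < q^2 < 2^62
        uint64_t t = (uint64_t)(((unsigned __int128)x * mu) >> 62);   // estimate of floor(x / q), off by at most 1
        uint64_t r = x - t * q;                                       // 0 <= r < 2q
        return (r >= q) ? r - q : r;                                  // one conditional subtraction
    }
};

int main() {
    BarrettModulus m(2013265921);  // 15 * 2^27 + 1, a common NTT-friendly prime
    uint64_t r = m.mulmod(123456789ULL, 987654321ULL);
    std::printf("%llu\n", (unsigned long long)r);  // equals (123456789 * 987654321) % 2013265921
    return 0;
}
```

The multiply, shift, and conditional subtraction above form the kind of short fixed sequence that a dedicated reduction unit can collapse into a single pipelined operation, which is what motivates hardware support for modular arithmetic in FHE accelerators.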
Related papers
- Chameleon: An Efficient FHE Scheme Switching Acceleration on GPUs [17.536473118470774]
Fully homomorphic encryption (FHE) enables direct computation on encrypted data.
Existing efforts primarily focus on single-class FHE schemes, which fail to meet the diverse requirements of data types and functions.
We present an efficient GPU-based FHE switching acceleration scheme named Chameleon.
arXiv Detail & Related papers (2024-10-08T11:37:49Z) - vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z) - Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs [2.613335121517245]
Fully homomorphic encryption (FHE) is a cryptographic technology capable of resolving security and privacy problems in cloud computing by encrypting data in use.
FHE introduces tremendous computational overhead for processing encrypted data, causing FHE workloads to become 2-6 orders of magnitude slower than their unencrypted counterparts.
We propose Cheddar, an FHE library for GPU, which demonstrates significantly faster performance compared to prior GPU implementations.
arXiv Detail & Related papers (2024-07-17T23:49:18Z) - NTTSuite: Number Theoretic Transform Benchmarks for Accelerating Encrypted Computation [2.704681057324485]
Homomorphic encryption (HE) is a cryptographic system that enables computation directly on encrypted data.
HE has seen little adoption due to extremely high computational overheads, rendering it impractical.
We develop a benchmark suite, named NTTSuite, to enable researchers to better address these overheads (a minimal sketch of the transform itself appears after this list).
We find our implementation outperforms the state-of-the-art by 30%.
arXiv Detail & Related papers (2024-05-18T17:44:17Z) - FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption.
FHE is significantly slower than computation on plain data due to the increase in data size after encryption.
We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z) - Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption [11.706881389387242]
Fully homomorphic encryption (FHE) is a viable approach for achieving private inference (PI).
However, an FHE implementation of a CNN faces significant hurdles, primarily due to FHE's substantial computational and memory overhead.
We propose a set of optimizations, which includes GPU/ASIC acceleration, an efficient activation function, and an optimized packing scheme.
arXiv Detail & Related papers (2023-10-25T10:24:35Z) - FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential of vast, untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and the variability introduced by peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z) - INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z) - SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers [76.13755422671822]
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework.
We introduce a novel Attention-to-Mask (ATM) module to design a lightweight decoder effective for plain ViT.
Our decoder outperforms the popular decoder UPerNet using various ViT backbones while consuming only about $5\%$ of the computational cost.
arXiv Detail & Related papers (2023-06-09T22:29:56Z) - ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception [91.24236600199542]
ASH is a modern and high-performance framework for parallel spatial hashing on GPU.
ASH achieves higher performance, supports richer functionality, and requires fewer lines of code.
ASH and its example applications are open sourced in Open3D.
arXiv Detail & Related papers (2021-10-01T16:25:40Z)
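NTTSuite, listed above, benchmarks the number theoretic transform (NTT), the polynomial-multiplication kernel on which FHE implementations spend most of their modular arithmetic. The sketch below is a minimal textbook radix-2 NTT over a toy 16-point ring, assuming an illustrative modulus and root of unity; it is not code from NTTSuite, GME, or any of the papers above.

```cpp
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// Minimal in-place radix-2 number theoretic transform over Z_q.
// The modulus Q, transform size N, and all names are chosen for the example only.
static const uint64_t Q = 257;   // prime with 16 | (Q - 1), so 16-point NTTs exist
static const uint64_t N = 16;    // transform length (power of two dividing Q - 1)

static uint64_t mulmod(uint64_t a, uint64_t b) { return (a * b) % Q; }

static uint64_t powmod(uint64_t base, uint64_t exp) {
    uint64_t result = 1;
    while (exp) {
        if (exp & 1) result = mulmod(result, base);
        base = mulmod(base, base);
        exp >>= 1;
    }
    return result;
}

// Iterative Cooley-Tukey NTT; `root` must be a primitive N-th root of unity mod Q.
static void ntt(std::vector<uint64_t>& a, uint64_t root) {
    // Bit-reversal permutation so the butterflies can run in place.
    for (uint64_t i = 1, j = 0; i < N; ++i) {
        uint64_t bit = N >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    // Butterfly passes over doubling block lengths.
    for (uint64_t len = 2; len <= N; len <<= 1) {
        uint64_t wlen = powmod(root, N / len);   // primitive len-th root of unity
        for (uint64_t start = 0; start < N; start += len) {
            uint64_t w = 1;
            for (uint64_t k = 0; k < len / 2; ++k) {
                uint64_t u = a[start + k];
                uint64_t v = mulmod(a[start + k + len / 2], w);
                a[start + k] = (u + v) % Q;
                a[start + k + len / 2] = (u + Q - v) % Q;
                w = mulmod(w, wlen);
            }
        }
    }
}

int main() {
    std::vector<uint64_t> a(N, 0);
    a[1] = 1;                                 // the polynomial X
    uint64_t root = powmod(3, (Q - 1) / N);   // 3 is a primitive root mod 257
    ntt(a, root);
    for (uint64_t x : a) std::printf("%llu ", (unsigned long long)x);  // successive powers of root
    std::printf("\n");
    return 0;
}
```

Production FHE parameters typically use ring dimensions in the tens of thousands and word-sized prime moduli, which is where wide integer multipliers and dedicated modular reduction hardware, such as the WMAC- and MOD-units described in GME's abstract, pay off.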