Related papers: Cheddar: A Swift Fully Homomorphic Encryption Library Designed for GPU Architectures

GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions [54.570944939061555]
We present a comprehensive study of GPU-accelerated graph-based vector search algorithms. We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units. Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems.
arXiv Detail & Related papers (2026-02-10T16:18:04Z)
Theodosian: A Deep Dive into Memory-Hierarchy-Centric FHE Acceleration [3.8153115302044296]
Fully homomorphic encryption (FHE) enables secure computation on encrypted data, mitigating privacy concerns in cloud and edge environments. In this paper, we perform a microarchitectural analysis of CKKS, a popular FHE scheme, on modern GPUs. We show that the dominant kernels remain bound by memory bandwidth despite a high-bandwidth L2 cache, exposing a persistent memory wall. Motivated by these findings, we introduce Theodosian, a set of complementary, memory-aware optimizations that improve cache efficiency and reduce overheads.
arXiv Detail & Related papers (2025-12-20T12:18:29Z)
A Scalable Multi-GPU Framework for Encrypted Large-Model Inference [5.966282323502589]
Encrypted AI using fully homomorphic encryption (FHE) provides strong privacy guarantees. Recent works proposed ASICs to accelerate FHE, but require expensive advanced manufacturing processes that constrain their computation. This paper presents Cerium, a multi-GPU framework for FHE inference on large models.
arXiv Detail & Related papers (2025-12-12T04:15:38Z)
Minute-Long Videos with Dual Parallelisms [57.22737565366549]
Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos. We propose a novel distributed inference strategy, termed DualParal. Instead of generating an entire video on a single GPU, we parallelize both temporal frames and model layers across GPUs.
arXiv Detail & Related papers (2025-05-27T11:55:22Z)
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float [71.43026659686679]
Large Language Models (LLMs) have grown rapidly in size, creating challenges for efficient deployment on resource-constrained hardware. We introduce Dynamic-Length Float (DFloat11), a compression framework that reduces LLM size by 30% while preserving outputs that are bit-for-bit identical to the original model.
arXiv Detail & Related papers (2025-04-15T22:38:38Z)
CAT: A GPU-Accelerated FHE Framework with Its Application to High-Precision Private Dataset Query [0.51795041186793]
We introduce an open-source GPU-accelerated fully homomorphic encryption (FHE) framework, CAT. CAT features a three-layer architecture: a foundation of core math, a bridge of pre-computed elements and combined operations, and an API-accessible layer of FHE operators. Based on our framework, we implement three widely used FHE schemes: CKKS, BFV, and BGV.
arXiv Detail & Related papers (2025-03-28T08:20:18Z)
Ramp Up NTT in Record Time using GPU-Accelerated Algorithms and LLM-based Code Generation [11.120838175165986]
Homomorphic encryption (HE) is a core building block in privacy-preserving machine learning (PPML). Many GPU-accelerated cryptographic schemes have been proposed to improve the performance of HE. Given the powerful code generation capabilities of large language models (LLMs), we aim to explore their potential to automatically generate practical GPU-friendly algorithm code.
arXiv Detail & Related papers (2025-02-16T12:53:23Z)
Chameleon: An Efficient FHE Scheme Switching Acceleration on GPUs [17.536473118470774]
Fully homomorphic encryption (FHE) enables direct computation on encrypted data. Existing efforts primarily focus on single-class FHE schemes, which fail to meet the diverse requirements of data types and functions. We present an efficient GPU-based FHE switching acceleration scheme named Chameleon.
arXiv Detail & Related papers (2024-10-08T11:37:49Z)
NTTSuite: Number Theoretic Transform Benchmarks for Accelerating Encrypted Computation [2.704681057324485]
Homomorphic encryption (HE) is a cryptographic system that enables computation directly on encrypted data. HE has seen little adoption due to extremely high computational overheads, rendering it impractical. We develop a benchmark suite, named NTTSuite, to enable researchers to better address these overheads. We find our implementation outperforms the state-of-the-art by 30%.
arXiv Detail & Related papers (2024-05-18T17:44:17Z)
FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption. FHE is significantly slower than computation on plain data due to the increase in data size after encryption. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z)
GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption [33.87964584665433]
Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture.
arXiv Detail & Related papers (2023-09-20T01:50:43Z)
INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient. We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
ArctyrEX : Accelerated Encrypted Execution of General-Purpose Applications [6.19586646316608]
Fully Homomorphic Encryption (FHE) is a cryptographic method that guarantees the privacy and security of user data during computation. We develop new techniques for accelerated encrypted execution and demonstrate the significant performance advantages of our approach.
arXiv Detail & Related papers (2023-06-19T15:15:41Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development in two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing [58.720142291102135]
This work introduces the HDCC compiler, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code. HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend. To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
arXiv Detail & Related papers (2023-04-24T19:16:03Z)
PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning. However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware. PLSSVM can be used as a drop-in replacement for LIBSVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z)
ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception [91.24236600199542]
ASH is a modern and high-performance framework for parallel spatial hashing on GPU. ASH achieves higher performance, supports richer functionality, and requires fewer lines of code. ASH and its example applications are open sourced in Open3D.
arXiv Detail & Related papers (2021-10-01T16:25:40Z)
CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU [8.633428365391666]
CryptGPU is a system for privacy-preserving machine learning that implements all operations on the GPU. We introduce a new interface to embed cryptographic operations over secret-shared values into floating-point operations. We show that our protocols achieve a 2x to 8x improvement in private inference and a 6x to 36x improvement for private training.
arXiv Detail & Related papers (2021-04-22T09:21:40Z)
Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters. It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions. We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
arXiv Detail & Related papers (2021-04-16T09:54:30Z)
Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) is receiving more and more attention recently for its capability to do computations over the encrypted field. We propose a novel general distributed HE-based data mining framework towards one step of solving the scaling problem. We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.