Related papers: Osiris: A Systolic Approach to Accelerating Fully Homomorphic Encryption

URL: http://arxiv.org/abs/2408.09593v1
Date: Sun, 18 Aug 2024 20:58:54 GMT
Title: Osiris: A Systolic Approach to Accelerating Fully Homomorphic Encryption
Authors: Austin Ebel, Brandon Reagen
Abstract summary: We show how fully homomorphic encryption (FHE) can be accelerated using a systolic architecture. We propose a new data tiling technique that we name limb interleaving. Our evaluation of Osiris shows it outperforms the prior state-of-the-art accelerator on all standard benchmarks.
Score: 3.16990548935142
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we show how fully homomorphic encryption (FHE) can be accelerated using a systolic architecture. We begin by analyzing FHE algorithms and then develop systolic or systolic-esque units for each major kernel. Connecting units is challenging due to the different data access and computational patterns of the kernels. We overcome this by proposing a new data tiling technique that we name limb interleaving. Limb interleaving creates a common data input/output pattern across all kernels that allows the entire architecture, named Osiris, to operate in lockstep. Osiris is capable of processing key-switches, bootstrapping, and full neural network inferences with high utilization across a range of FHE parameters. To achieve high performance, we propose a new giant-step centric (GSC) dataflow that efficiently maps state-of-the-art FHE matrix-vector product algorithms onto Osiris by optimizing for reuse and parallelism. Our evaluation of Osiris shows it outperforms the prior state-of-the-art accelerator on all standard benchmarks.

Related papers

Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs [61.953548065938385]
We introduce the "Three Taxes" (Bulk Synchronous, Inter-Kernel Data Locality, and Kernel Launch Overhead) as an analytical framework. We propose moving beyond the rigid BSP model to address key inefficiencies in distributed GPU execution. We observe a 10-20% speedup in end-to-end latency over BSP-based approaches.
arXiv Detail & Related papers (2025-11-04T01:15:44Z)
Towards a Functionally Complete and Parameterizable TFHE Processor [3.907410857035328]
TFHE is a fast torus-based fully homomorphic encryption scheme. It provides the fastest bootstrapping performance of any FHE scheme. It suffers from a considerably higher computational overhead for the evaluation of homomorphic circuits. We propose an FPGA-based hardware accelerator for the evaluation of homomorphic circuits.
arXiv Detail & Related papers (2025-10-27T16:16:40Z)
Quantum Spectral Clustering: Comparing Parameterized and Neuromorphic Quantum Kernels [0.0]
We compare a parameterized quantum kernel (pQK) with a quantum leaky integrate-and-fire (QLIF) neuromorphic computing approach. For the synthetic datasets and Iris, the QLIF kernel typically achieves better classification and clustering performance than pQK.
arXiv Detail & Related papers (2025-07-09T16:46:49Z)
Accelerating Machine Learning Primitives on Commodity Hardware [0.0]
We present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs). Our results suggest that the Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated hardware accelerators. This could promote a wider adoption of AI on low-power and low-memory devices without the need for specialized hardware.
arXiv Detail & Related papers (2023-10-08T16:26:18Z)
INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient. We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel. Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs [30.294095901315746]
We propose a Parallelizable Attention-based structure (PACE) that processes nodes simultaneously and encodes DAGs in parallel. PACE not only improves the effectiveness over previous sequential DAG encoders with a significantly boosted training and inference speed, but also generates smooth latent (DAG encoding) spaces.
arXiv Detail & Related papers (2022-03-19T11:56:51Z)
Are we ready for beyond-application high-volume data? The Reeds robot perception benchmark dataset [3.781421673607643]
This paper presents a dataset, called Reeds, for research on robot perception algorithms. The dataset aims to provide demanding benchmark opportunities for algorithms, rather than providing an environment for testing application-specific solutions.
arXiv Detail & Related papers (2021-09-16T23:21:42Z)
Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
FuSeConv: Fully Separable Convolutions for Fast Inference on Systolic Arrays [2.8583189395674653]
We propose FuSeConv as a drop-in replacement for depth-wise separable convolution. FuSeConv generalizes the decomposition of convolutions fully to separable 1D convolutions along spatial and depth dimensions. We achieve a significant speed-up of 3x-7x with the MobileNet family of networks on a systolic array of size 64x64, with comparable accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-05-27T20:19:39Z)
Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network. We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
Fully Convolutional Networks for Panoptic Segmentation [91.84686839549488]
We present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN. Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline. Panoptic FCN encodes each object instance or stuff category into a specific kernel weight with the proposed kernel generator.
arXiv Detail & Related papers (2020-12-01T18:31:41Z)
Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data. We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
Are Gabor Kernels Optimal for Iris Recognition? [4.658023970671232]
Gabor kernels are widely accepted as dominant filters for iris recognition. We learn data-driven kernels that can be easily transplanted into open-source iris recognition software.
arXiv Detail & Related papers (2020-02-20T17:51:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.