Related papers: FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption

URL: http://arxiv.org/abs/2311.16293v1
Date: Mon, 27 Nov 2023 20:11:38 GMT
Title: FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption
Authors: Minxuan Zhou, Yujin Nam, Pranav Gangwar, Weihong Xu, Arpan Dutta, Kartikeyan Subramanyam, Chris Wilkerson, Rosario Cammarota, Saransh Gupta, Tajana Rosing
Abstract summary: Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption. FHE is significantly slower than computation on plain data due to the increase in data size after encryption. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
Score: 9.884698447131374
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fully Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption, making it ideal for securing many emerging applications. However, FHE computation is significantly slower than computation on plain data due to the increase in data size after encryption. Processing In-Memory (PIM) is a promising technology that can accelerate data-intensive workloads with extensive parallelism. However, FHE is challenging for PIM acceleration due to the long-bitwidth multiplications and complex data movements involved. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture to achieve high-throughput and efficient acceleration for FHE. We propose an optimized end-to-end processing flow, from low-level hardware processing to high-level application mapping, that fully exploits the high throughput of FHEmem hardware. Our evaluation shows FHEmem achieves significant speedup and efficiency improvement over state-of-the-art FHE accelerators.

Related papers

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation [129.45368843861917]
We introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers. We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs to share memory readout states from a Samba-based self-decoder.
arXiv Detail & Related papers (2025-07-09T07:27:00Z)
ABC-FHE : A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption [0.8795040582681392]
Fully homomorphic encryption (FHE) enables continuous computation on encrypted data. Recent advancements in FHE accelerators have successfully improved server-side performance, but client-side computations remain a bottleneck. We propose ABC-FHE, an area- and power-efficient FHE accelerator that supports bootstrappable parameters on the client side.
arXiv Detail & Related papers (2025-06-10T05:37:31Z)
EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform [15.3973190088728]
EFFACT is a highly efficient full-stack FHE acceleration platform with a compiler that provides comprehensive optimizations and vector-friendly hardware. For generality, EFFACT is also equipped with an ISA and a compiler backend that can support several FHE schemes like CKKS, BGV, and BFV.
arXiv Detail & Related papers (2025-04-22T12:01:20Z)
Evaluating the Potential of In-Memory Processing to Accelerate Homomorphic Encryption [1.5707609236065612]
Homomorphic encryption (HE) allows computation without the need for decryption. The high computational and memory overhead associated with the underlying cryptographic operations has hindered the practicality of HE-based solutions. Processing in-memory (PIM) presents a promising solution to this problem by bringing computation closer to data, thereby reducing the overhead resulting from processor-memory data movements.
arXiv Detail & Related papers (2024-12-12T10:28:58Z)
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores [3.6385567224218556]
Large language models (LLMs) have been widely applied but face challenges in efficient inference. We introduce a novel bipolar-INT data format that facilitates parallel computing and supports symmetric quantization. We implement an arbitrary precision matrix multiplication scheme that decomposes and recovers at the bit level, enabling flexible precision.
arXiv Detail & Related papers (2024-09-26T14:17:58Z)
PhD Forum: Efficient Privacy-Preserving Processing via Memory-Centric Computing [0.0]
Homomorphic encryption (HE) and secure multi-party computation (SMPC) enhance data security by enabling processing on encrypted data. Existing approaches focus on improving computational overhead using specialized hardware. We propose a framework that uses recently available PIM hardware to achieve efficient privacy-preserving computation.
arXiv Detail & Related papers (2024-09-25T09:37:50Z)
Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System [21.09681871279162]
Modern Machine Learning (ML) training on large-scale datasets is a time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance. Processor-centric architectures suffer from low performance and high energy consumption while executing ML training workloads. Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck.
arXiv Detail & Related papers (2024-04-10T17:00:04Z)
CiFlow: Dataflow Analysis and Optimization of Key Switching for Homomorphic Encryption [2.704681057324485]
Homomorphic encryption (HE) is a privacy-preserving computation technique that enables computation on encrypted data. HE is impractically slow, preventing it from being used in real applications. We present a novel approach to improve HE performance by rigorously analyzing its dataflow.
arXiv Detail & Related papers (2023-11-02T21:08:56Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms. We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how the summaries found help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
Making Online Sketching Hashing Even Faster [63.16042585506435]
We present a FasteR Online Sketching Hashing (FROSH) algorithm to sketch the data in a more compact form via an independent transformation. We provide theoretical justification to guarantee that our proposed FROSH consumes less time and achieves a comparable sketching precision. We also extend FROSH to its distributed implementation, namely DFROSH, to further reduce the training time cost of FROSH.
arXiv Detail & Related papers (2020-10-10T08:50:53Z)
FPGA-Based Hardware Accelerator of Homomorphic Encryption for Efficient Federated Learning [9.733675923979108]
Federated learning tends to utilize various privacy preserving mechanisms to protect the transferred intermediate data. Maintaining accuracy and security more efficiently has been a key problem of federated learning. Our framework implements the representative Paillier homomorphic cryptosystem with high level synthesis for flexibility and portability.
arXiv Detail & Related papers (2020-07-21T01:59:58Z)
Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) is receiving more and more attention recently for its capability to do computations over the encrypted field. We propose a novel general distributed HE-based data mining framework towards one step of solving the scaling problem. We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)
One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge. One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition. Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.