Related papers: EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform

EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform

URL: http://arxiv.org/abs/2504.15817v1
Date: Tue, 22 Apr 2025 12:01:20 GMT
Title: EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform
Authors: Yi Huang, Xinsheng Gong, Xiangyu Kong, Dibei Chen, Jianfeng Zhu, Wenping Zhu, Liangwei Li, Mingyu Gao, Shaojun Wei, Aoyang Zhang, Leibo Liu,
Abstract summary: EFFACT is a highly efficient full-stack FHE acceleration platform with a compiler that provides comprehensive optimizations and vector-friendly hardware.<n>For generality, EFFACT is also equipped with an ISA and a compiler backend that can support several FHE schemes like CKKS, BGV, and BFV.
Score: 15.3973190088728
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fully Homomorphic Encryption (FHE) is a set of powerful cryptographic schemes that allows computation to be performed directly on encrypted data with an unlimited depth. Despite FHE's promising in privacy-preserving computing, yet in most FHE schemes, ciphertext generally blows up thousands of times compared to the original message, and the massive amount of data load from off-chip memory for bootstrapping and privacy-preserving machine learning applications (such as HELR, ResNet-20), both degrade the performance of FHE-based computation. Several hardware designs have been proposed to address this issue, however, most of them require enormous resources and power. An acceleration platform with easy programmability, high efficiency, and low overhead is a prerequisite for practical application. This paper proposes EFFACT, a highly efficient full-stack FHE acceleration platform with a compiler that provides comprehensive optimizations and vector-friendly hardware. We start by examining the computational overhead across different real-world benchmarks to highlight the potential benefits of reallocating computing resources for efficiency enhancement. Then we make a design space exploration to find an optimal SRAM size with high utilization and low cost. On the other hand, EFFACT features a novel optimization named streaming memory access which is proposed to enable high throughput with limited SRAMs. Regarding the software-side optimization, we also propose a circuit-level function unit reuse scheme, to substantially reduce the computing resources without performance degradation. Moreover, we design novel NTT and automorphism units that are suitable for a cost-sensitive and highly efficient architecture, leading to low area. For generality, EFFACT is also equipped with an ISA and a compiler backend that can support several FHE schemes like CKKS, BGV, and BFV.

Related papers

Towards a Functionally Complete and Parameterizable TFHE Processor [3.907410857035328]
TFHE is a fast torus-based fully homomorphic encryption scheme.<n>It provides the fastest bootstrapping operation performance of any other FHE scheme.<n>It suffers from a considerably higher computational overhead for the evaluation of homomorphic circuits.<n>We propose an FPGA-based hardware accelerator for the evaluation of homomorphic circuits.
arXiv Detail & Related papers (2025-10-27T16:16:40Z)
HHEML: Hybrid Homomorphic Encryption for Privacy-Preserving Machine Learning on Edge [4.623074730653607]
Privacy-preserving machine learning (PPML) is an emerging topic to handle secure machine learning inference over sensitive data in untrusted environments.<n>Fully homomorphic encryption (FHE) enables computation directly on encrypted data on the server side, but it introduces significant communication and computation overhead on the client side.<n>HHE addresses this limitation by combining symmetric encryption (SE) with FHE to reduce the computational cost on the client side.<n>Our work proposes a hardware-accelerated HHE architecture built around a lightweight symmetric cipher optimized for FHE compatibility.
arXiv Detail & Related papers (2025-10-23T05:51:55Z)
A Scalable Architecture for Efficient Multi-bit Fully Homomorphic Encryption [1.4174227043241145]
We introduce Taurus, a hardware accelerator designed to enhance the efficiency of multi-bit TFHE computations.<n>Taurus supports ciphertexts up to 10 bits by leveraging novel FFT units and optimizing memory bandwidth through key reuse strategies.<n>Our experiment results demonstrate that Taurus achieves up to 2600x speedup over a CPU, 1200x speedup over a GPU, and up to 7x faster compared to the previous state-of-the-art accelerator.
arXiv Detail & Related papers (2025-09-16T05:00:57Z)
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs [81.01082659623552]
Large Language Models (LLMs) have demonstrated remarkable success across various domains.<n>Their optimization remains a significant challenge due to the complex and high-dimensional loss landscapes they inhabit.
arXiv Detail & Related papers (2025-02-24T18:42:19Z)
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores [3.6385567224218556]
Large language models (LLMs) have been widely applied but face challenges in efficient inference. We introduce a novel bipolar-INT data format that facilitates parallel computing and supports symmetric quantization. We implement an arbitrary precision matrix multiplication scheme that decomposes and recovers at the bit level, enabling flexible precision.
arXiv Detail & Related papers (2024-09-26T14:17:58Z)
Search for Efficient Large Language Models [52.98684997131108]
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research. Weight pruning, quantization, and distillation have been embraced to compress LLMs, targeting memory reduction and inference acceleration. Most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures.
arXiv Detail & Related papers (2024-09-25T21:32:12Z)
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference [2.9302211589186244]
Large language models (LLMs) have transformed natural language processing, enabling machines to generate human-like text and engage in meaningful conversations. Developments in computing and memory capabilities are lagging behind, exacerbated by the discontinuation of Moore's law. compute-in-memory (CIM) technologies offer a promising solution for accelerating AI inference by directly performing analog computations in memory.
arXiv Detail & Related papers (2024-06-12T16:57:58Z)
SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs [3.302913401404089]
Sliding window-based static sparse attention mitigates the problem by limiting the attention scope of the input tokens. We propose a dataflow-aware FPGA-based accelerator design, SWAT, that efficiently leverages the sparsity to achieve scalable performance for long input.
arXiv Detail & Related papers (2024-05-27T10:25:08Z)
Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption. FHE is significantly slower than computation on plain data due to the increase in data size after encryption. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z)
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential vast untapped consumer-level GPU. This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.