Related papers: Towards a Functionally Complete and Parameterizable TFHE Processor

Towards a Functionally Complete and Parameterizable TFHE Processor

URL: http://arxiv.org/abs/2510.23483v1
Date: Mon, 27 Oct 2025 16:16:40 GMT
Title: Towards a Functionally Complete and Parameterizable TFHE Processor
Authors: Valentin Reyes Häusler, Gabriel Ott, Aruna Jayasena, Andreas Peter,
Abstract summary: TFHE is a fast torus-based fully homomorphic encryption scheme.<n>It provides the fastest bootstrapping operation performance of any other FHE scheme.<n>It suffers from a considerably higher computational overhead for the evaluation of homomorphic circuits.<n>We propose an FPGA-based hardware accelerator for the evaluation of homomorphic circuits.
Score: 3.907410857035328
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fully homomorphic encryption allows the evaluation of arbitrary functions on encrypted data. It can be leveraged to secure outsourced and multiparty computation. TFHE is a fast torus-based fully homomorphic encryption scheme that allows both linear operations, as well as the evaluation of arbitrary non-linear functions. It currently provides the fastest bootstrapping operation performance of any other FHE scheme. Despite its fast performance, TFHE suffers from a considerably higher computational overhead for the evaluation of homomorphic circuits. Computations in the encrypted domain are orders of magnitude slower than their unencrypted equivalents. This bottleneck hinders the widespread adoption of (T)FHE for the protection of sensitive data. While state-of-the-art implementations focused on accelerating and outsourcing single operations, their scalability and practicality are constrained by high memory bandwidth costs. In order to overcome this, we propose an FPGA-based hardware accelerator for the evaluation of homomorphic circuits. Specifically, we design a functionally complete TFHE processor for FPGA hardware capable of processing instructions on the data completely on the FPGA. In order to achieve a higher throughput from our TFHE processor, we implement an improved programmable bootstrapping module which outperforms the current state-of-the-art by 240\% to 480\% more bootstrappings per second. Our efficient, compact, and scalable design lays the foundation for implementing complete FPGA-based TFHE processor architectures.

Related papers

FAME: FPGA Acceleration of Secure Matrix Multiplication with Homomorphic Encryption [11.342625695057281]
Homomorphic Encryption (HE) enables secure computation on encrypted data, addressing privacy concerns in cloud computing.<n>Accelerating homomorphic encrypted MM (HE MM) is therefore crucial for applications such as privacy-preserving machine learning.<n>We present a bandwidth-efficient FPGA implementation of HE MM.<n>FAME is the first FPGA-based accelerator specifically tailored for HE MM.
arXiv Detail & Related papers (2025-12-17T15:09:36Z)
A Scalable Architecture for Efficient Multi-bit Fully Homomorphic Encryption [1.4174227043241145]
We introduce Taurus, a hardware accelerator designed to enhance the efficiency of multi-bit TFHE computations.<n>Taurus supports ciphertexts up to 10 bits by leveraging novel FFT units and optimizing memory bandwidth through key reuse strategies.<n>Our experiment results demonstrate that Taurus achieves up to 2600x speedup over a CPU, 1200x speedup over a GPU, and up to 7x faster compared to the previous state-of-the-art accelerator.
arXiv Detail & Related papers (2025-09-16T05:00:57Z)
Cut Tracing with E-Graphs for Boolean FHE Circuit Synthesis [1.9204566034368082]
Homomorphic Encryption (FHE) is a promising privacy-preserving technology enabling secure computation over encrypted data.<n>Existing works primarily target the multiplicative depth (MD) and multiplicative complexity (MC) of FHE circuits.<n>We show how cut tracing can effectively combine two state-of-the-art MC and MD reduction flows and balance their weaknesses to minimize runtime.
arXiv Detail & Related papers (2025-06-15T15:27:51Z)
Fast correlated decoding of transversal logical algorithms [67.01652927671279]
Quantum error correction (QEC) is required for large-scale computation, but incurs a significant resource overhead.<n>Recent advances have shown that by jointly decoding logical qubits in algorithms composed of logical gates, the number of syndrome extraction rounds can be reduced.<n>Here, we reform the problem of decoding circuits by directly decoding relevant logical operator products as they propagate through the circuit.
arXiv Detail & Related papers (2025-05-19T18:00:00Z)
EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform [15.3973190088728]
EFFACT is a highly efficient full-stack FHE acceleration platform with a compiler that provides comprehensive optimizations and vector-friendly hardware.<n>For generality, EFFACT is also equipped with an ISA and a compiler backend that can support several FHE schemes like CKKS, BGV, and BFV.
arXiv Detail & Related papers (2025-04-22T12:01:20Z)
Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs. At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads. At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z)
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation [67.85309547416155]
A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions.<n>With efficiency being a high priority for scaling such models, we observed that the state-of-the-art method Mask2Former uses 50% of its compute only on the transformer encoder.<n>This is due to the retention of a full-length token-level representation of all backbone feature scales at each encoder layer.
arXiv Detail & Related papers (2024-04-23T01:34:20Z)
FHEmem: A Processing In-Memory Accelerator for Fully Homomorphic Encryption [9.884698447131374]
Homomorphic Encryption (FHE) is a technique that allows arbitrary computations to be performed on encrypted data without the need for decryption. FHE is significantly slower than computation on plain data due to the increase in data size after encryption. We propose a PIM-based FHE accelerator, FHEmem, which exploits a novel processing in-memory architecture.
arXiv Detail & Related papers (2023-11-27T20:11:38Z)
INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient. We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR [67.63332492134332]
We design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs. Our proposed encoder can double as a strong standalone encoder in on device, and as the first part of a high-performance ASR pipeline.
arXiv Detail & Related papers (2023-03-31T23:30:48Z)
Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms. We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
A fully pipelined FPGA accelerator for scale invariant feature transform keypoint descriptor matching, [0.0]
We design a novel fully pipelined hardware accelerator architecture for SIFT keypoint descriptor matching. The proposed hardware architecture is able to properly handle the memory bandwidth necessary for a fully-pipelined implementation. Our hardware implementation is 15.7 times faster than the comparable software approach.
arXiv Detail & Related papers (2020-12-17T15:29:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.