Related papers: Cut Tracing with E-Graphs for Boolean FHE Circuit Synthesis

URL: http://arxiv.org/abs/2506.12883v1
Date: Sun, 15 Jun 2025 15:27:51 GMT
Title: Cut Tracing with E-Graphs for Boolean FHE Circuit Synthesis
Authors: Julien de Castelnau, Mingfei Yu, Giovanni De Micheli
Abstract summary: Fully Homomorphic Encryption (FHE) is a promising privacy-preserving technology enabling secure computation over encrypted data. Existing works primarily target the multiplicative depth (MD) and multiplicative complexity (MC) of FHE circuits. We show how cut tracing can effectively combine two state-of-the-art MC and MD reduction flows and balance their weaknesses to minimize runtime.
Score: 1.9204566034368082
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fully Homomorphic Encryption (FHE) is a promising privacy-preserving technology enabling secure computation over encrypted data. A major limitation of current FHE schemes is their high runtime overhead. As a result, automatic optimization of circuits describing FHE computation has garnered significant attention in the logic synthesis community. Existing works primarily target the multiplicative depth (MD) and multiplicative complexity (MC) of FHE circuits, corresponding to the maximum number of multiplications on a path from primary input to output and the total number of multiplications, respectively. In many FHE schemes, these metrics are the primary contributors to the homomorphic evaluation runtime of a circuit. However, they are often opposed: reducing either depth or complexity may result in an increase in the other. To our knowledge, existing works have yet to optimize FHE circuits for overall runtime, only considering one metric at a time and thus making significant tradeoffs. In this paper, we use e-graphs to augment existing flows that individually optimize MC and MD, in a technique called cut tracing. We show how cut tracing can effectively combine two state-of-the-art MC and MD reduction flows and balance their weaknesses to minimize runtime. Our preliminary results demonstrate that cut tracing yields up to a 40% improvement in homomorphic evaluation runtime when applied to these two flows.
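
To make the two metrics in the abstract concrete, the sketch below counts MC and MD on a toy XOR-AND circuit. It is only an illustration of the definitions, not the paper's cut tracing flow; the netlist format, node names, and function names are all hypothetical, and every node in the dictionary is assumed to be reachable from an output.

```python
from functools import lru_cache

# Hypothetical netlist format: node name -> (operator, fan-in names).
# Names that do not appear as keys are treated as primary inputs.
BALANCED = {
    "t1": ("AND", ("a", "b")),
    "t2": ("AND", ("c", "d")),
    "x": ("AND", ("t1", "t2")),
    "y": ("XOR", ("x", "a")),  # XOR is "free": it adds no multiplication
}
# Same AND count, but arranged as a chain: ((a AND b) AND c) AND d.
CHAIN = {
    "t1": ("AND", ("a", "b")),
    "t2": ("AND", ("t1", "c")),
    "y": ("AND", ("t2", "d")),
}


def multiplicative_complexity(circuit):
    """MC: total number of AND (multiplication) nodes in the circuit."""
    return sum(1 for op, _ in circuit.values() if op == "AND")


def multiplicative_depth(circuit, outputs):
    """MD: maximum number of AND nodes on any input-to-output path."""

    @lru_cache(maxsize=None)
    def depth(node):
        if node not in circuit:  # primary input
            return 0
        op, fanins = circuit[node]
        return max(depth(f) for f in fanins) + (1 if op == "AND" else 0)

    return max(depth(out) for out in outputs)


balanced_mc = multiplicative_complexity(BALANCED)
balanced_md = multiplicative_depth(BALANCED, ("y",))
chain_mc = multiplicative_complexity(CHAIN)
chain_md = multiplicative_depth(CHAIN, ("y",))
print(balanced_mc, balanced_md, chain_mc, chain_md)
```

Both toy circuits contain three AND nodes, yet the chained form has a longer multiplicative path than the balanced one, which shows that the two metrics can vary independently of each other.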

Related papers

Towards a Functionally Complete and Parameterizable TFHE Processor [3.907410857035328]
TFHE is a fast torus-based fully homomorphic encryption scheme. It provides the fastest bootstrapping operation performance of any FHE scheme. However, it suffers from a considerably higher computational overhead for the evaluation of homomorphic circuits. We propose an FPGA-based hardware accelerator for the evaluation of homomorphic circuits.
arXiv Detail & Related papers (2025-10-27T16:16:40Z)
DeepPrune: Parallel Scaling without Inter-trace Redundancy [53.62015294143274]
Over 80% of parallel reasoning traces yield identical final answers, representing substantial wasted computation. We propose DeepPrune, a novel framework that enables efficient parallel scaling through dynamic pruning. Our work establishes a new standard for efficient parallel reasoning, making high-performance reasoning more efficient.
arXiv Detail & Related papers (2025-10-09T17:24:54Z)
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding [40.96405124314983]
Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs). Currently available open-source dLLMs often generate at much lower rates, typically decoding only a single token at every denoising timestep. We present Spiffy, a speculative decoding algorithm that accelerates dLLM inference by $\mathbf{2.8{-}3.1\times}$ while provably preserving the model's output distribution.
arXiv Detail & Related papers (2025-09-22T17:58:21Z)
ELF: Efficient Logic Synthesis by Pruning Redundancy in Refactoring [15.62205696947912]
We propose a technique to prune unsuccessful cuts preemptively, thus eliminating unnecessary resynthesis operations. Experiments on the operator using the EPFL benchmark suite and 10 large industrial designs demonstrate that this technique can speed up logic optimization by 3.9x on average compared with the state-of-the-art ABC implementation.
arXiv Detail & Related papers (2025-08-11T15:18:07Z)
Scaling Probabilistic Circuits via Monarch Matrices [109.65822339230853]
Probabilistic Circuits (PCs) are tractable representations of probability distributions. We propose a novel sparse and structured parameterization for the sum blocks in PCs.
arXiv Detail & Related papers (2025-06-14T07:39:15Z)
Fast correlated decoding of transversal logical algorithms [67.01652927671279]
Quantum error correction (QEC) is required for large-scale computation, but incurs a significant resource overhead. Recent advances have shown that by jointly decoding logical qubits in algorithms composed of logical gates, the number of syndrome extraction rounds can be reduced. Here, we reformulate the problem of decoding circuits by directly decoding relevant logical operator products as they propagate through the circuit.
arXiv Detail & Related papers (2025-05-19T18:00:00Z)
Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery. Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods. Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass. In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
Accelerating Matrix Factorization by Dynamic Pruning for Fast Recommendation [0.49399484784577985]
Matrix factorization (MF) is a widely used collaborative filtering algorithm for recommendation systems (RSs). With the dramatically increased number of users/items in current RSs, the computational complexity of training an MF model increases substantially. We propose algorithmic methods to accelerate MF without requiring any additional computational resources.
arXiv Detail & Related papers (2024-03-18T16:27:33Z)
ReLU and Addition-based Gated RNN [1.484528358552186]
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation. This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost.
arXiv Detail & Related papers (2023-08-10T15:18:16Z)
Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning [5.41530201129053]
This paper proposes a novel memorization-based inference (MBI) that is compute-free and only requires lookups. Specifically, our work capitalizes on the inference mechanism of the recurrent attention model (RAM). By leveraging the low dimensionality of glimpses, our inference procedure stores key-value pairs comprising glimpse location, patch vector, etc. in a table. The computations are obviated during inference by utilizing the table to read out key-value pairs and performing compute-free inference by memorization.
arXiv Detail & Related papers (2023-07-14T21:01:59Z)
General Cutting Planes for Bound-Propagation-Based Neural Network Verification [144.7290035694459]
We generalize the bound propagation procedure to allow the addition of arbitrary cutting plane constraints. We find that MIP solvers can generate high-quality cutting planes for strengthening bound-propagation-based verifiers. Our method is the first verifier that can completely solve the oval20 benchmark and verify twice as many instances on the oval21 benchmark.
arXiv Detail & Related papers (2022-08-11T10:31:28Z)
Optimal qubit assignment and routing via integer programming [0.22940141855172028]
We consider the problem of mapping a logical quantum circuit onto a given hardware with limited two-qubit connectivity. We model this problem as an integer linear program, using a network flow formulation with binary variables. We consider several cost functions: an approximation of the fidelity of the circuit, its total depth, and a measure of cross-talk.
arXiv Detail & Related papers (2021-06-11T15:02:26Z)
FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks. Current networks often have a large number of parameters and incur heavy computation costs. Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.