PAPER: Privacy-Preserving ResNet Models using Low-Degree Polynomial Approximations and Structural Optimizations on Leveled FHE
- URL: http://arxiv.org/abs/2509.22857v1
- Date: Fri, 26 Sep 2025 19:10:23 GMT
- Title: Privacy-Preserving ResNet Models using Low-Degree Polynomial Approximations and Structural Optimizations on Leveled FHE
- Authors: Eduardo Chielle, Manaar Alam, Jinting Liu, Jovan Kascelan, Michail Maniatakos
- Abstract summary: Recent work has made non-interactive privacy-preserving inference more practical by running deep Convolutional Neural Networks (CNNs) with Fully Homomorphic Encryption (FHE). These methods depend on high-degree polynomial approximations of non-linear activations, which increase multiplicative depth and reduce accuracy by 2-5% compared to plaintext ReLU models. In this work, we focus on ResNets, a widely adopted benchmark architecture in privacy-preserving inference, and close the accuracy gap between their FHE-based non-interactive models and plaintext counterparts.
- Score: 5.819818547073678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has made non-interactive privacy-preserving inference more practical by running deep Convolutional Neural Networks (CNNs) with Fully Homomorphic Encryption (FHE). However, these methods remain limited by their reliance on bootstrapping, a costly FHE operation applied across multiple layers, severely slowing inference. They also depend on high-degree polynomial approximations of non-linear activations, which increase multiplicative depth and reduce accuracy by 2-5% compared to plaintext ReLU models. In this work, we focus on ResNets, a widely adopted benchmark architecture in privacy-preserving inference, and close the accuracy gap between their FHE-based non-interactive models and plaintext counterparts, while also achieving faster inference than existing methods. We use a quadratic polynomial approximation of ReLU, which achieves the theoretical minimum multiplicative depth for non-linear activations, along with a penalty-based training strategy. We further introduce structural optimizations such as node fusing, weight redistribution, and tower reuse. These optimizations reduce the required FHE levels in CNNs by nearly a factor of five compared to prior work, allowing us to run ResNet models under leveled FHE without bootstrapping. To further accelerate inference and recover accuracy typically lost with polynomial approximations, we introduce parameter clustering along with a joint strategy of data encoding layout and ensemble techniques. Experiments with ResNet-18, ResNet-20, and ResNet-32 on CIFAR-10 and CIFAR-100 show that our approach achieves up to 4x faster private inference than prior work with comparable accuracy to plaintext ReLU models.
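The recipe above can be made concrete. Below is a minimal PyTorch sketch, not the authors' code: it swaps ReLU for a fixed quadratic polynomial (one ciphertext-ciphertext multiplication under FHE, the theoretical minimum multiplicative depth for a non-linear activation) and accumulates a range penalty that loosely mirrors the penalty-based training strategy described in the abstract. The coefficients are the least-squares fit of ReLU on [-1, 1]; the paper's exact coefficients, interval, and penalty form may differ.

```python
import torch
import torch.nn as nn

class QuadReLU(nn.Module):
    """Degree-2 stand-in for ReLU: p(x) = c2*x^2 + c1*x + c0.
    Under FHE this costs one ciphertext-ciphertext multiply, the
    minimum multiplicative depth for a non-linear activation."""

    def __init__(self, bound: float = 1.0, penalty_weight: float = 1e-4):
        super().__init__()
        # L2-optimal quadratic fit of ReLU on [-1, 1] (illustrative choice)
        self.register_buffer("coeffs", torch.tensor([15 / 32, 1 / 2, 3 / 32]))
        self.bound = bound
        self.penalty_weight = penalty_weight
        self.range_penalty = torch.zeros(())  # refreshed on every forward pass

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c2, c1, c0 = self.coeffs
        # Penalize pre-activations that escape the approximation interval, so
        # training keeps inputs where the quadratic actually tracks ReLU.
        overshoot = torch.relu(x.abs() - self.bound)
        self.range_penalty = self.penalty_weight * overshoot.pow(2).mean()
        return c2 * x * x + c1 * x + c0
```

During training, the task loss would be augmented with the sum of `range_penalty` over all QuadReLU modules; at inference time the module is a plain degree-2 polynomial and maps directly onto leveled FHE.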
Related papers
- Peregrine: One-Shot Fine-Tuning for FHE Inference of General Deep CNNs [5.719717928243504]
We address two fundamental challenges in adapting general deep CNNs for FHE-based inference. The first is approximating non-linear activations such as ReLU with low-degree polynomials while minimizing accuracy degradation. The second is overcoming the cipher capacity barrier that constrains high-resolution image processing in FHE inference.
arXiv Detail & Related papers (2025-11-24T10:47:39Z) - Don't Be Greedy, Just Relax! Pruning LLMs via Frank-Wolfe [61.68406997155879]
State-of-the-art Large Language Model (LLM) pruning methods operate layer-wise, minimizing the per-layer pruning error on a small dataset to avoid full retraining. Existing methods hence rely on greedy heuristics that ignore the weight interactions in the pruning objective. Our method drastically reduces the per-layer pruning error, outperforms strong baselines on state-of-the-art GPT architectures, and remains memory-efficient.
arXiv Detail & Related papers (2025-10-15T16:13:44Z) - Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks. NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets. We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z) - Activate Me!: Designing Efficient Activation Functions for Privacy-Preserving Machine Learning with Fully Homomorphic Encryption [0.8901073744693314]
Fully Homomorphic Encryption (FHE) enables computations directly on encrypted data. However, FHE inherently supports only linear operations, making it difficult to implement non-linear activation functions. This work focuses on designing, implementing, and evaluating activation functions tailored for FHE-based machine learning (a minimal encrypted polynomial-evaluation sketch follows this list).
arXiv Detail & Related papers (2025-08-15T16:31:12Z) - FALCON: FLOP-Aware Combinatorial Optimization for Neural Network Pruning [17.60353530072587]
Network pruning offers a solution to reduce model size and computational cost while maintaining performance.
Most current pruning methods focus primarily on improving sparsity by reducing the number of nonzero parameters.
We propose FALCON, a novel optimization-based framework for network pruning that jointly takes into account model accuracy (fidelity), FLOPs, and sparsity constraints.
arXiv Detail & Related papers (2024-03-11T18:40:47Z) - Regularized PolyKervNets: Optimizing Expressiveness and Efficiency for Private Inference in Deep Neural Networks [0.0]
We focus on PolyKervNets, a technique known for offering improved dynamic approximations in smaller networks.
Our primary objective is to empirically explore optimization-based training recipes to enhance the performance of PolyKervNets in larger networks.
arXiv Detail & Related papers (2023-12-23T11:37:18Z) - Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption [17.010625600442584]
This study proposes an optimized layerwise approximation (OLA) framework for privacy-preserving deep neural networks.
For efficient approximation, we reflect the layerwise accuracy by considering the actual input distribution of each activation function.
As a result, the OLA method reduces inference times for the ResNet-20 model and the ResNet-32 model by 3.02 times and 2.82 times, respectively.
arXiv Detail & Related papers (2023-10-16T12:34:47Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., conditions under which near-zero training loss is achieved as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural network-like predictors. LFP decomposes a reward signal to individual neurons based on their respective contributions. The method then follows a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - Efficient Pruning for Machine Learning Under Homomorphic Encryption [2.2817485071636376]
Privacy-preserving machine learning (PPML) solutions are gaining widespread popularity.
Many rely on homomorphic encryption (HE) that offers confidentiality of the model and the data, but at the cost of large latency and memory requirements.
We introduce a framework called HE-PEx that comprises new pruning methods, on top of a packing technique called tile tensors, for reducing the latency and memory of PPML inference.
arXiv Detail & Related papers (2022-07-07T15:49:24Z) - Kernel-Based Smoothness Analysis of Residual Networks [85.20737467304994]
Residual networks (ResNets) stand out among powerful modern architectures.
In this paper, we show another distinction between ResNets and standard fully connected networks: a tendency of ResNets to promote smoother interpolations.
arXiv Detail & Related papers (2020-09-21T16:32:04Z) - Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization at large scale with a deep neural network as the predictive model.
Our method requires substantially fewer communication rounds than naive parallelization while retaining convergence guarantees in theory.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
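As referenced in the "Activate Me!" entry above, the following is a minimal sketch of evaluating a low-degree polynomial activation directly on CKKS-encrypted data. It assumes the open-source TenSEAL library; the encryption parameters and the quadratic coefficients (the same ReLU fit on [-1, 1] used earlier) are illustrative and are not taken from any of the papers listed.

```python
import tenseal as ts

# CKKS context; modulus sizes are illustrative, not tuned for a full network
ctx = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
ctx.global_scale = 2 ** 40

# Encrypt a batch of pre-activations kept inside [-1, 1], the fitting interval
enc = ts.ckks_vector(ctx, [-0.8, -0.1, 0.3, 0.9])

# polyval takes coefficients from the constant term upward: c0 + c1*x + c2*x^2.
# A degree-2 polynomial consumes a single ciphertext-ciphertext multiplication.
act = enc.polyval([3 / 32, 1 / 2, 15 / 32])

print(act.decrypt())  # roughly [-0.01, 0.05, 0.29, 0.92], tracking ReLU
```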
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.