HE-PEx: Efficient Machine Learning under Homomorphic Encryption using
Pruning, Permutation and Expansion
- URL: http://arxiv.org/abs/2207.03384v1
- Date: Thu, 7 Jul 2022 15:49:24 GMT
- Title: HE-PEx: Efficient Machine Learning under Homomorphic Encryption using
Pruning, Permutation and Expansion
- Authors: Ehud Aharoni, Moran Baruch, Pradip Bose, Alper Buyuktosunoglu, Nir
Drucker, Subhankar Pal, Tomer Pelleg, Kanthi Sarpatwar, Hayim Shaul, Omri
Soceanu, Roman Vaculin
- Abstract summary: Homomorphic encryption (HE) is a method of performing computations over encrypted data.
We propose a novel set of pruning methods that reduce the latency and memory requirement, thus bringing the effectiveness of pruning methods to HE.
We demonstrate the advantage of our method on fully connected layers where the weights are packed using a recently proposed packing technique called tile tensors.
- Score: 4.209035833239216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy-preserving neural network (NN) inference solutions have recently
gained significant traction with several solutions that provide different
latency-bandwidth trade-offs. Of these, many rely on homomorphic encryption
(HE), a method of performing computations over encrypted data. However, even
with state-of-the-art schemes, HE operations are still considerably slower
than their plaintext counterparts. Pruning the parameters of an NN model is a
well-known approach to improving inference latency. However, pruning methods
that are useful in the plaintext context may yield nearly negligible
improvements in the HE case, as recent work has also demonstrated.
In this work, we propose a novel set of pruning methods that reduce the
latency and memory requirement, thus bringing the effectiveness of plaintext
pruning methods to HE. Crucially, our proposal employs two key techniques, viz.
permutation and expansion of the packed model weights, that enable pruning
significantly more ciphertexts and recovering most of the accuracy loss,
respectively. We demonstrate the advantage of our method on fully connected
layers where the weights are packed using a recently proposed packing technique
called tile tensors, which allows executing deep NN inference in a
non-interactive mode. We evaluate our methods on various autoencoder
architectures and demonstrate that for a small mean-square reconstruction loss
of 1.5×10^{-5} on MNIST, we reduce the memory requirement and latency of
HE-enabled inference by 60%.
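To make the tile-tensor intuition concrete, the following NumPy sketch (an
illustration under simplified assumptions, not the authors' implementation)
shows why permutation matters. With weights packed into t x t tiles, one
ciphertext per tile, a ciphertext can be skipped only if its entire tile is
zero; a function-preserving row permutation can gather scattered pruned
neurons into fully-zero, skippable tiles. The 64x64 layer, t = 8, whole-neuron
pruning, and the sort-based permutation are hypothetical choices, and the
paper's expansion step is not modeled.

```python
import numpy as np

def zero_tiles(W, t):
    """Count t x t tiles of W that are entirely zero (skippable ciphertexts)."""
    return sum(
        not W[i:i + t, j:j + t].any()
        for i in range(0, W.shape[0], t)
        for j in range(0, W.shape[1], t)
    )

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # weights of a hypothetical FC layer

# Structured pruning: zero out the 24 neurons (rows) with the smallest L2 norm.
norms = np.linalg.norm(W, axis=1)
W_pruned = W.copy()
W_pruned[np.argsort(norms)[:24]] = 0.0

t = 8  # tile dimension; one ciphertext holds one t x t tile
print("skippable tiles before permutation:", zero_tiles(W_pruned, t))  # likely 0

# Permutation: reorder rows so the zeroed neurons form contiguous bands.
# Relabeling neurons preserves the function, provided the next layer's
# input ordering is permuted to match.
W_perm = W_pruned[np.argsort(norms)]  # pruned (smallest-norm) rows first
print("skippable tiles after permutation: ", zero_tiles(W_perm, t))    # 24
```

With scattered zeros almost no tile empties out, whereas after the permutation
the 24 pruned rows form three full bands of eight rows, i.e. 24 skippable
ciphertexts out of 64.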
Related papers
- Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic
Programming [15.458305667190256]
We propose a novel depth compression algorithm which targets general convolution operations.
We achieve a 1.41× speed-up with a 0.11%p accuracy gain for MobileNetV2-1.0 on ImageNet.
arXiv Detail & Related papers (2023-01-28T13:08:54Z)
- CryptoGCN: Fast and Scalable Homomorphically Encrypted Graph
Convolutional Network Inference [12.03953896181613]
Cloud-based graph convolutional networks (GCNs) have demonstrated great success and potential in many privacy-sensitive applications.
Despite their high inference accuracy and performance in the cloud, maintaining data privacy in GCN inference remains largely unexplored.
In this paper, we take a first step towards this and develop CryptoGCN, a homomorphic encryption (HE) based GCN inference framework.
arXiv Detail & Related papers (2022-09-24T02:20:54Z)
- Neural Networks Reduction via Lumping [0.0]
A large number of solutions have been published to reduce both the number of operations and the number of parameters involved in these models.
Most of these reduction techniques are actually approximate methods and usually require at least one re-training step to recover the accuracy.
We propose a pruning approach that reduces the number of neurons in a network without using any data or fine-tuning, while exactly preserving its behaviour (a toy sketch of this data-free merging appears after the list).
arXiv Detail & Related papers (2022-09-15T17:13:07Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically must be compressed to run on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Circa: Stochastic ReLUs for Private Deep Learning [6.538025863698682]
We rethink the ReLU computation and propose optimizations for private inference (PI) tailored to neural networks.
Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test (a toy sketch of this sign-test view appears after the list).
We demonstrate improvements of up to 4.7× in storage and 3× in runtime over baseline implementations.
arXiv Detail & Related papers (2021-06-15T22:52:45Z)
- Improved Branch and Bound for Neural Network Verification via Lagrangian
Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework named Branch and Dual Network Bound (BaDNB).
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest techniques in integer programming, we equivalently reformulate this binary integer programming (BIP) problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
- Successive Pruning for Model Compression via Rate Distortion Theory [15.598364403631528]
We study NN compression from an information-theoretic perspective and show that rate-distortion theory suggests pruning to achieve the theoretical limits of NN compression.
Our derivation also provides an end-to-end compression pipeline involving a novel pruning strategy.
Our method consistently outperforms existing pruning strategies and reduces the pruned model's size by a factor of 2.5.
arXiv Detail & Related papers (2021-02-16T18:17:57Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
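For the "Neural Networks Reduction via Lumping" entry above, here is a toy
sketch of the simplest case of exact, data-free neuron merging: two hidden
neurons with identical incoming weights and bias produce identical activations
on every input, so one can be removed and its outgoing weights folded into the
other without changing the network function. This illustrates the general idea
only; it is not the paper's algorithm, and all names and shapes are
hypothetical.

```python
import numpy as np

def merge_duplicate_neurons(W1, b1, W2):
    """Merge hidden neurons whose (incoming weights, bias) coincide exactly;
    the merged network computes the same function as the original."""
    seen, keep, fold_into = {}, [], {}
    for i in range(W1.shape[0]):
        key = (W1[i].tobytes(), float(b1[i]))
        if key in seen:
            fold_into[i] = seen[key]          # duplicate of an earlier neuron
        else:
            seen[key] = i
            keep.append(i)
    W2m = W2[:, keep].copy()
    pos = {orig: k for k, orig in enumerate(keep)}
    for dup, rep in fold_into.items():
        W2m[:, pos[rep]] += W2[:, dup]        # fold outgoing weights together
    return W1[keep], b1[keep], W2m

rng = np.random.default_rng(1)
W1 = rng.standard_normal((6, 4)); W1[3] = W1[0]   # neuron 3 duplicates neuron 0
b1 = rng.standard_normal(6);      b1[3] = b1[0]
W2 = rng.standard_normal((3, 6))

x = rng.standard_normal(4)
y_full = W2 @ np.maximum(W1 @ x + b1, 0.0)        # original 6-neuron layer
W1m, b1m, W2m = merge_duplicate_neurons(W1, b1, W2)
y_merged = W2m @ np.maximum(W1m @ x + b1m, 0.0)   # merged 5-neuron layer
assert np.allclose(y_full, y_merged)              # exact behaviour preserved
```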
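Likewise, for the Circa entry above, this sketch illustrates the sign-test
view of ReLU: since ReLU(x) = x * [x > 0], ReLU reduces to a single
multiplication once a (possibly approximate) sign bit is available, and
truncating low-order bits before the sign test only risks flipping values near
zero, where ReLU's output is small anyway. The fixed-point scale and
truncation amount are hypothetical, and the private sign-test protocol itself
is not reproduced.

```python
import numpy as np

def relu_via_sign_test(x, truncate_bits=0, scale=2**10):
    """ReLU via an explicit sign test on a truncated fixed-point encoding
    of x (illustrative only; not the Circa protocol)."""
    fixed = np.round(x * scale).astype(np.int64)  # fixed-point encode
    truncated = fixed >> truncate_bits            # drop low-order bits
    sign_bit = (truncated > 0).astype(x.dtype)    # the "sign test"
    return x * sign_bit                           # ReLU(x) = x * [x > 0]

x = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])
print(relu_via_sign_test(x))                   # [0. 0. 0. 0.01 0.5]
print(relu_via_sign_test(x, truncate_bits=6))  # 0.01 flips to 0: its
                                               # truncated encoding is 0
```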