HE-PEx: Efficient Machine Learning under Homomorphic Encryption using
Pruning, Permutation and Expansion
- URL: http://arxiv.org/abs/2207.03384v1
- Date: Thu, 7 Jul 2022 15:49:24 GMT
- Title: HE-PEx: Efficient Machine Learning under Homomorphic Encryption using
Pruning, Permutation and Expansion
- Authors: Ehud Aharoni, Moran Baruch, Pradip Bose, Alper Buyuktosunoglu, Nir
Drucker, Subhankar Pal, Tomer Pelleg, Kanthi Sarpatwar, Hayim Shaul, Omri
Soceanu, Roman Vaculin
- Abstract summary: Homomorphic encryption (HE) is a method of performing computations over encrypted data.
We propose a novel set of pruning methods that reduce the latency and memory requirement, thus bringing the effectiveness of plaintext pruning to HE.
We demonstrate the advantage of our method on fully connected layers where the weights are packed using a recently proposed packing technique called tile tensors.
- Score: 4.209035833239216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy-preserving neural network (NN) inference solutions have recently
gained significant traction with several solutions that provide different
latency-bandwidth trade-offs. Of these, many rely on homomorphic encryption
(HE), a method of performing computations over encrypted data. However, HE
operations even with state-of-the-art schemes are still considerably slow
compared to their plaintext counterparts. Pruning the parameters of a NN model
is a well-known approach to improving inference latency. However, pruning
methods that are useful in the plaintext context may yield nearly negligible
improvement in the HE case, as has also been demonstrated in recent work.
In this work, we propose a novel set of pruning methods that reduce the
latency and memory requirement, thus bringing the effectiveness of plaintext
pruning methods to HE. Crucially, our proposal employs two key techniques, viz.
permutation and expansion of the packed model weights, that enable pruning
significantly more ciphertexts and recovering most of the lost accuracy,
respectively. We demonstrate the advantage of our method on fully connected
layers where the weights are packed using a recently proposed packing technique
called tile tensors, which allows executing deep NN inference in a
non-interactive mode. We evaluate our methods on various autoencoder
architectures and demonstrate that for a small mean-square reconstruction loss
of 1.5*10^{-5} on MNIST, we reduce the memory requirement and latency of
HE-enabled inference by 60%.
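As an illustration of the core idea, the sketch below is a hypothetical NumPy example, not the authors' implementation: after pruning, rows and columns of a weight matrix are permuted so that zeros cluster into whole tiles, which then no longer need to be encrypted or multiplied. The tile shape, the sorting heuristic, and the neuron-level pruning setup are assumptions, and the expansion step used to recover accuracy is not shown.

```python
import numpy as np

def count_nonzero_tiles(w, tile=(8, 8)):
    """Count tiles that still contain at least one nonzero weight.
    Under tile packing, only nonzero tiles must be kept and multiplied."""
    th, tw = tile
    rows, cols = w.shape
    return sum(
        np.any(w[r:r + th, c:c + tw])
        for r in range(0, rows, th)
        for c in range(0, cols, tw)
    )

def permute_to_cluster_zeros(w):
    """Greedy heuristic: order rows/columns by their nonzero counts so pruned
    rows and columns become adjacent and form whole all-zero tiles."""
    row_order = np.argsort(np.count_nonzero(w, axis=1))
    col_order = np.argsort(np.count_nonzero(w, axis=0))
    return w[row_order][:, col_order]

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
# For illustration, prune half of the input and output neurons at random
# positions (i.e., zero out scattered rows and columns of the weight matrix).
w[rng.choice(64, 32, replace=False), :] = 0.0
w[:, rng.choice(64, 32, replace=False)] = 0.0

print("nonzero tiles before permutation:", count_nonzero_tiles(w))
print("nonzero tiles after permutation: ", count_nonzero_tiles(permute_to_cluster_zeros(w)))
```

Fewer nonzero tiles directly translates to fewer ciphertexts to store and fewer homomorphic multiplications per layer.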
Related papers
- Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding [56.066799081747845]
The ever-growing size of neural networks poses serious challenges on resource-constrained devices. We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding. Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
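A simplified sketch of the quantize-then-entropy-code idea (illustrative only; the paper's rate-constrained optimization is not reproduced): weights are snapped to a uniform grid, and the achievable storage is estimated by the empirical entropy of the quantized symbols, which an entropy coder approaches.

```python
import numpy as np

def quantize(w, step):
    """Map weights to integer indices on a uniform grid of spacing `step`."""
    return np.round(w / step).astype(np.int64)

def empirical_entropy_bits(symbols):
    """Bits per symbol that an ideal entropy coder would approach."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000).astype(np.float32)

for step in (0.05, 0.1, 0.2):
    q = quantize(w, step)
    mse = float(np.mean((q * step - w) ** 2))
    bits = empirical_entropy_bits(q)
    print(f"step={step}: ~{bits:.2f} bits/weight (vs. 32), MSE={mse:.5f}")
```

Coarser grids trade reconstruction error for a lower coding rate, which is exactly the rate-distortion trade-off such frameworks optimize.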
arXiv Detail & Related papers (2025-05-24T15:52:49Z) - MOFHEI: Model Optimizing Framework for Fast and Efficient Homomorphically Encrypted Neural Network Inference [0.8388591755871735]
Homomorphic Encryption (HE) enables us to perform machine learning tasks over encrypted data.
We propose MOFHEI, a framework that optimizes the model to make HE-based neural network inference fast and efficient.
Our framework achieves up to a 98% pruning ratio on LeNet, eliminating up to 93% of the required HE operations for performing private inference (PI).
arXiv Detail & Related papers (2024-12-10T22:44:54Z) - DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization [0.0]
This paper introduces a technique that effectively reduces the memory footprint of Deep Neural Network (DNN) models on resource-constrained edge devices.
Our proposed technique, named Post-Training Intra-Layer Multi-Precision Quantization (PTILMPQ), employs a post-training quantization approach, eliminating the need for extensive training data.
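A rough sketch of intra-layer multi-precision quantization under assumed settings (magnitude-based split, 8-bit and 4-bit groups); this is not the PTILMPQ procedure itself, only the general idea of mixing precisions within a single layer.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Symmetric uniform quantizer with 2**bits levels over x's range."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1) if x.size else 1.0
    return np.round(x / scale) * scale

def multi_precision_quantize(w, high_bits=8, low_bits=4, high_frac=0.1):
    """Keep the largest-magnitude fraction of weights at higher precision."""
    out = np.empty_like(w)
    threshold = np.quantile(np.abs(w).ravel(), 1.0 - high_frac)
    high_mask = np.abs(w) >= threshold
    out[high_mask] = uniform_quantize(w[high_mask], high_bits)
    out[~high_mask] = uniform_quantize(w[~high_mask], low_bits)
    avg_bits = high_frac * high_bits + (1 - high_frac) * low_bits
    return out, avg_bits

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
wq, avg_bits = multi_precision_quantize(w)
print(f"avg bits/weight ~= {avg_bits}, MSE = {np.mean((w - wq) ** 2):.6f}")
```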
arXiv Detail & Related papers (2024-04-03T15:06:09Z) - A Masked Pruning Approach for Dimensionality Reduction in
Communication-Efficient Federated Learning Systems [11.639503711252663]
Federated Learning (FL) represents a growing machine learning (ML) paradigm designed for training models across numerous nodes.
We develop MPFL, a novel algorithm that combines a masked pruning method with the FL process to reduce the dimensionality of communicated model updates.
We present an extensive experimental study demonstrating the superior performance of MPFL compared to existing methods.
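A generic sketch of combining pruning masks across federated clients, under the assumption of magnitude-based local masks and a majority-vote aggregation rule; MPFL's actual masking and aggregation may differ.

```python
import numpy as np

def local_mask(weights, keep_frac=0.3):
    """Magnitude-based binary mask keeping the top `keep_frac` of weights."""
    k = int(keep_frac * weights.size)
    threshold = np.partition(np.abs(weights).ravel(), -k)[-k]
    return (np.abs(weights) >= threshold).astype(np.uint8)

def aggregate_masks(masks, min_votes):
    """Keep a position only if at least `min_votes` clients kept it."""
    return (np.sum(masks, axis=0) >= min_votes).astype(np.uint8)

rng = np.random.default_rng(0)
clients = [rng.standard_normal((128, 128)) for _ in range(5)]
masks = np.stack([local_mask(w) for w in clients])
global_mask = aggregate_masks(masks, min_votes=3)
print("global sparsity:", 1.0 - global_mask.mean())
```

Only the positions surviving the global mask need to be exchanged in later rounds, which is where the communication savings come from.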
arXiv Detail & Related papers (2023-12-06T20:29:23Z) - EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
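For context, one unfolded ISTA layer applies a gradient step followed by soft-thresholding; the sketch below smooths the thresholding nonlinearity with a softplus surrogate (the exact smoothing used in the paper may differ, and the step sizes and thresholds would normally be learned per layer).

```python
import numpy as np

def smooth_soft_threshold(v, lam, beta=10.0):
    """Soft-thresholding with max(., 0) replaced by a numerically stable
    softplus, so the nonlinearity is smooth (differentiable everywhere)."""
    z = np.abs(v) - lam
    softplus = np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(beta * z))) / beta
    return np.sign(v) * softplus

def unfolded_ista(A, y, lam=0.05, layers=20):
    """Run `layers` unfolded ISTA iterations for the LASSO problem."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(layers):                # one unfolded "layer" per iteration
        grad = A.T @ (A @ x - y)
        x = smooth_soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = 1.0
y = A @ x_true
x_hat = unfolded_ista(A, y)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```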
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Lightweight and Progressively-Scalable Networks for Semantic
Segmentation [100.63114424262234]
Multi-scale learning frameworks have been regarded as a capable class of models to boost semantic segmentation.
In this paper, we thoroughly analyze the design of convolutional blocks and the ways of interactions across multiple scales.
We devise Lightweight and Progressively-Scalable Networks (LPS-Net) that novelly expands the network complexity in a greedy manner.
arXiv Detail & Related papers (2022-07-27T16:00:28Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
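A back-of-the-envelope sketch of why pruning plus quantization shrinks storage: surviving weights are stored in a CSR-like layout with small codebook indices. The paper's actual source-coding-based format is more elaborate; the layout and sizes below are assumptions.

```python
import numpy as np

def compressed_size_bytes(w, keep_frac=0.1, bits=4):
    """Rough storage estimate: CSR-style indices plus `bits`-bit codebook ids."""
    k = int(keep_frac * w.size)
    threshold = np.partition(np.abs(w).ravel(), -k)[-k]
    nnz = int(np.count_nonzero(np.abs(w) >= threshold))
    index_bytes = nnz * 2 + (w.shape[0] + 1) * 4    # uint16 cols + int32 row ptrs
    value_bytes = (nnz * bits + 7) // 8             # packed codebook indices
    codebook_bytes = (2 ** bits) * 4                # float32 centroids
    return index_bytes + value_bytes + codebook_bytes

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
dense = w.size * 4
comp = compressed_size_bytes(w)
print(f"dense: {dense} B, pruned+quantized: {comp} B ({100 * comp / dense:.1f}%)")
```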
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated
Edge Inference [1.7894377200944507]
Machine learning networks can easily exceed available memory, increasing latency due to excessive OS swapping.
We propose a memory usage predictor coupled with a search algorithm to provide optimized fusing and tiling configurations.
Results show that our approach can run in less than half the memory, and with a speedup of up to 2.78 under severe memory constraints.
arXiv Detail & Related papers (2021-07-14T19:45:49Z) - Layer Pruning via Fusible Residual Convolutional Block for Deep Neural
Networks [15.64167076052513]
Layer pruning incurs less inference time and runtime memory usage than finer-grained pruning when the same number of FLOPs and parameters are pruned.
We propose a simple layer pruning method based on a fusible residual convolutional block (ResConv).
Our pruning method achieves excellent compression and acceleration performance over the state of the art on different datasets.
arXiv Detail & Related papers (2020-11-29T12:51:16Z) - MicroNet: Towards Image Recognition with Extremely Low FLOPs [117.96848315180407]
MicroNet is an efficient convolutional neural network with extremely low computational cost.
A family of MicroNets achieve a significant performance gain over the state-of-the-art in the low FLOP regime.
For instance, MicroNet-M1 achieves 61.1% top-1 accuracy on ImageNet classification with 12 MFLOPs, outperforming MobileNetV3 by 11.3%.
arXiv Detail & Related papers (2020-11-24T18:59:39Z) - A Variational Information Bottleneck Based Method to Compress Sequential
Networks for Human Action Recognition [9.414818018857316]
We propose a method to effectively compress Recurrent Neural Networks (RNNs) used for Human Action Recognition (HAR).
We use a Variational Information Bottleneck (VIB) theory-based pruning approach to limit the information flow through the sequential cells of RNNs to a small subset.
We combine our pruning method with a specific group-lasso regularization technique that significantly improves compression.
It is shown that our method achieves over 70 times greater compression than the nearest competitor with comparable accuracy for the task of action recognition on UCF11.
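A minimal sketch of the group-lasso part of such an objective, assuming each hidden unit of the recurrent cell defines one weight group; the VIB term and the exact grouping used in the paper are not shown.

```python
import numpy as np

def group_lasso_penalty(w_ih, w_hh):
    """Sum of L2 norms of per-hidden-unit weight groups (one row per unit),
    which pushes entire units toward zero so they can be pruned."""
    groups = np.concatenate([w_ih, w_hh], axis=1)
    return float(np.sum(np.linalg.norm(groups, axis=1)))

hidden, inputs = 128, 64
rng = np.random.default_rng(0)
w_ih = rng.standard_normal((hidden, inputs)) * 0.1   # input-to-hidden weights
w_hh = rng.standard_normal((hidden, hidden)) * 0.1   # hidden-to-hidden weights

lam = 1e-3
task_loss = 0.42                                     # placeholder task-loss value
total_loss = task_loss + lam * group_lasso_penalty(w_ih, w_hh)
print("regularized loss:", total_loss)
```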
arXiv Detail & Related papers (2020-10-03T12:41:51Z) - ALF: Autoencoder-based Low-rank Filter-sharing for Efficient
Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
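A generic low-rank filter-sharing sketch (truncated SVD over flattened filters), not ALF's autoencoder-based procedure: filters are expressed as coefficients over a small shared basis, cutting the parameter count.

```python
import numpy as np

def low_rank_factorize(filters, rank):
    """filters: (out_channels, in_channels, kh, kw).
    Returns per-filter coefficients and a shared rank-`rank` filter basis."""
    mat = filters.reshape(filters.shape[0], -1)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    codes = u[:, :rank] * s[:rank]        # per-filter coefficients
    basis = vt[:rank]                     # shared low-rank filter basis
    return codes, basis

rng = np.random.default_rng(0)
filters = rng.standard_normal((256, 128, 3, 3)).astype(np.float32)
codes, basis = low_rank_factorize(filters, rank=32)
orig = filters.size
compressed = codes.size + basis.size
print(f"params: {orig} -> {compressed} ({100 * compressed / orig:.1f}%)")
```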
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.