HEQuant: Marrying Homomorphic Encryption and Quantization for
Communication-Efficient Private Inference
- URL: http://arxiv.org/abs/2401.15970v2
- Date: Wed, 31 Jan 2024 02:11:46 GMT
- Title: HEQuant: Marrying Homomorphic Encryption and Quantization for
Communication-Efficient Private Inference
- Authors: Tianshi Xu, Meng Li, Runsheng Wang
- Abstract summary: We propose HEQuant, which features low-precision-quantization-aware optimization for the HE-based protocols.
Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, and Iron, HEQuant achieves $3.5\sim 23.4\times$ communication reduction.
- Score: 2.498379184732383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Secure two-party computation with homomorphic encryption (HE) protects data
privacy with a formal security guarantee but suffers from high communication
overhead. While previous works, e.g., Cheetah and Iron, have proposed
efficient HE-based protocols for different neural network (NN) operations, they
still assume high precision, e.g., 37-bit fixed point, for the NN operations
and ignore NNs' native robustness against quantization error. In this paper, we
propose HEQuant, which features low-precision-quantization-aware optimization
for the HE-based protocols. We observe that the benefit of naively combining
quantization and HE quickly saturates as bit precision goes down. Hence, to
further improve communication efficiency, we propose a series of optimizations,
including an intra-coefficient packing algorithm and a quantization-aware
tiling algorithm, to simultaneously reduce the number and precision of the
transferred data. Compared with prior-art HE-based protocols, e.g., CrypTFlow2,
Cheetah, and Iron, HEQuant achieves $3.5\sim 23.4\times$ communication
reduction and $3.0\sim 9.3\times$ latency reduction. Meanwhile, when compared
with prior-art network optimization frameworks, e.g., SENet and SNL, HEQuant
also achieves $3.1\sim 3.6\times$ communication reduction.
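To make the core idea concrete, below is a minimal Python sketch of intra-coefficient packing in the general sense the abstract describes: several low-precision values share one large plaintext coefficient, separated by guard bits so that homomorphic additions cannot overflow across slots. All names and parameter choices are illustrative assumptions, not HEQuant's actual implementation.
```python
# Illustrative sketch of intra-coefficient packing: several low-precision
# values share one large plaintext coefficient, separated by guard bits.
# Hypothetical parameters; not HEQuant's actual algorithm.

def pack(values, bit_width, guard_bits, coeff_bits=64):
    """Pack signed low-precision integers into one integer coefficient."""
    slot = bit_width + guard_bits
    assert len(values) * slot <= coeff_bits, "values do not fit in one coefficient"
    coeff = 0
    for i, v in enumerate(values):
        assert -(1 << (bit_width - 1)) <= v < (1 << (bit_width - 1))
        coeff |= (v & ((1 << slot) - 1)) << (i * slot)  # two's complement per slot
    return coeff

def unpack(coeff, num_values, bit_width, guard_bits):
    """Recover the signed values from a packed coefficient."""
    slot = bit_width + guard_bits
    out = []
    for i in range(num_values):
        raw = (coeff >> (i * slot)) & ((1 << slot) - 1)
        if raw >= 1 << (slot - 1):  # restore the sign from the slot's top bit
            raw -= 1 << slot
        out.append(raw)
    return out

packed = pack([3, -2, 7, -8], bit_width=4, guard_bits=4)
assert unpack(packed, 4, 4, 4) == [3, -2, 7, -8]
```
With 4-bit values and 4 guard bits, eight values fit in a single 64-bit coefficient, whereas a 37-bit fixed-point encoding spends one coefficient per value; that density gap is where the communication saving comes from.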
Related papers
- PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization [2.9203160719029073]
Existing secure 2PC frameworks suffer from high inference latency due to enormous communication.
We propose PrivQuant, a framework that jointly optimizes the 2PC-based quantized inference protocols and the network quantization algorithm.
We show PrivQuant reduces communication by $11\times$, $2.5\times$, and $2.8\times$, which results in $8.7\times$, $1.8\times$, and $2.4\times$ latency reduction compared with SiRNN, COINN, and CoPriv, respectively.
arXiv Detail & Related papers (2024-10-12T13:28:42Z)
- Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks [10.229120811024162]
The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices.
Common approaches to address this issue are pruning and mixed-precision quantization.
We propose a novel methodology to apply them jointly via a lightweight gradient-based search.
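As a rough illustration of what such a gradient-based joint search can look like (the formulation below is an assumption for exposition, not necessarily the paper's exact method), a soft per-channel pruning gate and a softmax over candidate bit-widths are both made differentiable, so a task loss and a size penalty can be minimized together:
```python
# Hedged sketch of a joint pruning / mixed-precision search; the candidate
# bit-widths, gate parameterization, and loss weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

BITS = [2, 4, 8]  # candidate bit-widths

def fake_quant(x, bits):
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    scale = x.detach().abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    q = torch.clamp(torch.round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return x + (q * scale - x).detach()  # forward quantized, backward identity

class SearchableConv(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1)
        self.gate = nn.Parameter(torch.ones(cout))         # soft pruning gate
        self.alpha = nn.Parameter(torch.zeros(len(BITS)))  # bit-width logits

    def forward(self, x):
        p = F.softmax(self.alpha, dim=0)
        w = sum(pi * fake_quant(self.conv.weight, b) for pi, b in zip(p, BITS))
        y = F.conv2d(x, w, self.conv.bias, padding=1)
        return y * torch.sigmoid(self.gate).view(1, -1, 1, 1)

    def cost(self):
        p = F.softmax(self.alpha, dim=0)
        avg_bits = sum(pi * b for pi, b in zip(p, BITS))
        return torch.sigmoid(self.gate).sum() * avg_bits  # ~ kept channels x bits

layer = SearchableConv(3, 8)
loss = layer(torch.randn(1, 3, 16, 16)).pow(2).mean() + 1e-3 * layer.cost()
loss.backward()  # gradients flow to weights, gates, and bit-width logits
```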
arXiv Detail & Related papers (2024-07-01T08:07:02Z)
- EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization [3.1330492824737055]
Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead.
We propose EQO, a quantized 2PC inference framework that jointly optimizes the CNNs and the 2PC protocols.
With extensive experiments, EQO demonstrates 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
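For context on why a Winograd-based protocol helps: Winograd convolution trades multiplications, the expensive operation in 2PC, for cheap additions. A minimal 1-D Winograd F(2,3) example, independent of EQO's actual protocol, computing two outputs of a 3-tap filter with four multiplications instead of six:
```python
import numpy as np

# Standard F(2,3) transform matrices (see Lavin & Gray, "Fast Algorithms
# for Convolutional Neural Networks"); illustrative, not EQO's code.
BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], float)
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], float)

def winograd_f23(d, g):
    """d: input tile of 4 samples, g: 3-tap filter -> 2 outputs."""
    return AT @ ((G @ g) * (BT @ d))  # the elementwise product is the 4 multiplies

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])
direct = np.array([np.dot(g, d[0:3]), np.dot(g, d[1:4])])  # sliding correlation
assert np.allclose(winograd_f23(d, g), direct)
```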
arXiv Detail & Related papers (2024-04-15T01:41:18Z)
- CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference [13.039573608167077]
Deep neural network (DNN) inference based on secure two-party computation (2PC) can offer cryptographically secure privacy protection.
Previous works heavily rely on a proxy metric of ReLU counts to approximate the communication overhead.
We present CoPriv, a framework that jointly optimizes the 2PC inference protocol and the DNN architecture.
arXiv Detail & Related papers (2023-11-03T06:19:48Z)
- SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation [22.782678826199206]
Quantization of deep neural networks (DNNs) has been proven effective for compressing and accelerating models.
Data-free quantization (DFQ) is a promising approach that requires no original training data, making it suitable for privacy-sensitive and confidential scenarios.
This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices.
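To illustrate the Hessian-aware, data-free flavor of such methods (this is a generic sketch, not SQuant's exact flip algorithm), one can choose each weight's rounding direction to minimize a diagonal-Hessian-weighted squared perturbation, with the Hessian diagonal itself approximated without data:
```python
import numpy as np

def quantize_diag_hessian(w, bits=4, h_diag=None):
    """Pick floor or ceil per weight to minimize h * (w - w_q)^2."""
    h = np.ones_like(w) if h_diag is None else h_diag  # all-ones: a data-free stand-in
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    lo, hi = np.floor(w / scale), np.ceil(w / scale)
    err_lo = h * (w - lo * scale) ** 2
    err_hi = h * (w - hi * scale) ** 2
    q = np.where(err_lo <= err_hi, lo, hi)
    q = np.clip(q, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

w = np.random.randn(64)
print("max error:", np.abs(w - quantize_diag_hessian(w)).max())
```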
arXiv Detail & Related papers (2022-02-14T01:57:33Z)
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, network orthogonality, which is highly correlated with the loss of the integer programming formulation but is easy to optimize with linear programming.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
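A sketch of how a proxy-metric-driven bit allocation can be posed as a linear program (the importance scores below stand in for the orthogonality metric; layer sizes and the bit budget are hypothetical):
```python
import numpy as np
from scipy.optimize import linprog

BITS = [2, 4, 8]
importance = np.array([[0.2, 0.6, 0.9],   # per-layer benefit at 2/4/8 bits
                       [0.1, 0.5, 0.7],
                       [0.3, 0.4, 0.5]])
params = np.array([1e5, 2e5, 5e4])        # parameter counts per layer
budget = 0.5 * params.sum() * 8           # half the all-8-bit model size

L, B = importance.shape
c = -importance.ravel()                   # maximize total benefit
A_ub = (params[:, None] * np.array(BITS)[None, :]).ravel()[None, :]  # bit cost
A_eq = np.kron(np.eye(L), np.ones(B))     # exactly one bit-width per layer
res = linprog(c, A_ub=A_ub, b_ub=[budget], A_eq=A_eq, b_eq=np.ones(L),
              bounds=[(0, 1)] * (L * B))
choice = res.x.reshape(L, B).argmax(axis=1)  # round the LP relaxation
print("bits per layer:", [BITS[i] for i in choice])
```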
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT).
HANT replaces inefficient operations with more efficient alternatives using a neural-architecture-search-like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with 0.4% drop in the top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
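The decomposition itself is easy to verify: an m-bit unsigned value x satisfies x = (sum_i 2^i e_i + 2^m - 1) / 2 with e_i in {-1, +1}, so a dot product with quantized values splits into m binary-branch dot products. A small self-contained check (illustrative, not the paper's code):
```python
import numpy as np

def decompose(x, m):
    """x: unsigned ints in [0, 2^m - 1] -> m vectors with entries in {-1, +1}."""
    return [2 * ((x >> i) & 1) - 1 for i in range(m)]

m = 4
x = np.random.randint(0, 2 ** m, size=16)  # quantized values
a = np.random.randint(-5, 6, size=16)      # the other operand

branches = decompose(x, m)
binary_dot = sum((1 << i) * np.dot(e, a) for i, e in enumerate(branches))
recovered = (binary_dot + (2 ** m - 1) * a.sum()) // 2
assert recovered == np.dot(x, a)  # the binary branches reproduce the dot product
```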
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- HAWQV3: Dyadic Neural Network Quantization [73.11579145354801]
Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values.
We present HAWQV3, a novel mixed-precision integer-only quantization framework.
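The integer-only trick behind such frameworks is dyadic requantization: the floating-point rescaling factor between layers is replaced by b / 2^c with integer b, leaving only an integer multiply and a bit shift at inference time. A minimal sketch with illustrative scales:
```python
import numpy as np

def to_dyadic(scale, c=16):
    """Approximate a real scale factor by b / 2^c with integer b."""
    return int(round(scale * (1 << c))), c

s_in, s_w, s_out = 0.02, 0.05, 0.1  # example quantization scales
m = s_in * s_w / s_out              # real requantization multiplier
b, c = to_dyadic(m)

acc = np.int64(12345)               # an int32/int64 accumulator value
print((acc * b) >> c)               # integer-only requantization: 123
print(round(acc * m))               # floating-point reference:    123
```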
arXiv Detail & Related papers (2020-11-20T23:51:43Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware.
Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner.
With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.