PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization
- URL: http://arxiv.org/abs/2410.09531v1
- Date: Sat, 12 Oct 2024 13:28:42 GMT
- Title: PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization
- Authors: Tianshi Xu, Shuzhang Zhong, Wenxuan Zeng, Runsheng Wang, Meng Li
- Abstract summary: Existing secure 2PC frameworks suffer from a high inference latency due to enormous communication.
We propose PrivQuant, a framework that jointly optimizes the 2PC-based quantized inference protocols and the network quantization algorithm.
We show PrivQuant reduces communication by $11\times$, $2.5\times$, and $2.8\times$, which results in $8.7\times$, $1.8\times$, and $2.4\times$ latency reduction compared with SiRNN, COINN, and CoPriv, respectively.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Private deep neural network (DNN) inference based on secure two-party computation (2PC) enables secure privacy protection for both the server and the client. However, existing secure 2PC frameworks suffer from a high inference latency due to enormous communication. As the communication of both linear and non-linear DNN layers reduces with the bit widths of weight and activation, in this paper, we propose PrivQuant, a framework that jointly optimizes the 2PC-based quantized inference protocols and the network quantization algorithm, enabling communication-efficient private inference. PrivQuant proposes DNN architecture-aware optimizations for the 2PC protocols for communication-intensive quantized operators and conducts graph-level operator fusion for communication reduction. Moreover, PrivQuant also develops a communication-aware mixed precision quantization algorithm to improve inference efficiency while maintaining high accuracy. The network/protocol co-optimization enables PrivQuant to outperform prior-art 2PC frameworks. With extensive experiments, we demonstrate PrivQuant reduces communication by $11\times$, $2.5\times$, and $2.8\times$, which results in $8.7\times$, $1.8\times$, and $2.4\times$ latency reduction compared with SiRNN, COINN, and CoPriv, respectively.
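To make the co-optimization concrete, below is a minimal Python sketch of communication-aware mixed-precision bit allocation, assuming a toy cost model in which a layer's 2PC communication grows linearly with its weight and activation bit widths. The layer names, MAC counts, and sensitivity scores are hypothetical; this is not PrivQuant's actual algorithm or protocol cost model.

```python
# Illustrative sketch of communication-aware mixed-precision bit allocation.
# The cost model (communication proportional to operand bit widths) and the
# per-layer sensitivity numbers are hypothetical placeholders.

def comm_cost(layer, w_bits, a_bits):
    """Toy 2PC communication model: cost scales with MACs and with the
    bit widths of both operands; stands in for a protocol-level model."""
    return layer["macs"] * (w_bits + a_bits)

def allocate_bits(layers, candidate_bits=(8, 6, 4), sensitivity_budget=1.0):
    """Greedy mixed-precision search: repeatedly apply the bit-width
    reduction with the best communication saving per unit of proxy
    accuracy loss, until the accuracy budget is exhausted."""
    config = {l["name"]: max(candidate_bits) for l in layers}
    spent = 0.0
    while True:
        best = None
        for l in layers:
            cur = config[l["name"]]
            lower = [b for b in candidate_bits if b < cur]
            if not lower:
                continue
            nxt = max(lower)
            saving = comm_cost(l, cur, cur) - comm_cost(l, nxt, nxt)
            loss = l["sensitivity"] * (cur - nxt)  # proxy accuracy drop
            if spent + loss <= sensitivity_budget:
                score = saving / loss
                if best is None or score > best[0]:
                    best = (score, l["name"], nxt, loss)
        if best is None:
            break
        _, name, nxt, loss = best
        config[name] = nxt
        spent += loss
    return config

# Hypothetical three-layer network: MAC counts and sensitivities made up.
layers = [
    {"name": "conv1", "macs": 1e8, "sensitivity": 0.50},
    {"name": "conv2", "macs": 4e8, "sensitivity": 0.10},
    {"name": "fc",    "macs": 5e7, "sensitivity": 0.05},
]
print(allocate_bits(layers))  # low-sensitivity layers end up at 4 bits
```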
Related papers
- PrivCirNet: Efficient Private Inference via Block Circulant Transformation [11.859511840002916]
Homomorphic encryption (HE)-based deep neural network (DNN) inference protects data and model privacy but suffers from significant computation overhead.
We propose PrivCirNet, a protocol/network co-optimization framework based on block circulant transformation.
PrivCirNet customizes the HE encoding algorithm that is fully compatible with the block circulant transformation.
arXiv Detail & Related papers (2024-05-23T13:44:48Z)
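The block circulant transformation in PrivCirNet replaces dense weight blocks with circulant ones, so each b-by-b block multiplies a vector via an FFT-based circular convolution. The NumPy sketch below shows that plaintext structure only; PrivCirNet's customized HE encoding is not modeled here.

```python
import numpy as np

def circulant_matvec(col, x):
    """y = C @ x where C is circulant with first column `col`:
    C[i, j] = col[(i - j) % b]. C @ x is a circular convolution,
    computable in O(b log b) with FFTs instead of O(b^2)."""
    return np.real(np.fft.ifft(np.fft.fft(col) * np.fft.fft(x)))

def block_circulant_matvec(cols, x, b):
    """Multiply a block-circulant matrix by x, where cols[i][j] is the
    first column of the (i, j)-th b-by-b circulant block."""
    n_row_blocks, n_col_blocks = len(cols), len(cols[0])
    y = np.zeros(n_row_blocks * b)
    for i in range(n_row_blocks):
        for j in range(n_col_blocks):
            y[i*b:(i+1)*b] += circulant_matvec(cols[i][j], x[j*b:(j+1)*b])
    return y

# Check against the dense equivalent on a 4x4 matrix of 2x2 circulant blocks.
rng = np.random.default_rng(0)
b, R, C = 2, 2, 2
cols = [[rng.standard_normal(b) for _ in range(C)] for _ in range(R)]
dense = np.block([[np.column_stack([np.roll(c, k) for k in range(b)])
                   for c in row] for row in cols])
x = rng.standard_normal(C * b)
assert np.allclose(block_circulant_matvec(cols, x, b), dense @ x)
```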
- EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization [3.1330492824737055]
Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead.
We propose EQO, a quantized 2PC inference framework that jointly optimizes the CNNs and 2PC protocols.
With extensive experiments, EQO demonstrates 11.7x, 3.6x, and 6.3x communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
arXiv Detail & Related papers (2024-04-15T01:41:18Z)
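For context on the Winograd-based protocol, below is a minimal plaintext sketch of the classic 1D Winograd F(2,3) transform, which produces two outputs of a 3-tap convolution with 4 multiplications instead of 6. This is standard Winograd algebra, not EQO's actual quantized 2PC protocol.

```python
import numpy as np

# Winograd F(2,3) transform matrices (standard, fixed constants).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0, 0.0, 0.0],
               [0.5, 0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0, 0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """y = AT @ ((G @ g) * (BT @ d)). The element-wise product is the only
    step needing expensive secure multiplication in a 2PC setting, which is
    why reducing multiplications reduces communication."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 inputs
g = np.array([1.0, 2.0, 3.0])        # 3-tap filter
direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(winograd_f23(d, g), direct)  # both give [14, 20]
```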
- HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference [2.498379184732383]
We propose HEQuant, which features low-precision-quantization-aware optimization for the HE-based protocols.
Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, Iron, etc., HEQuant achieves $3.5\sim 23.4\times$ communication reduction.
arXiv Detail & Related papers (2024-01-29T08:59:05Z)
- CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference [13.039573608167077]
Deep neural network (DNN) inference based on secure two-party computation (2PC) can offer cryptographically-secure privacy protection.
Previous works heavily rely on a proxy metric of ReLU counts to approximate the communication overhead.
We present CoPriv, a framework that jointly optimizes the 2PC inference protocol and the DNN architecture.
arXiv Detail & Related papers (2023-11-03T06:19:48Z)
- RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference [17.299835585861747]
We introduce RRNet, a framework that aims to jointly reduce the overhead of MPC comparison protocols and accelerate computation through hardware acceleration.
Our approach integrates the hardware latency of cryptographic building blocks into the DNN loss function, resulting in improved energy efficiency, accuracy, and security guarantees.
arXiv Detail & Related papers (2023-02-05T04:02:13Z)
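To illustrate "integrating the hardware latency of cryptographic building blocks into the DNN loss function", here is a toy Python sketch with a hypothetical per-operator latency table. RRNet's actual formulation makes the penalty differentiable (e.g., via trainable gates); here the counts are plain numbers for illustration.

```python
# Toy latency-aware objective in the spirit of RRNet. The per-operator cost
# table and the lambda weighting are hypothetical placeholders.

# Hypothetical per-invocation costs of cryptographic building blocks (ms).
OP_LATENCY_MS = {"relu_cmp": 0.8, "mul": 0.1, "add": 0.01}

def protocol_latency(relu_count, mul_count, add_count):
    """Estimated MPC latency from operator counts; the secure comparison
    behind each ReLU dominates, which is what RRNet targets."""
    return (relu_count * OP_LATENCY_MS["relu_cmp"]
            + mul_count * OP_LATENCY_MS["mul"]
            + add_count * OP_LATENCY_MS["add"])

def total_loss(task_loss, relu_count, mul_count, add_count, lam=1e-4):
    """Joint objective: accuracy term plus a weighted latency term."""
    return task_loss + lam * protocol_latency(relu_count, mul_count, add_count)

print(total_loss(task_loss=0.42, relu_count=500_000,
                 mul_count=2_000_000, add_count=2_000_000))
```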
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
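As an illustration of optimizing a cheap proxy instead of the true integer-programming loss, the sketch below allocates bit widths by a hypothetical per-layer proxy score under a model-size budget. It is a stand-in for, not a reproduction of, OMPQ's network-orthogonality metric and its linear-programming solver.

```python
# Sketch of proxy-driven bit allocation: start every layer at low precision,
# then upgrade layers in decreasing order of proxy score per extra byte.
# Scores and parameter counts below are hypothetical.

def allocate(layers, size_budget, bits=(4, 8)):
    lo, hi = bits
    config = {l["name"]: lo for l in layers}
    used = sum(l["params"] * lo / 8 for l in layers)  # bytes at low precision
    order = sorted(layers,
                   key=lambda l: l["proxy_score"] / (l["params"] * (hi - lo) / 8),
                   reverse=True)
    for l in order:
        extra = l["params"] * (hi - lo) / 8  # cost of upgrading this layer
        if used + extra <= size_budget:
            config[l["name"]] = hi
            used += extra
    return config

layers = [  # hypothetical layers
    {"name": "conv1", "params": 10_000,  "proxy_score": 0.9},
    {"name": "conv2", "params": 500_000, "proxy_score": 0.4},
    {"name": "fc",    "params": 100_000, "proxy_score": 0.7},
]
print(allocate(layers, size_budget=600_000))  # budget in bytes
```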
- Entanglement Rate Optimization in Heterogeneous Quantum Communication Networks [79.8886946157912]
Quantum communication networks are emerging as a promising technology that could constitute a key building block in future communication networks in the 6G era and beyond.
Recent advances led to the deployment of small- and large-scale quantum communication networks with real quantum hardware.
In quantum networks, entanglement is a key resource that allows for data transmission between different nodes.
arXiv Detail & Related papers (2021-05-30T11:34:23Z)
- EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
- D2P-Fed: Differentially Private Federated Learning With Efficient Communication [78.57321932088182]
We propose a unified scheme to achieve both differential privacy (DP) and communication efficiency in federated learning (FL).
In particular, compared with the only prior work taking care of both aspects, D2P-Fed provides a stronger privacy guarantee, better composability and smaller communication cost.
The results show that D2P-Fed outperforms the state-of-the-art by 4.7% to 13.0% in terms of model accuracy while saving one third of the communication cost.
arXiv Detail & Related papers (2020-06-22T06:46:11Z)
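A minimal sketch of the "DP plus communication efficiency" idea: the client clips, quantizes, and adds integer-valued noise to its update, so the uploaded message is both private and compact. Rounded Gaussian noise here is an illustrative stand-in for the discrete Gaussian mechanism, and none of the constants are calibrated, so this is not a correct DP implementation.

```python
import numpy as np

def privatize_update(update, rng, clip_norm=1.0, scale=64, noise_std=8.0):
    """Clip (bounding sensitivity), quantize to integers, add integer noise.
    Rounded Gaussian noise approximates the discrete Gaussian mechanism;
    the constants are uncalibrated placeholders."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    quantized = np.round(clipped * scale).astype(np.int64)
    noise = np.round(rng.normal(0.0, noise_std, size=update.shape)).astype(np.int64)
    return quantized + noise  # compact integer message sent to the server

def aggregate(messages, scale=64):
    """Server side: average the integer messages and undo the scaling."""
    return np.mean(np.stack(messages), axis=0) / scale

rng = np.random.default_rng(0)
client_updates = rng.normal(0.0, 0.1, size=(5, 10))  # 5 clients, 10 params
messages = [privatize_update(u, rng) for u in client_updates]
print(aggregate(messages))
```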
- A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)
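Dataset-free magnitude pruning is the simplest instance of pruning without access to the private training set, since ranking weights needs only the weights themselves. The sketch below shows that baseline only, not the paper's full pattern-based pruning and mobile acceleration framework.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.7):
    """Zero out the smallest-magnitude fraction of weights; needs no
    training data, only the trained weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.random.default_rng(0).normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
print(f"kept {np.count_nonzero(pruned)} of {w.size} weights")
```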
- Experimental quantum conference key agreement [55.41644538483948]
Quantum networks will provide multi-node entanglement over long distances to enable secure communication on a global scale.
Here we demonstrate quantum conference key agreement, a quantum communication protocol that exploits multi-partite entanglement.
We distribute four-photon Greenberger-Horne-Zeilinger (GHZ) states generated by high-brightness, telecom photon-pair sources across up to 50 km of fibre.
arXiv Detail & Related papers (2020-02-04T19:00:31Z)