Selective Network Linearization for Efficient Private Inference
- URL: http://arxiv.org/abs/2202.02340v1
- Date: Fri, 4 Feb 2022 19:00:24 GMT
- Title: Selective Network Linearization for Efficient Private Inference
- Authors: Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde
- Abstract summary: We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70\%) than the current state of the art.
- Score: 49.937470642033155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private inference (PI) enables inference directly on cryptographically secure
data. While promising to address many privacy issues, it has seen limited use
due to extreme runtimes. Unlike plaintext inference, where latency is dominated
by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus,
practical PI demands novel ReLU-aware optimizations. To reduce PI latency we
propose a gradient-based algorithm that selectively linearizes ReLUs while
maintaining prediction accuracy. We evaluate our algorithm on several standard
PI benchmarks. The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU
count at 50K) or $2.2\times$ less latency (iso-accuracy at 70\%) than the
current state of the art and advance the Pareto frontier across the
latency-accuracy space. To complement empirical results, we present a "no free
lunch" theorem that sheds light on how and when network linearization is
possible while maintaining prediction accuracy.
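Below is a minimal sketch of the kind of gradient-based selective linearization the abstract describes, under the assumption that each ReLU receives a learnable mask and an L1 penalty drives masks toward the identity (linear) function. All names and hyperparameters are illustrative and not the paper's actual implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveReLU(nn.Module):
    """ReLU with a learnable per-channel mask m in [0, 1]:
    f(x) = m * relu(x) + (1 - m) * x.  Driving m to 0 makes the unit linear."""
    def __init__(self, num_channels: int):
        super().__init__()
        # Initialize masks at 1.0 so training starts from the original ReLU network.
        self.mask = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        m = self.mask.clamp(0.0, 1.0).view(1, -1, 1, 1)  # broadcast over N, H, W
        return m * F.relu(x) + (1.0 - m) * x

def relu_budget_penalty(model: nn.Module, weight: float = 1e-3):
    """L1 (lasso-style) penalty pushing masks toward 0, i.e. toward linearization."""
    total = sum(layer.mask.clamp(0.0, 1.0).sum()
                for layer in model.modules() if isinstance(layer, SelectiveReLU))
    return weight * total

# Joint objective sketch: task loss plus the ReLU-budget penalty, minimized by SGD.
# loss = F.cross_entropy(model(images), labels) + relu_budget_penalty(model)
```
After training, masks at or near zero mark ReLUs that can be replaced by the identity, reducing the non-linear operation count that dominates PI latency.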
Related papers
- xMLP: Revolutionizing Private Inference with Exclusive Square Activation [27.092753578066294]
Private Inference (PI) enables deep neural networks (DNNs) to work on private data without leaking sensitive information.
The use of non-linear activations such as ReLU in DNNs can lead to impractically high PI latency.
We propose xMLP, a novel DNN architecture that uses square activations exclusively while maintaining parity with ReLU-based DNNs in both accuracy and efficiency (a brief sketch of the square-activation idea appears after this list).
arXiv Detail & Related papers (2024-03-12T18:46:56Z) - DeepReShape: Redesigning Neural Networks for Efficient Private Inference [3.7802450241986945]
Recent work has shown that FLOPs for PI can no longer be ignored and incur high latency penalties.
We develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints.
arXiv Detail & Related papers (2023-04-20T18:27:02Z) - Reliable Prediction Intervals with Directly Optimized Inductive Conformal Regression for Deep Learning [3.42658286826597]
Prediction intervals (PIs) are used to quantify the uncertainty of each prediction in deep learning regression.
Many approaches to improve the quality of PIs can effectively reduce the width of PIs, but they do not ensure that enough real labels are captured.
In this study, we use Directly Optimized Inductive Conformal Regression (DOICR) that takes only the average width of PIs as the loss function.
Benchmark experiments show that DOICR outperforms current state-of-the-art algorithms for regression problems.
arXiv Detail & Related papers (2023-02-02T04:46:14Z) - Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates a personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z) - Circa: Stochastic ReLUs for Private Deep Learning [6.538025863698682]
We re-think the ReLU computation and propose optimizations for PI tailored to neural networks.
Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test.
We demonstrate improvements of up to 4.7x in storage and 3x in runtime over baseline implementations.
arXiv Detail & Related papers (2021-06-15T22:52:45Z) - DeepReDuce: ReLU Reduction for Fast Private Inference [6.538025863698682]
The recent rise of privacy concerns has led researchers to devise methods for private neural inference.
However, computing on encrypted data levies an impractically high latency penalty.
This paper proposes DeepReDuce: a set of optimizations for the judicious removal of ReLUs to reduce private inference latency.
arXiv Detail & Related papers (2021-03-02T01:16:53Z) - On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z) - StrObe: Streaming Object Detection from LiDAR Packets [73.27333924964306]
Rolling shutter LiDARs output their scans as a stream of packets, each covering a sector of the 360° coverage.
Modern perception algorithms wait for the full sweep to be built before processing the data, which introduces additional latency.
In this paper we propose StrObe, a novel approach that minimizes latency by ingesting LiDAR packets and emitting a stream of detections without waiting for the full sweep to be built.
arXiv Detail & Related papers (2020-11-12T14:57:44Z) - Privacy Preserving Recalibration under Domain Shift [119.21243107946555]
We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints.
We also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets.
arXiv Detail & Related papers (2020-08-21T18:43:37Z) - Towards Streaming Perception [70.68520310095155]
We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations.
arXiv Detail & Related papers (2020-05-21T01:51:35Z)
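The xMLP entry above mentions exclusive square activations; here is a minimal sketch of that idea in a standard PyTorch setup. The module name and layer sizes are illustrative, not xMLP's actual architecture.
```python
import torch
import torch.nn as nn

class SquareActivation(nn.Module):
    """Element-wise x^2: a single multiplication with no comparison,
    avoiding the ReLU sign-test cost that dominates PI latency."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * x

# Illustrative MLP block using only square activations.
block = nn.Sequential(nn.Linear(256, 512), SquareActivation(), nn.Linear(512, 256))
out = block(torch.randn(8, 256))  # shape: (8, 256)
```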