Related papers: DeepReDuce: ReLU Reduction for Fast Private Inference

URL: http://arxiv.org/abs/2103.01396v1
Date: Tue, 2 Mar 2021 01:16:53 GMT
Title: DeepReDuce: ReLU Reduction for Fast Private Inference
Authors: Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen
Abstract summary: The recent rise of privacy concerns has led researchers to devise methods for private neural inference. Computing on encrypted data levies an impractically high latency penalty. This paper proposes DeepReDuce: a set of optimizations for the judicious removal of ReLUs to reduce private inference latency.
Score: 6.538025863698682
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent rise of privacy concerns has led researchers to devise methods for private neural inference -- where inferences are made directly on encrypted data, never seeing inputs. The primary challenge facing private inference is that computing on encrypted data levies an impractically high latency penalty, stemming mostly from non-linear operators like ReLU. Enabling practical and private inference requires new optimization methods that minimize network ReLU counts while preserving accuracy. This paper proposes DeepReDuce: a set of optimizations for the judicious removal of ReLUs to reduce private inference latency. The key insight is that not all ReLUs contribute equally to accuracy. We leverage this insight to drop, or remove, ReLUs from classic networks to significantly reduce inference latency and maintain high accuracy. Given a target network, DeepReDuce outputs a Pareto frontier of networks that trade off the number of ReLUs and accuracy. Compared to the state of the art for private inference, DeepReDuce improves accuracy and reduces ReLU count by up to 3.5% (iso-ReLU count) and 3.5$\times$ (iso-accuracy), respectively.
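Illustration: the abstract describes the output as a Pareto frontier over ReLU count and accuracy. The sketch below is a minimal, generic way to compute such a frontier from a list of candidate networks; it is not the DeepReDuce procedure itself. The function name `pareto_frontier` and the (ReLU count, accuracy) pairs are hypothetical and invented for illustration only; they are not results from the paper.

```python
from typing import List, Tuple

# A candidate network summarized as (relu_count, accuracy_percent).
Candidate = Tuple[int, float]


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate is dominated if another one uses no more ReLUs and reaches
    at least the same accuracy, with at least one of the two strictly better.
    """
    # Sort by ReLU count (ascending); for ties, put higher accuracy first.
    ordered = sorted(candidates, key=lambda c: (c[0], -c[1]))
    frontier: List[Candidate] = []
    best_accuracy = float("-inf")
    for relu_count, accuracy in ordered:
        # Keep a point only if it beats every cheaper (or equal-cost) point.
        if accuracy > best_accuracy:
            frontier.append((relu_count, accuracy))
            best_accuracy = accuracy
    return frontier


# Hypothetical data, made up for this example.
candidates = [
    (50_000, 70.0),
    (100_000, 74.0),
    (100_000, 72.0),   # dominated: same ReLU count, lower accuracy
    (200_000, 73.5),   # dominated: more ReLUs, lower accuracy than 100K point
    (400_000, 77.0),
]
frontier = pareto_frontier(candidates)
for relu_count, accuracy in frontier:
    print(f"{relu_count:>7} ReLUs -> {accuracy:.1f}% accuracy")
```

Reading along the frontier, an iso-ReLU comparison fixes the ReLU count and compares accuracy, while an iso-accuracy comparison fixes the accuracy and compares ReLU counts.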

Related papers

Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data. A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
DeepReShape: Redesigning Neural Networks for Efficient Private Inference [3.7802450241986945]
Recent work has shown that FLOPs for PI can no longer be ignored and incur high latency penalties. We develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints.
arXiv Detail & Related papers (2023-04-20T18:27:02Z)
Learning to Linearize Deep Neural Networks for Secure and Efficient Private Inference [5.293553970082942]
Existing techniques to reduce ReLU operations often involve manual effort and sacrifice accuracy. We first present a novel measure of non-linearity layers' ReLU sensitivity, enabling mitigation of the time-consuming manual efforts. We then present SENet, a three-stage training method that automatically assigns per-layer ReLU counts, decides the ReLU locations for each layer's activation map, and trains a model with significantly fewer ReLUs.
arXiv Detail & Related papers (2023-01-23T03:33:38Z)
On Differential Privacy for Federated Learning in Wireless Systems with Multiple Base Stations [90.53293906751747]
We consider a federated learning model in a wireless system with multiple base stations and inter-cell interference. We show the convergence behavior of the learning process by deriving an upper bound on its optimality gap. Our proposed scheduler improves the average accuracy of the predictions compared with a random scheduler.
arXiv Detail & Related papers (2022-08-25T03:37:11Z)
Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art.
arXiv Detail & Related papers (2022-02-04T19:00:24Z)
Sphynx: ReLU-Efficient Network Design for Private Inference [49.73927340643812]
We focus on private inference (PI), where the goal is to perform inference on a user's data sample using a service provider's model. Existing PI methods for deep networks enable cryptographically secure inference with little drop in functionality. This paper presents Sphynx, a ReLU-efficient network design method based on micro-search strategies for convolutional cell design.
arXiv Detail & Related papers (2021-06-17T18:11:10Z)
Circa: Stochastic ReLUs for Private Deep Learning [6.538025863698682]
We re-think the ReLU computation and propose optimizations for PI tailored to neural networks. Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test. We demonstrate improvements of up to 4.7x storage and 3x runtime over baseline implementations.
arXiv Detail & Related papers (2021-06-15T22:52:45Z)
ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting [105.97936163854693]
We propose ResRep, which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts. We equivalently merge the remembering and forgetting parts into the original architecture with narrower layers.
arXiv Detail & Related papers (2020-07-07T07:56:45Z)
Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the decline is due to activation quantization. Our integer networks achieve performance equivalent to the corresponding FPN networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
CryptoNAS: Private Inference on a ReLU Budget [8.8438567779565]
Private inference (PI) refers to methods that process inferences without disclosing inputs. This paper observes that existing models are ill-suited for PI and proposes a novel NAS method, named CryptoNAS, for finding and tailoring models to the needs of PI. We develop the idea of a ReLU budget as a proxy for inference latency and use CryptoNAS to build models that maximize accuracy within a given budget.
arXiv Detail & Related papers (2020-06-15T20:06:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.