Circa: Stochastic ReLUs for Private Deep Learning
- URL: http://arxiv.org/abs/2106.08475v1
- Date: Tue, 15 Jun 2021 22:52:45 GMT
- Title: Circa: Stochastic ReLUs for Private Deep Learning
- Authors: Zahra Ghodsi, Nandan Kumar Jha, Brandon Reagen, Siddharth Garg
- Abstract summary: We re-think the ReLU computation and propose optimizations for PI tailored to neural networks.
Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test.
We demonstrate improvements of up to 4.7x storage and 3x runtime over baseline implementations.
- Score: 6.538025863698682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The simultaneous rise of machine learning as a service and concerns over user
privacy have increasingly motivated the need for private inference (PI). While
recent work demonstrates PI is possible using cryptographic primitives, the
computational overheads render it impractical. The community is largely
unprepared to address these overheads, as the source of slowdown in PI stems
from the ReLU operator whereas optimizations for plaintext inference focus on
optimizing FLOPs. In this paper we re-think the ReLU computation and propose
optimizations for PI tailored to properties of neural networks. Specifically,
we reformulate ReLU as an approximate sign test and introduce a novel
truncation method for the sign test that significantly reduces the cost per
ReLU. These optimizations result in a specific type of stochastic ReLU. The key
observation is that the stochastic fault behavior is well suited for the
fault-tolerant properties of neural network inference. Thus, we provide
significant savings without impacting accuracy. We collectively call the
optimizations Circa and demonstrate improvements of up to 4.7x storage and 3x
runtime over baseline implementations; we further show that Circa can be used
on top of recent PI optimizations to obtain 1.8x additional speedup.
Related papers
- Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization [52.16435732772263]
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications.
However, generalization properties of second-order methods are still being debated.
We show for the first time that exact Gauss-Newton (GN) updates take on a tractable form in a class of deep architectures.
arXiv Detail & Related papers (2024-11-12T17:58:40Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Understanding and Improving Optimization in Predictive Coding Networks [1.6114012813668934]
inference learning algorithm (IL) is a promising, bio-plausible alternative to Backpropagation (BP)
IL is computationally demanding, and without memory-intensives like Adam, IL may converge to poor local minima.
IL can reduce loss more quickly than BP, but the reasons for these speedups or their robustness remains unclear.
arXiv Detail & Related papers (2023-05-23T00:32:26Z) - DeepReShape: Redesigning Neural Networks for Efficient Private Inference [3.7802450241986945]
Recent work has shown that FLOPs for PI can no longer be ignored and incur high latency penalties.
We develop DeepReShape, a technique that optimize neural network architectures under PI's constraints.
arXiv Detail & Related papers (2023-04-20T18:27:02Z) - Learning to Linearize Deep Neural Networks for Secure and Efficient
Private Inference [5.293553970082942]
Existing techniques to reduce ReLU operations often involve manual effort and sacrifice accuracy.
We first present a novel measure of non-linearity layers' ReLU sensitivity, enabling mitigation of the time-consuming manual efforts.
We then present SENet, a three-stage training method that automatically assigns per-layer ReLU counts, decides the ReLU locations for each layer's activation map, and trains a model with significantly fewer ReLUs.
arXiv Detail & Related papers (2023-01-23T03:33:38Z) - Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to $4.25%$ more accuracy (iso-ReLU count at 50K) or $2.2times$ less latency (iso-accuracy at 70%) than the current state of the art.
arXiv Detail & Related papers (2022-02-04T19:00:24Z) - DeepReDuce: ReLU Reduction for Fast Private Inference [6.538025863698682]
Recent rise of privacy concerns has led researchers to devise methods for private neural inference.
computing on encrypted data levies an impractically-high latency penalty.
This paper proposes DeepReDuce: a set of optimizations for the judicious removal of ReLUs to reduce private inference latency.
arXiv Detail & Related papers (2021-03-02T01:16:53Z) - Activation Relaxation: A Local Dynamical Approximation to
Backpropagation in the Brain [62.997667081978825]
Activation Relaxation (AR) is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system.
Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, and can operate on arbitrary computation graphs.
arXiv Detail & Related papers (2020-09-11T11:56:34Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.