xMLP: Revolutionizing Private Inference with Exclusive Square Activation
- URL: http://arxiv.org/abs/2403.08024v1
- Date: Tue, 12 Mar 2024 18:46:56 GMT
- Title: xMLP: Revolutionizing Private Inference with Exclusive Square Activation
- Authors: Jiajie Li, Jinjun Xiong
- Abstract summary: Private Inference (PI) enables deep neural networks (DNNs) to work on private data without leaking sensitive information.
The use of non-linear activations such as ReLU in DNNs can lead to impractically high PI latency.
We propose xMLP, a novel DNN architecture that uses square activations exclusively while maintaining parity in both accuracy and efficiency.
- Score: 27.092753578066294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Private Inference (PI) enables deep neural networks (DNNs) to work on private
data without leaking sensitive information by exploiting cryptographic
primitives such as multi-party computation (MPC) and homomorphic encryption
(HE). However, the use of non-linear activations such as ReLU in DNNs can lead
to impractically high PI latency in existing PI systems, as ReLU requires the
use of costly MPC computations, such as Garbled Circuits. Since square
activations can be processed with Beaver's triples hundreds of times faster
than ReLU, they are friendlier to PI tasks, but using them leads to a
notable drop in model accuracy. This paper starts by exploring the reason for
such an accuracy drop after using square activations, and concludes that this
is due to an "information compounding" effect. Leveraging this insight, we
propose xMLP, a novel DNN architecture that uses square activations exclusively
while maintaining parity in both accuracy and efficiency with ReLU-based DNNs.
Our experiments on CIFAR-100 and ImageNet show that xMLP models consistently
achieve better performance than ResNet models with fewer activation layers and
parameters, while matching the performance of their ReLU-based variants.
Remarkably, when compared to state-of-the-art PI models, xMLP
demonstrates superior performance, achieving a 0.58% increase in accuracy with
7x faster PI speed. Moreover, it delivers a significant accuracy improvement of
4.96% while maintaining the same PI latency. When offloading PI to the GPU,
xMLP is up to 700x faster than the previous state-of-the-art PI model with
comparable accuracy.
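Since the abstract's efficiency argument rests on squaring being cheap under MPC, here is a minimal, self-contained sketch of that primitive. It assumes two parties, additive secret sharing over a prime field, and a dealer-supplied "square pair" (a, a^2), the squaring analogue of a Beaver triple; it illustrates the idea, not the paper's actual protocol.

```python
# Minimal sketch (not the paper's code) of why squaring is PI-friendly:
# with a preprocessed square pair (a, a^2), two parties compute shares of
# x^2 from shares of x with a single cheap opening, instead of a
# garbled-circuit comparison as an exact ReLU would need.
import random

P = 2**61 - 1  # demo field modulus; any sufficiently large prime works

def share(v):
    """Split v into two additive shares modulo P."""
    s0 = random.randrange(P)
    return s0, (v - s0) % P

def reveal(s0, s1):
    return (s0 + s1) % P

# Offline phase: a dealer shares a random a and its square a^2.
a = random.randrange(P)
a_sh, a2_sh = share(a), share(a * a % P)

x = 12345
x_sh = share(x)

# Online phase: both parties open d = x - a (this leaks nothing about x,
# since a is uniformly random).
d = reveal((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P)

# x^2 = d^2 + 2*d*a + a^2, where d is public and a, a^2 remain shared.
y_sh = [(2 * d * a_sh[i] + a2_sh[i]) % P for i in range(2)]
y_sh[0] = (y_sh[0] + d * d) % P  # the public term is added by one party

assert reveal(*y_sh) == (x * x) % P
```

A ReLU, by contrast, requires a secure comparison, which is exactly the garbled-circuit cost the abstract refers to.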
Related papers
- Efficient Privacy-Preserving Convolutional Spiking Neural Networks with FHE [1.437446768735628]
Fully Homomorphic Encryption (FHE) is a key technology for privacy-preserving computation.
FHE has limitations in processing continuous non-polynomial functions.
We present a framework called FHE-DiCSNN for homomorphic SNNs.
FHE-DiCSNN achieves an accuracy of 97.94% on ciphertexts, with a loss of only 0.53% compared to the original network's accuracy of 98.47%.
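The "non-polynomial functions" limitation can be seen concretely: HE schemes natively evaluate only additions and multiplications, so a spiking threshold (a Heaviside step) must be replaced by a low-degree polynomial surrogate. The sketch below is a generic illustration of that trade-off, not the FHE-DiCSNN method.

```python
# Generic illustration (not FHE-DiCSNN): approximating a spiking threshold
# with low-degree polynomials, the only functions FHE evaluates natively.
import numpy as np

xs = np.linspace(-1, 1, 401)
step = (xs > 0).astype(float)  # ideal firing function

for deg in (3, 7, 15):
    coeffs = np.polynomial.chebyshev.chebfit(xs, step, deg)
    approx = np.polynomial.chebyshev.chebval(xs, coeffs)
    print(f"degree {deg:2d}: max error {np.max(np.abs(approx - step)):.3f}")
# The max error stays near 0.5 at the jump: higher degree (and thus more
# multiplicative depth under FHE) buys only slow improvement near the
# discontinuity.
```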
arXiv Detail & Related papers (2023-09-16T15:37:18Z)
- Reliable Prediction Intervals with Directly Optimized Inductive Conformal Regression for Deep Learning [3.42658286826597]
Prediction intervals (PIs) are used to quantify the uncertainty of each prediction in deep learning regression.
Many approaches to improve the quality of PIs can effectively reduce the width of PIs, but they do not ensure that enough real labels are captured.
In this study, we use Directly Optimized Inductive Conformal Regression (DOICR) that takes only the average width of PIs as the loss function.
Benchmark experiments show that DOICR outperforms current state-of-the-art algorithms for regression problems.
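For context, the inductive (split) conformal baseline that DOICR builds on fits in a few lines; the sketch below is that standard baseline on toy data, not DOICR's directly optimized variant.

```python
# Split (inductive) conformal regression baseline (toy data, not DOICR):
# calibrate a half-width q from held-out residuals so that intervals
# [pred - q, pred + q] cover true labels at the target rate.
import numpy as np

rng = np.random.default_rng(0)
n_cal, alpha = 500, 0.1                          # target 90% coverage

y_cal = rng.normal(size=n_cal)                   # calibration labels
pred_cal = y_cal + rng.normal(0.0, 0.3, n_cal)   # some model's predictions
scores = np.abs(y_cal - pred_cal)                # nonconformity scores

# Finite-sample-corrected empirical quantile.
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

pred_test = 0.7
print(f"90% PI: [{pred_test - q:.3f}, {pred_test + q:.3f}]")
```

DOICR's contribution is to train the underlying model so that the resulting PIs are as narrow as possible while this coverage guarantee is preserved.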
arXiv Detail & Related papers (2023-02-02T04:46:14Z)
- Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to 4.25% more accuracy (iso-ReLU count at 50K) or 2.2x less latency (iso-accuracy at 70%) than the current state of the art.
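A hedged sketch of the idea follows (the gate parameterization and penalty here are illustrative; SNL's actual algorithm differs in detail): attach a trainable gate to each ReLU and drive gates to zero, so that those ReLUs become the identity and drop out of the PI cost.

```python
# Illustrative gate for selective ReLU linearization (not SNL's exact code).
import torch
import torch.nn as nn

class GatedReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(1))  # 1 = keep ReLU, 0 = linear

    def forward(self, x):
        g = torch.clamp(self.gate, 0.0, 1.0)
        return g * torch.relu(x) + (1.0 - g) * x

layer = GatedReLU()
y = layer(torch.randn(4))
# Training would add a sparsity penalty on the gates, e.g.
#   loss = task_loss + lam * sum(m.gate.abs().sum() for m in gated_layers)
# and harden each gate to exactly 0 or 1 after training.
```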
arXiv Detail & Related papers (2022-02-04T19:00:24Z)
- AESPA: Accuracy Preserving Low-degree Polynomial Activation for Fast Private Inference [1.4878320574640147]
We propose an accuracy preserving low-degree activation function (AESPA) that exploits the Hermite expansion of the ReLU and basis-wise normalization.
When applied to the all-ReLU baseline on the state-of-the-art Delphi PI protocol, AESPA shows up to 42.1x and 28.3x lower online latency and communication cost, respectively.
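The Hermite-expansion construction can be reproduced numerically. The sketch below is my own transcription of the standard expansion of ReLU under a standard Gaussian input, with AESPA's basis-wise normalization omitted.

```python
# Low-degree Hermite approximation of ReLU (illustrative; AESPA adds
# basis-wise normalization on top of this expansion).
import math
import numpy as np

def hermite_coeffs(f, degree, lim=8.0, n_grid=20001):
    """Coefficients a_n = E[f(Z) He_n(Z)] / n! for Z ~ N(0, 1)."""
    xs = np.linspace(-lim, lim, n_grid)
    dx = xs[1] - xs[0]
    gauss = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)
    coeffs = []
    for n in range(degree + 1):
        He_n = np.polynomial.hermite_e.hermeval(xs, [0] * n + [1])
        coeffs.append(np.sum(f(xs) * He_n * gauss) * dx / math.factorial(n))
    return coeffs

a0, a1, a2 = hermite_coeffs(lambda x: np.maximum(x, 0), degree=2)
print(round(a0, 4), round(a1, 4), round(a2, 4))  # ~0.3989, 0.5, 0.1995

# The degree-2 surrogate a0 + a1*x + a2*(x^2 - 1) uses only additions and
# squares, hence is cheap under PI protocols such as Delphi.
```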
arXiv Detail & Related papers (2022-01-18T02:02:02Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete variables (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, such a co-inference framework can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose a cost-effective network design paradigm with low-resolution (LR) representation for efficient pose estimation, named FasterPose.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant network for pose estimation, our method reduces FLOPs by 58% while gaining a 1.3% improvement in accuracy.
arXiv Detail & Related papers (2021-07-07T13:39:08Z)
- Circa: Stochastic ReLUs for Private Deep Learning [6.538025863698682]
We re-think the ReLU computation and propose optimizations for PI tailored to neural networks.
Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test.
We demonstrate savings of up to 4.7x in storage and 3x in runtime over baseline implementations.
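The reformulation can be made concrete in plaintext (a hedged sketch; the actual protocol runs this under MPC and chooses parameters carefully): ReLU(x) = x * step(x), so only an approximate sign test on a truncated fixed-point encoding of x is needed.

```python
# Plaintext sketch of the sign-test view of ReLU (not Circa's MPC protocol).
import numpy as np

def relu_via_sign(x, drop_bits=8, scale=2**16):
    xi = np.round(x * scale).astype(np.int64)  # fixed-point encoding
    xt = xi >> drop_bits                       # truncate low-order bits
    return np.where(xt >= 0, x, 0.0)           # approximate sign test

x = np.array([-1.5, -1e-6, 1e-6, 2.0])
print(relu_via_sign(x))
# Inputs within ~2**-(16-8) of zero can be misclassified; Circa accepts this
# small, controlled error in exchange for much cheaper MPC sign evaluation.
```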
arXiv Detail & Related papers (2021-06-15T22:52:45Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have large numbers of parameters and require heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
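A hedged sketch of that recipe (using scikit-learn's Affinity Propagation as the message passing algorithm; EPruner's released implementation may differ): cluster the filters of a layer by their weights alone and keep only the exemplars, with no training data involved.

```python
# Data-free exemplar selection for filter pruning (illustrative, not
# EPruner's exact code): Affinity Propagation picks an adaptive number of
# exemplars, so no filter-count hyperparameter is needed.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
filters = rng.normal(size=(64, 3 * 3 * 16))  # 64 conv filters, flattened

ap = AffinityPropagation(random_state=0).fit(filters)
keep = ap.cluster_centers_indices_           # indices of exemplar filters
print(f"keeping {len(keep)} of {len(filters)} filters")
```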
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- HiPPO: Recurrent Memory with Optimal Polynomial Projections [93.3537706398653]
We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto bases.
Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal solution to a natural online function approximation problem.
This formal framework yields a new memory update mechanism (HiPPO-LegS) that scales through time to remember all history, avoiding priors on the timescale.
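The HiPPO-LegS update fits in a few lines. The matrices below follow my reading of the paper's LegS operator, and the bilinear discretization is one of several the paper discusses, so treat the details as an assumption rather than the reference implementation.

```python
# HiPPO-LegS sketch: the state c holds Legendre coefficients of the entire
# input history; the 1/k timescale makes the memory horizon unbounded.
import numpy as np

N = 32                                   # number of basis coefficients
n = np.arange(N)
A = np.where(n[:, None] > n[None, :],
             np.sqrt((2 * n[:, None] + 1) * (2 * n[None, :] + 1)),
             np.diag(n + 1).astype(float))
B = np.sqrt(2 * n + 1)

def legs_step(c, f_k, k):
    """Bilinear step of dc/dt = (-A c + B f) / t at integer time k >= 1."""
    lhs = np.eye(N) + A / (2 * k)
    rhs = (np.eye(N) - A / (2 * k)) @ c + B * f_k / k
    return np.linalg.solve(lhs, rhs)

c = np.zeros(N)
for k, f_k in enumerate(np.sin(np.linspace(0, 8, 200)), start=1):
    c = legs_step(c, f_k, k)
# c now summarizes the full signal; projecting back onto Legendre
# polynomials rescaled to [0, t] recovers an approximation of the history.
```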
arXiv Detail & Related papers (2020-08-17T23:39:33Z)