DeepReShape: Redesigning Neural Networks for Efficient Private Inference
- URL: http://arxiv.org/abs/2304.10593v4
- Date: Mon, 24 Jun 2024 15:34:40 GMT
- Title: DeepReShape: Redesigning Neural Networks for Efficient Private Inference
- Authors: Nandan Kumar Jha, Brandon Reagen
- Abstract summary: Recent work has shown that FLOPs for PI can no longer be ignored and incur high latency penalties.
We develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints.
- Score: 3.7802450241986945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work on Private Inference (PI) -- inferences performed directly on encrypted input -- has focused on minimizing a network's ReLUs, which have been assumed to dominate PI latency rather than FLOPs. Recent work has shown that FLOPs for PI can no longer be ignored and incur high latency penalties. In this paper, we develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints, optimizing for both ReLUs and FLOPs for the first time. The key insight is that strategically allocating channels to position the network's ReLUs in order of their criticality to network accuracy simultaneously optimizes ReLU and FLOPs efficiency. DeepReShape automates network development with an efficient process, and we call the generated networks HybReNets. We evaluate DeepReShape using standard PI benchmarks and demonstrate a 2.1% accuracy gain with a 5.2$\times$ runtime improvement at iso-ReLU on CIFAR-100 and an 8.7$\times$ runtime improvement at iso-accuracy on TinyImageNet. Furthermore, we investigate the significance of network selection in prior ReLU optimizations and shed light on the key network attributes for superior PI performance.
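The accounting behind this insight can be illustrated with a minimal back-of-the-envelope sketch (not the authors' tool; the stage widths, input resolution, and block counts below are made up): it counts ReLUs and convolution FLOPs per stage of a toy ConvNet and shows how reallocating channels between stages repositions the ReLU budget relative to the FLOP budget.

```python
# Back-of-the-envelope ReLU/FLOP accounting for a toy 4-stage ConvNet.
# Illustrative sketch only: the configurations below are hypothetical,
# not taken from the paper.

def stage_costs(in_res, channels, blocks_per_stage=2, kernel=3):
    """Return per-stage (ReLU count, conv FLOPs) for a simple ConvNet.

    Each stage halves the spatial resolution and applies `blocks_per_stage`
    3x3 convolutions, each followed by a ReLU on every output element.
    """
    costs = []
    res, c_in = in_res, 3  # RGB input
    for c_out in channels:
        res //= 2  # stride-2 downsampling at the start of each stage
        relus = flops = 0
        for _ in range(blocks_per_stage):
            relus += c_out * res * res                               # one ReLU per activation
            flops += 2 * kernel * kernel * c_in * c_out * res * res  # MACs x 2
            c_in = c_out
        costs.append((relus, flops))
    return costs

if __name__ == "__main__":
    baseline = [64, 128, 256, 512]   # classic "double the channels" heuristic
    reshaped = [32, 96, 256, 768]    # hypothetical reallocation toward late stages
    for name, cfg in [("baseline", baseline), ("reshaped", reshaped)]:
        costs = stage_costs(32, cfg)
        total_relu = sum(r for r, _ in costs)
        total_flops = sum(f for _, f in costs)
        print(f"{name:9s} ReLUs={total_relu:>9,} FLOPs={total_flops:>13,}")
        for i, (r, f) in enumerate(costs):
            print(f"  stage {i}: {100*r/total_relu:5.1f}% of ReLUs, "
                  f"{100*f/total_flops:5.1f}% of FLOPs")
```

Because early stages run at higher spatial resolution, most ReLUs sit there while most FLOPs accumulate in the wider late stages; shifting channels moves both distributions at once, which is the kind of trade-off the abstract describes.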
Related papers
- Leaky ReLUs That Differ in Forward and Backward Pass Facilitate Activation Maximization in Deep Neural Networks [0.022344294014777957]
Activation maximization (AM) strives to generate optimal input, revealing features that trigger high responses in trained deep neural networks.
We show that AM fails to produce optimal input for simple functions containing ReLUs or Leaky ReLUs.
We propose a solution based on using Leaky ReLUs with a high negative slope in the backward pass while keeping the original, usually zero, slope in the forward pass.
arXiv Detail & Related papers (2024-10-22T12:38:39Z) - Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption [11.706881389387242]
- Toward Practical Privacy-Preserving Convolutional Neural Networks Exploiting Fully Homomorphic Encryption [11.706881389387242]
Fully homomorphic encryption (FHE) is a viable approach for achieving private inference (PI).
However, an FHE implementation of a CNN faces significant hurdles, primarily due to FHE's substantial computational and memory overhead.
We propose a set of optimizations, which includes GPU/ASIC acceleration, an efficient activation function, and an optimized packing scheme.
arXiv Detail & Related papers (2023-10-25T10:24:35Z) - Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - Accelerating Deep Neural Networks via Semi-Structured Activation
Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of 1.25$\times$ with a minimal accuracy drop of 1.1% for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z) - Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to 4.25% more accuracy (iso-ReLU count at 50K) or 2.2$\times$ less latency (iso-accuracy at 70%) than the current state of the art.
arXiv Detail & Related papers (2022-02-04T19:00:24Z) - Sphynx: ReLU-Efficient Network Design for Private Inference [49.73927340643812]
- Sphynx: ReLU-Efficient Network Design for Private Inference [49.73927340643812]
We focus on private inference (PI), where the goal is to perform inference on a user's data sample using a service provider's model.
Existing PI methods for deep networks enable cryptographically secure inference with little drop in functionality.
This paper presents Sphynx, a ReLU-efficient network design method based on micro-search strategies for convolutional cell design.
arXiv Detail & Related papers (2021-06-17T18:11:10Z) - Circa: Stochastic ReLUs for Private Deep Learning [6.538025863698682]
We re-think the ReLU computation and propose optimizations for PI tailored to neural networks.
Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test.
We demonstrate improvements of up to 4.7$\times$ in storage and 3$\times$ in runtime over baseline implementations.
arXiv Detail & Related papers (2021-06-15T22:52:45Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models into MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z) - ReActNet: Towards Precise Binary Neural Network with Generalized
Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.